Commit Graph

19 Commits

Author SHA1 Message Date
Sergey Penkovsky
3bc2848cf0 refactor: unify CachedDecoder implementation across models
- Completely removed duplicate CachedDecoder from llama.py
- Modified core CachedDecoder to support dependency injection:
  - Added feed_forward_layer parameter (required)
  - Added norm_layer parameter with LayerNorm default
  - Added rope parameter for RoPE support
  - Removed unused activation parameter
- Updated GPT2 to use new CachedDecoder with FeedForward
- Updated LLaMA to use new CachedDecoder with SwiGLU and RMSNorm
- Reordered constructor parameters so that required parameters come before those with defaults, as Python requires

This eliminates all code duplication while preserving architecture-specific behavior through dependency injection; a sketch of the injected-component wiring follows this entry.
2025-10-06 14:57:29 +03:00
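
The injected-component wiring described above might look roughly like the following. This is a minimal sketch: the names CachedDecoder, feed_forward_layer, norm_layer, and rope come from the commit message, but the attention module, tensor shapes, and forward signature are assumptions, not the repository's actual code.

```python
import torch
from torch import nn


class CachedDecoder(nn.Module):
    """Decoder block whose feed-forward and normalization modules are injected (sketch)."""

    def __init__(self, emb_size, num_heads, feed_forward_layer,
                 norm_layer=nn.LayerNorm, rope=None, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_size, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = feed_forward_layer          # e.g. FeedForward (GPT-2) or SwiGLU (LLaMA)
        self.norm1 = norm_layer(emb_size)     # LayerNorm by default, RMSNorm for LLaMA
        self.norm2 = norm_layer(emb_size)
        self.rope = rope                      # optional rotary-embedding module

    def forward(self, x, attn_mask=None):
        # pre-norm residual attention; the real decoder also maintains a KV cache
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.norm2(x))
        return x


# GPT-2 style usage: a plain GELU feed-forward with the default LayerNorm
ff = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
block = CachedDecoder(emb_size=64, num_heads=4, feed_forward_layer=ff)
out = block(torch.randn(2, 10, 64))           # -> shape (2, 10, 64)
```

Passing the norm class rather than an instance keeps the two sub-layer norms independent; a LLaMA configuration would supply RMSNorm and a SwiGLU module instead of the defaults.
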
Sergey Penkovsky
d99d605b35 refactor: partial removal of duplicate code by using core modules
- Removed duplicate HeadAttention and MultiHeadAttention implementations from llama.py
- Now importing MultiHeadAttention from core module
- Added RoPE support parameter to core HeadAttention constructor
- Kept LLaMA-specific CachedDecoder implementation (uses SwiGLU and RMSNorm)
- Core CachedDecoder uses different components (FeedForward and LayerNorm)
- Improved code reuse for the attention components while keeping the LLaMA-specific decoder

This is a partial refactor - the attention components are now shared, but the decoder remains LLaMA-specific due to its different normalization and activation requirements.
2025-10-06 14:26:32 +03:00
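
A hedged sketch of what the new rope parameter on the core HeadAttention could look like. Only the idea of rotating queries and keys when a RoPE module is supplied is taken from the commit; the projection layout and forward signature are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F


class HeadAttention(nn.Module):
    """Single attention head with an optional RoPE module (illustrative sketch)."""

    def __init__(self, emb_size, head_size, rope=None):
        super().__init__()
        self.q = nn.Linear(emb_size, head_size, bias=False)
        self.k = nn.Linear(emb_size, head_size, bias=False)
        self.v = nn.Linear(emb_size, head_size, bias=False)
        self.rope = rope                       # None for GPT-2; a RoPE module for LLaMA

    def forward(self, x, mask=None):
        q, k, v = self.q(x), self.k(x), self.v(x)
        if self.rope is not None:
            # rotate queries and keys position-wise; values are left untouched
            q, k = self.rope(q), self.rope(k)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v
```
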
Sergey Penkovsky
211adf574c refactor: extract LLaMA components to separate modules in core directory
- Moved GELU, RMSNorm, RoPE, SiLU, and SwiGLU implementations from llama.py to dedicated files in core/
- Updated feed_forward.py to use new modular components
- Modified llama.py to import components from core modules instead of local definitions
- Improved code organization and reusability of activation functions and normalization layers

This refactor enables better code reuse across different model architectures and follows the single responsibility principle.
2025-10-06 14:09:19 +03:00
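
For reference, textbook definitions of two of the extracted modules. These match the names RMSNorm and SwiGLU moved into core/, but they are standard formulations, not necessarily the repository's exact code.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias, one learned scale."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: SiLU-gated linear unit followed by a projection back."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(nn.functional.silu(self.w_gate(x)) * self.w_up(x))
```
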
Sergey Penkovsky
f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding
- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE) for better positional encoding
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization

The implementation includes:
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
2025-10-06 13:26:20 +03:00
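
The "cached decoder for efficient generation" refers to keeping past keys and values so that each new token attends over the accumulated context without recomputing it. A small illustrative helper under that assumption; the repository's CachedDecoder presumably keeps this state inside the module instead.

```python
import torch


def attend_with_cache(q_new, k_new, v_new, cache=None):
    """Append the new key/value to the cache and attend the new query over the full past."""
    if cache is None:
        k_all, v_all = k_new, v_new
    else:
        k_all = torch.cat([cache[0], k_new], dim=1)   # (batch, past + new, head_size)
        v_all = torch.cat([cache[1], v_new], dim=1)
    scores = q_new @ k_all.transpose(-2, -1) / (k_all.size(-1) ** 0.5)
    out = torch.softmax(scores, dim=-1) @ v_all
    return out, (k_all, v_all)                        # updated cache for the next step
```
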
Sergey Penkovsky
9898e8ee83 feat: add RoPE positional embeddings implementation in llama.ipynb
- Implement Rotary Positional Embeddings (RoPE) with separate cosine/sine components
- Add vectorized computation of inverse frequencies for RoPE
- Include tensor slicing utilities for even/odd column separation
- Update dependencies in pyproject.toml and uv.lock
2025-10-06 12:52:59 +03:00
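
The commit's vectorized inverse frequencies and even/odd column slicing correspond to the standard RoPE formulation sketched below; the function names and shapes are assumptions for illustration.

```python
import torch


def rope_cos_sin(seq_len, head_size, base=10000.0):
    """Precompute the cosine and sine tables used by rotary embeddings."""
    # one inverse frequency per pair of channels, computed in a single vectorized step
    inv_freq = 1.0 / (base ** (torch.arange(0, head_size, 2).float() / head_size))
    angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]   # (seq_len, head_size/2)
    return angles.cos(), angles.sin()


def apply_rope(x, cos, sin):
    """Rotate even/odd channel pairs of x by the position-dependent angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]        # slice columns into even/odd halves
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


# x: (batch, seq_len, head_size)
x = torch.randn(2, 8, 16)
cos, sin = rope_cos_sin(seq_len=8, head_size=16)
x_rotated = apply_rope(x, cos, sin)
```
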
Sergey Penkovsky
b6f56a2640 fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update 2025-10-05 23:01:58 +03:00
Sergey Penkovsky
e5b5a97811 Merge pull request #1 from pese-git/feature/gpt2
Feature/gpt2
2025-10-05 21:30:33 +03:00
Sergey Penkovsky
b9d9bdcc71 docs(readme): add explicit support notice for GPT-2 architecture and usage examples 2025-10-05 21:29:38 +03:00
Sergey Penkovsky
c31eed8551 fix(hf-integration): handle logits as tuple in hf_adapter, convert torch.Tensor to list in hf_tokenizer.decode for decoding compatibility 2025-10-05 20:47:36 +03:00
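
The decode-compatibility part of this fix likely amounts to converting tensors to plain Python lists before decoding. A hypothetical shim; the real hf_tokenizer method may differ.

```python
import torch


def decode(token_ids, tokenizer):
    """Accept either a torch.Tensor or a plain list of ids before decoding (illustrative)."""
    if isinstance(token_ids, torch.Tensor):
        token_ids = token_ids.tolist()        # HF-style decode expects a Python list of ints
    return tokenizer.decode(token_ids)
```
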
Sergey Penkovsky
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs 2025-10-05 19:26:18 +03:00
Sergey Penkovsky
c39e68d71a feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE. 2025-10-05 19:11:20 +03:00
Sergey Penkovsky
f866ed7ac7 fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility) 2025-10-05 15:52:21 +03:00
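
The tuple/tensor compatibility check can be expressed as a one-line helper; this is an assumed form of the Trainer fix, not a copy of it.

```python
def extract_logits(model_output):
    """Return logits whether the model yields a bare tensor or a (logits, ...) tuple."""
    return model_output[0] if isinstance(model_output, tuple) else model_output
```
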
Sergey Penkovsky
aa408e941a docs: add GPT-2 analysis notebook
- Add gpt2.ipynb with GPT-2 model experiments and comparisons
2025-10-05 12:48:32 +03:00
Sergey Penkovsky
da1cf3fb55 fix: rename notebook 2025-10-05 12:46:17 +03:00
Sergey Penkovsky
1f9a4d2fa9 chore: add ipykernel dependency and update notebooks
- Add ipykernel to project dependencies for Jupyter notebook support
- Update BPE and GPT analysis notebooks with latest experiments
2025-10-05 11:59:24 +03:00
Sergey Penkovsky
f060497eb1 docs: add analysis notebooks for BPE and GPT
- Add bpe.ipynb with Byte Pair Encoding implementation analysis
- Update gpt_analysis.ipynb with GPT model experiments and visualizations
2025-10-05 08:23:09 +03:00
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00
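
An example of the kind of shape-and-tuple test such a suite typically contains. The module path core.multi_head_attention and the constructor signature are assumptions, so the test skips cleanly if they do not match the actual layout.

```python
import pytest
import torch

# Assumed module path; skip rather than fail if the project layout differs.
core_mha = pytest.importorskip("core.multi_head_attention")


@pytest.fixture
def sample_batch():
    torch.manual_seed(0)                      # deterministic (batch, seq_len, emb_size) input
    return torch.randn(2, 8, 32)


def test_multi_head_attention_preserves_shape(sample_batch):
    mha = core_mha.MultiHeadAttention(emb_size=32, num_heads=4)
    out = mha(sample_batch)
    if isinstance(out, tuple):                # refactored modules return tuples
        out = out[0]
    assert out.shape == sample_batch.shape
```
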
Sergey Penkovsky
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization
- Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
- Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
- Update dependencies in pyproject.toml
2025-10-05 08:09:30 +03:00
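
The mask change boils down to building causal masks as torch.bool instead of uint8/.byte(). A minimal illustration of the pattern, not the repository's GPT.generate code.

```python
import torch

seq_len = 6
# Old style (deprecated): uint8 masks via .byte() trigger warnings in recent PyTorch.
# New style: build the causal mask directly as torch.bool.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)
scores = scores.masked_fill(~causal_mask, float("-inf"))   # bool masks work with masked_fill
weights = torch.softmax(scores, dim=-1)
```
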
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00