Commit Graph

8 Commits

Author SHA1 Message Date
Sergey Penkovsky
211adf574c refactor: extract LLaMA components to separate modules in core directory
- Moved GELU, RMSNorm, RoPE, SiLU, and SwiGLU implementations from llama.py to dedicated files in core/
- Updated feed_forward.py to use new modular components
- Modified llama.py to import components from core modules instead of local definitions
- Improved code organization and reusability of activation functions and normalization layers

This refactor enables better code reuse across different model architectures and follows the single responsibility principle.
2025-10-06 14:09:19 +03:00
Sergey Penkovsky
f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding
- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE) for better positional encoding
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization

The implementation includes:
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
2025-10-06 13:26:20 +03:00
Sergey Penkovsky
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs 2025-10-05 19:26:18 +03:00
Sergey Penkovsky
c39e68d71a feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE. 2025-10-05 19:11:20 +03:00
Sergey Penkovsky
f866ed7ac7 fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility) 2025-10-05 15:52:21 +03:00
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00
Sergey Penkovsky
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization
- Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
- Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
- Update dependencies in pyproject.toml
2025-10-05 08:09:30 +03:00
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00