llm-arch-research

Mirror of https://github.com/pese-git/llm-arch-research.git, last synced 2026-01-24 13:32:08 +00:00.

211adf574c  Sergey Penkovsky  2025-10-06 14:09:19 +03:00
    refactor: extract LLaMA components to separate modules in core directory
    - Moved GELU, RMSNorm, RoPE, SiLU, and SwiGLU implementations from llama.py to dedicated files in core/
    - Updated feed_forward.py to use new modular components
    - Modified llama.py to import components from core modules instead of local definitions
    - Improved code organization and reusability of activation functions and normalization layers
    This refactor enables better code reuse across different model architectures and follows the single responsibility principle.

f30cd530a9  Sergey Penkovsky  2025-10-06 13:26:20 +03:00
    feat: add LLaMA model implementation with RoPE positional encoding
    - Added LLaMA model architecture with RMSNorm and SwiGLU activation
    - Implemented Rotary Positional Embeddings (RoPE) for better positional encoding
    - Created training script for LLaMA with BPE tokenizer
    - Fixed matplotlib dependency version in uv.lock
    - Added LLaMA module initialization
    The implementation includes:
    - TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
    - RMSNorm normalization layer
    - SwiGLU feed-forward activation
    - Cached decoder implementation for efficient generation

3843e64098  Sergey Penkovsky  2025-10-05 19:26:18 +03:00
    test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs

c39e68d71a  Sergey Penkovsky  2025-10-05 19:11:20 +03:00
    feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components.
    Core modules now shared; add train and generate scripts for GPT2-BPE.

f866ed7ac7  Sergey Penkovsky  2025-10-05 15:52:21 +03:00
    fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility)

fb74dc7c17  Sergey Penkovsky  2025-10-05 08:11:18 +03:00
    test: add comprehensive test suite for LLM components
    - Add pytest configuration and fixtures
    - Add tests for core modules: decoder, feed_forward, multi_head_attention
    - Add tests for positional and token embeddings
    - Add tests for GPT model
    - Add tests for tokenizers (base and BPE)
    - Add basic integration tests

f4bdc81829  Sergey Penkovsky  2025-10-05 08:09:30 +03:00
    fix: update PyTorch mask types and BPE tokenizer serialization
    - Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
    - Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
    - Update dependencies in pyproject.toml

ec07546ea8  Sergey Penkovsky  2025-10-04 22:40:21 +03:00
    feat: initial project setup with LLM architecture and HF integration
    - Add LLM library with GPT model implementation
    - Add hf-proxy for HuggingFace integration
    - Add experiments for training and generation
    - Add comprehensive documentation and examples
    - Configure uv workspace with proper dependencies

8 Commits