Commit Graph

19 Commits

Author SHA1 Message Date
Sergey Penkovsky
3bc2848cf0 refactor: unify CachedDecoder implementation across models
- Completely removed duplicate CachedDecoder from llama.py
- Modified core CachedDecoder to support dependency injection:
  - Added feed_forward_layer parameter (required)
  - Added norm_layer parameter with LayerNorm default
  - Added rope parameter for RoPE support
  - Removed unused activation parameter
- Updated GPT2 to use new CachedDecoder with FeedForward
- Updated LLaMA to use new CachedDecoder with SwiGLU and RMSNorm
- Reordered constructor parameters so that required parameters come before those with defaults, as Python requires

This eliminates all code duplication while preserving architecture-specific behavior through dependency injection; a sketch of the injected-component wiring follows this entry.
2025-10-06 14:57:29 +03:00
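
The injected-component wiring described above might look roughly like the following. This is a minimal sketch: the names CachedDecoder, feed_forward_layer, norm_layer, and rope come from the commit message, but the attention module, tensor shapes, and forward signature are assumptions, not the repository's actual code.

```python
import torch
from torch import nn


class CachedDecoder(nn.Module):
    """Decoder block whose feed-forward and normalization modules are injected (sketch)."""

    def __init__(self, emb_size, num_heads, feed_forward_layer,
                 norm_layer=nn.LayerNorm, rope=None, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_size, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = feed_forward_layer          # e.g. FeedForward (GPT-2) or SwiGLU (LLaMA)
        self.norm1 = norm_layer(emb_size)     # LayerNorm by default, RMSNorm for LLaMA
        self.norm2 = norm_layer(emb_size)
        self.rope = rope                      # optional rotary-embedding module

    def forward(self, x, attn_mask=None):
        # pre-norm residual attention; the real decoder also maintains a KV cache
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.norm2(x))
        return x


# GPT-2 style usage: a plain GELU feed-forward with the default LayerNorm
ff = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
block = CachedDecoder(emb_size=64, num_heads=4, feed_forward_layer=ff)
out = block(torch.randn(2, 10, 64))           # -> shape (2, 10, 64)
```

Passing the norm class rather than an instance keeps the two sub-layer norms independent; a LLaMA configuration would supply RMSNorm and a SwiGLU module instead of the defaults.
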
Sergey Penkovsky
d99d605b35 refactor: partial removal of duplicate code by using core modules
- Removed duplicate HeadAttention and MultiHeadAttention implementations from llama.py
- Now importing MultiHeadAttention from core module
- Added RoPE support parameter to core HeadAttention constructor
- Kept LLaMA-specific CachedDecoder implementation (uses SwiGLU and RMSNorm)
- Core CachedDecoder uses different components (FeedForward and LayerNorm)
- Improved code reuse for the attention components while keeping the LLaMA-specific decoder

This is a partial refactor - the attention components are now shared, but the decoder remains LLaMA-specific due to its different normalization and activation requirements.
2025-10-06 14:26:32 +03:00
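
A hedged sketch of what the new rope parameter on the core HeadAttention could look like. Only the idea of rotating queries and keys when a RoPE module is supplied is taken from the commit; the projection layout and forward signature are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F


class HeadAttention(nn.Module):
    """Single attention head with an optional RoPE module (illustrative sketch)."""

    def __init__(self, emb_size, head_size, rope=None):
        super().__init__()
        self.q = nn.Linear(emb_size, head_size, bias=False)
        self.k = nn.Linear(emb_size, head_size, bias=False)
        self.v = nn.Linear(emb_size, head_size, bias=False)
        self.rope = rope                       # None for GPT-2; a RoPE module for LLaMA

    def forward(self, x, mask=None):
        q, k, v = self.q(x), self.k(x), self.v(x)
        if self.rope is not None:
            # rotate queries and keys position-wise; values are left untouched
            q, k = self.rope(q), self.rope(k)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v
```
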
Sergey Penkovsky
211adf574c refactor: extract LLaMA components to separate modules in core directory
- Moved GELU, RMSNorm, RoPE, SiLU, and SwiGLU implementations from llama.py to dedicated files in core/
- Updated feed_forward.py to use new modular components
- Modified llama.py to import components from core modules instead of local definitions
- Improved code organization and reusability of activation functions and normalization layers

This refactor enables better code reuse across different model architectures and follows the single responsibility principle.
2025-10-06 14:09:19 +03:00
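
For reference, textbook definitions of two of the extracted modules. These match the names RMSNorm and SwiGLU moved into core/, but they are standard formulations, not necessarily the repository's exact code.

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias, one learned scale."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: SiLU-gated linear unit followed by a projection back."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(nn.functional.silu(self.w_gate(x)) * self.w_up(x))
```
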
Sergey Penkovsky
f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding
- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE) for better positional encoding
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization

The implementation includes:
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
2025-10-06 13:26:20 +03:00
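
The "cached decoder for efficient generation" refers to keeping past keys and values so that each new token attends over the accumulated context without recomputing it. A small illustrative helper under that assumption; the repository's CachedDecoder presumably keeps this state inside the module instead.

```python
import torch


def attend_with_cache(q_new, k_new, v_new, cache=None):
    """Append the new key/value to the cache and attend the new query over the full past."""
    if cache is None:
        k_all, v_all = k_new, v_new
    else:
        k_all = torch.cat([cache[0], k_new], dim=1)   # (batch, past + new, head_size)
        v_all = torch.cat([cache[1], v_new], dim=1)
    scores = q_new @ k_all.transpose(-2, -1) / (k_all.size(-1) ** 0.5)
    out = torch.softmax(scores, dim=-1) @ v_all
    return out, (k_all, v_all)                        # updated cache for the next step
```
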
Sergey Penkovsky
9898e8ee83 feat: add RoPE positional embeddings implementation in llama.ipynb
- Implement Rotary Positional Embeddings (RoPE) with separate cosine/sine components
- Add vectorized computation of inverse frequencies for RoPE
- Include tensor slicing utilities for even/odd column separation
- Update dependencies in pyproject.toml and uv.lock
2025-10-06 12:52:59 +03:00
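
The commit's vectorized inverse frequencies and even/odd column slicing correspond to the standard RoPE formulation sketched below; the function names and shapes are assumptions for illustration.

```python
import torch


def rope_cos_sin(seq_len, head_size, base=10000.0):
    """Precompute the cosine and sine tables used by rotary embeddings."""
    # one inverse frequency per pair of channels, computed in a single vectorized step
    inv_freq = 1.0 / (base ** (torch.arange(0, head_size, 2).float() / head_size))
    angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]   # (seq_len, head_size/2)
    return angles.cos(), angles.sin()


def apply_rope(x, cos, sin):
    """Rotate even/odd channel pairs of x by the position-dependent angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]        # slice columns into even/odd halves
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


# x: (batch, seq_len, head_size)
x = torch.randn(2, 8, 16)
cos, sin = rope_cos_sin(seq_len=8, head_size=16)
x_rotated = apply_rope(x, cos, sin)
```
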
Sergey Penkovsky
b6f56a2640 fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update 2025-10-05 23:01:58 +03:00
Sergey Penkovsky
e5b5a97811 Merge pull request #1 from pese-git/feature/gpt2
Feature/gpt2
2025-10-05 21:30:33 +03:00
Sergey Penkovsky
b9d9bdcc71 docs(readme): add explicit support notice for GPT-2 architecture and usage examples 2025-10-05 21:29:38 +03:00
Sergey Penkovsky
c31eed8551 fix(hf-integration): handle logits as tuple in hf_adapter, convert torch.Tensor to list in hf_tokenizer.decode for decoding compatibility 2025-10-05 20:47:36 +03:00
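
The decode-compatibility part of this fix likely amounts to converting tensors to plain Python lists before decoding. A hypothetical shim; the real hf_tokenizer method may differ.

```python
import torch


def decode(token_ids, tokenizer):
    """Accept either a torch.Tensor or a plain list of ids before decoding (illustrative)."""
    if isinstance(token_ids, torch.Tensor):
        token_ids = token_ids.tolist()        # HF-style decode expects a Python list of ints
    return tokenizer.decode(token_ids)
```
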
Sergey Penkovsky
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs 2025-10-05 19:26:18 +03:00
Sergey Penkovsky
c39e68d71a feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE. 2025-10-05 19:11:20 +03:00
Sergey Penkovsky
f866ed7ac7 fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility) 2025-10-05 15:52:21 +03:00
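
The tuple/tensor compatibility check can be expressed as a one-line helper; this is an assumed form of the Trainer fix, not a copy of it.

```python
def extract_logits(model_output):
    """Return logits whether the model yields a bare tensor or a (logits, ...) tuple."""
    return model_output[0] if isinstance(model_output, tuple) else model_output
```
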
Sergey Penkovsky
aa408e941a docs: add GPT-2 analysis notebook
- Add gpt2.ipynb with GPT-2 model experiments and comparisons
2025-10-05 12:48:32 +03:00
Sergey Penkovsky
da1cf3fb55 fix: rename notebook 2025-10-05 12:46:17 +03:00
Sergey Penkovsky
1f9a4d2fa9 chore: add ipykernel dependency and update notebooks
- Add ipykernel to project dependencies for Jupyter notebook support
- Update BPE and GPT analysis notebooks with latest experiments
2025-10-05 11:59:24 +03:00
Sergey Penkovsky
f060497eb1 docs: add analysis notebooks for BPE and GPT
- Add bpe.ipynb with Byte Pair Encoding implementation analysis
- Update gpt_analysis.ipynb with GPT model experiments and visualizations
2025-10-05 08:23:09 +03:00
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00
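
An example of the kind of shape-and-tuple test such a suite typically contains. The module path core.multi_head_attention and the constructor signature are assumptions, so the test skips cleanly if they do not match the actual layout.

```python
import pytest
import torch

# Assumed module path; skip rather than fail if the project layout differs.
core_mha = pytest.importorskip("core.multi_head_attention")


@pytest.fixture
def sample_batch():
    torch.manual_seed(0)                      # deterministic (batch, seq_len, emb_size) input
    return torch.randn(2, 8, 32)


def test_multi_head_attention_preserves_shape(sample_batch):
    mha = core_mha.MultiHeadAttention(emb_size=32, num_heads=4)
    out = mha(sample_batch)
    if isinstance(out, tuple):                # refactored modules return tuples
        out = out[0]
    assert out.shape == sample_batch.shape
```
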
Sergey Penkovsky
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization
- Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
- Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
- Update dependencies in pyproject.toml
2025-10-05 08:09:30 +03:00
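
The mask change boils down to building causal masks as torch.bool instead of uint8/.byte(). A minimal illustration of the pattern, not the repository's GPT.generate code.

```python
import torch

seq_len = 6
# Old style (deprecated): uint8 masks via .byte() trigger warnings in recent PyTorch.
# New style: build the causal mask directly as torch.bool.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)
scores = scores.masked_fill(~causal_mask, float("-inf"))   # bool masks work with masked_fill
weights = torch.softmax(scores, dim=-1)
```
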
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00