llm-arch-research

Mirror of https://github.com/pese-git/llm-arch-research.git, synced 2026-01-23 21:10:54 +00:00

Files

Sergey Penkovsky f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding

- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE), which encode position by rotating query and key vectors inside attention
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization
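To illustrate the RoPE item above, here is a minimal standard-library sketch of the rotation. It is not the code from this commit: the function names `rope_rotate` and `dot`, the base of 10000, and the interleaved pairing of dimensions (2i, 2i+1) are assumptions chosen for illustration. Some implementations pair the first and second halves of the vector instead.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate one even-length vector by its position (hypothetical helper)."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even vector length")
    out = [0.0] * dim
    for i in range(dim // 2):
        # Each pair of dimensions rotates at its own frequency.
        theta = position * base ** (-2.0 * i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * cos_t - y * sin_t
        out[2 * i + 1] = x * sin_t + y * cos_t
    return out


def dot(a, b):
    """Plain dot product, used here as the attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same relative offset (2) at different absolute positions.
score_a = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_b = dot(rope_rotate(q, 7), rope_rotate(k, 5))
```

Because the query and the key are rotated by angles proportional to their positions, the score between them depends only on the distance between the two positions, so `score_a` and `score_b` agree up to floating-point error.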

The implementation includes:
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
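For readers unfamiliar with the RMSNorm and SwiGLU components listed above, the following is a rough standard-library sketch of both. It is an illustration under stated assumptions, not the repository's implementation: the names `rms_norm`, `silu`, `matvec`, and `swiglu_ffn`, the default `eps`, and the use of nested lists as weight matrices are all hypothetical. A real model would use a tensor library instead.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the inverse of its root mean square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy sizes (assumed): model width 2, hidden width 3.
x = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
w_gate = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[0.2, 0.1, -0.3], [0.4, -0.5, 0.6]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so it only rescales the vector. The SwiGLU block uses three weight matrices, with the gate branch passed through SiLU and multiplied elementwise with the up projection.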

2025-10-06 13:26:20 +03:00

generate_gpt2_bpe.py

feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE.

2025-10-05 19:11:20 +03:00

generate_gpt_bpe.py

feat: initial project setup with LLM architecture and HF integration

2025-10-04 22:40:21 +03:00

train_gpt2_bpe.py

feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE.

2025-10-05 19:11:20 +03:00

train_gpt_bpe.py

feat: initial project setup with LLM architecture and HF integration

2025-10-04 22:40:21 +03:00

train_llama_bpe.py

feat: add LLaMA model implementation with RoPE positional encoding

2025-10-06 13:26:20 +03:00