- implement Mistral model in llm/models/mistral/mistral.py with GroupedQueryAttention, SwiGLU, RoPE, and sliding-window attention (see the attention sketch after this list)
- add __init__.py for module export
- add config files for mistral training and generation
- update universal experiment runner to support Mistral model
- add notebook for Mistral experiments
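A minimal sketch of how the sliding-window causal mask and grouped-query head sharing in a Mistral-style attention layer could fit together; tensor shapes, the window size, and the head counts here are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: causal AND within the last `window` key positions.
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j > i - window)

def grouped_query_attention(q, k, v, window: int):
    # q: (B, n_q_heads, T, d); k, v: (B, n_kv_heads, T, d) with n_q_heads % n_kv_heads == 0.
    n_rep = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(n_rep, dim=1)    # share each KV head across a group of query heads
    v = v.repeat_interleave(n_rep, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = sliding_window_causal_mask(q.shape[-2], window).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```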
- add universal runner run_llm_experiment.py with JSON-config-driven LLM training / generation (example config sketched below)
- add configs for gpt, gpt2, llama (training/generation)
- remove individual train/generate scripts for each model
- update README with a short how-to block for running experiments
BREAKING CHANGE: all llm_only experiments now run only through run_llm_experiment.py; legacy scripts removed
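As a hedged illustration of the config-driven flow, the snippet below writes a training config and invokes the runner; the config keys, file paths, and the `--config` flag are assumptions for illustration and may differ from the actual run_llm_experiment.py interface.

```python
import json
import os
import subprocess

# Hypothetical training config; key names are illustrative, not the runner's real schema.
config = {
    "model": "mistral",
    "mode": "train",
    "tokenizer": "bpe",
    "dataset_path": "data/train.txt",
    "training": {"batch_size": 8, "max_steps": 1000, "lr": 3e-4},
}

os.makedirs("configs", exist_ok=True)
with open("configs/mistral_train.json", "w") as f:
    json.dump(config, f, indent=2)

# Single entry point for all models; the flag name is an assumption.
subprocess.run(
    ["python", "run_llm_experiment.py", "--config", "configs/mistral_train.json"],
    check=True,
)
```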
- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE), which rotate query/key vectors by position-dependent angles so attention depends on relative offsets
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization
The implementation includes (see the sketch after this list):
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
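A compact sketch of the three LLaMA-specific pieces named above: RMSNorm, RoPE, and SwiGLU. Module names, shapes, and parameter choices are illustrative assumptions rather than the module's actual API.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: scale by the RMS of the features, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (..., seq_len, head_dim); rotate channel pairs by position-dependent angles.
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1) * inv_freq  # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class SwiGLU(nn.Module):
    """Gated feed-forward: SiLU(x W_gate) * (x W_up), projected back to the model dimension."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))
```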
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation (a minimal generation loop is sketched after this list)
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
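For the generation experiments, a minimal autoregressive sampling loop of the kind such experiments typically use is sketched below; the model interface (a callable returning logits over the vocabulary) and the function name are assumptions, not the library's actual API.

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids: torch.Tensor, max_new_tokens: int, temperature: float = 1.0) -> torch.Tensor:
    # prompt_ids: (1, T) token ids; model(ids) is assumed to return logits of shape (1, T, vocab_size).
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature        # logits for the next position only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        ids = torch.cat([ids, next_id], dim=-1)
    return ids
```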