llm-arch-research

Mirror of https://github.com/pese-git/llm-arch-research.git, synced 2026-01-23 13:00:54 +00:00.

Author	SHA1	Message	Date
Sergey Penkovsky	21cfd79c19	refactor(assets): update and reorganize GPT-1 architecture diagrams - Renamed GPT-1 main scheme files for clarity - Added new diagram files for attention, decoder, embeddings, and forward blocks (both .drawio and .png) - Removed deprecated files (gpt11.drawio, gpt1.svg) - Updated notebooks/gpt.ipynb with relevant changes	2025-10-30 14:40:31 +03:00
Sergey Penkovsky	9e2796e6be	docs(gpt1): add architecture diagrams and notebook updates - Added architecture diagrams for GPT-1: gpt1.drawio, gpt11.drawio (drawio format) - Exported visualization images: gpt1.png, gpt1.svg for documentation and presentations - Updated gpt.ipynb notebook to reference new materials and possibly add explanations of layers/logic - New assets help to clarify model structure and training flow for both contributors and external users	2025-10-24 17:42:11 +03:00
Sergey Penkovsky	cfb4b6dfb1	feat(gemma): initial implementation of Gemma model and configs - Add core Gemma model (architecture, attention, GeGLU, RoPE, RMSNorm, etc) - Add configs for training and generation: gemma_train.json, gemma_generate.json - Add Gemma notebook for exploratory analysis and demonstration - Add __init__.py for Gemma submodule - Update run_llm_experiment.py to support Gemma experiment configs test(gemma): add comprehensive unit tests for Gemma - Test forward pass (with/without cache) - Test autoregressive generation (greedy, top-k, top-p) - Test shape correctness and max sequence length errors - Test multi-layer stack and token embeddings docs: add documentation notebook for Gemma usage and analysis Closes: #issue (if applicable)	2025-10-21 01:02:15 +03:00
Sergey Penkovsky	b1737bbce2	feat(mixtral): initial implementation of Mixtral MoE model, configs, and tests - Add Mixtral architecture implementation with MoE support (llm/src/llm/models/mixtral/mixtral.py) - Introduce generic Mixture-of-Experts (MoE) block (llm/src/llm/core/moe.py) - Create dedicated configuration files for Mixtral training and generation experiments - Register and test Mixtral support in experiment runner (run_llm_experiment.py) - Add unit tests for Mixtral API including forward, caching, and generation modes - Include Jupyter notebook mixstral.ipynb for architectural exploration and research - Ensure correct handling of torch bool masks in sampling (top-k, top-p) during generation BREAKING CHANGE: Adds new model code and test coverage, modifying experiment runner logic to register Mixtral.	2025-10-20 08:12:11 +03:00
Sergey Penkovsky	ec0d2bd8d0	feat(mistral): add Mistral model implementation and configs - implement Mistral model in llm/models/mistral/mistral.py with GroupedQueryAttention, SwiGLU, RoPE, sliding window attention - add __init__.py for module export - add config files for mistral training and generation - update universal experiment runner to support Mistral model - add notebook for Mistral experiments	2025-10-14 14:53:45 +03:00
Sergey Penkovsky	e5706a690d	fix(rope, attention): correct RoPE positioning during generation with cache - Fixed an error in computing the position for RoPE (Rotary Positional Embeddings) during autoregressive generation with the cache. - HeadAttention now passes start_pos to RoPE, computed from the cache length. - Updated the signature and logic of the RoPE.forward method. - Updated the llama.ipynb notebook for the new interfaces and outputs. BREAKING CHANGE: the forward method of RoPE has been redefined; code that used RoPE directly must be updated.	2025-10-14 12:03:20 +03:00
Sergey Penkovsky	9898e8ee83	feat: add RoPE positional embeddings implementation in llama.ipynb - Implement Rotary Positional Embeddings (RoPE) with separate cosine/sine components - Add vectorized computation of inverse frequencies for RoPE - Include tensor slicing utilities for even/odd column separation - Update dependencies in pyproject.toml and uv.lock	2025-10-06 12:52:59 +03:00
Sergey Penkovsky	b6f56a2640	fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update	2025-10-05 23:01:58 +03:00
Sergey Penkovsky	aa408e941a	docs: add GPT-2 analysis notebook - Add gpt2.ipynb with GPT-2 model experiments and comparisons	2025-10-05 12:48:32 +03:00
Sergey Penkovsky	da1cf3fb55	fix: rename notebook	2025-10-05 12:46:17 +03:00
Sergey Penkovsky	1f9a4d2fa9	chore: add ipykernel dependency and update notebooks - Add ipykernel to project dependencies for Jupyter notebook support - Update BPE and GPT analysis notebooks with latest experiments	2025-10-05 11:59:24 +03:00
Sergey Penkovsky	f060497eb1	docs: add analysis notebooks for BPE and GPT - Add bpe.ipynb with Byte Pair Encoding implementation analysis - Update gpt_analysis.ipynb with GPT model experiments and visualizations	2025-10-05 08:23:09 +03:00
Sergey Penkovsky	ec07546ea8	feat: initial project setup with LLM architecture and HF integration - Add LLM library with GPT model implementation - Add hf-proxy for HuggingFace integration - Add experiments for training and generation - Add comprehensive documentation and examples - Configure uv workspace with proper dependencies	2025-10-04 22:40:21 +03:00

13 Commits