13 Commits

Author SHA1 Message Date
Sergey Penkovsky
21cfd79c19 refactor(assets): update and reorganize GPT-1 architecture diagrams
- Renamed GPT-1 main scheme files for clarity
- Added new diagram files for attention, decoder, embeddings, and forward blocks (both .drawio and .png)
- Removed deprecated files (gpt11.drawio, gpt1.svg)
- Updated notebooks/gpt.ipynb with relevant changes
2025-10-30 14:40:31 +03:00
Sergey Penkovsky
9e2796e6be docs(gpt1): add architecture diagrams and notebook updates
- Added architecture diagrams for GPT-1: gpt1.drawio, gpt11.drawio (drawio format)
- Exported visualization images: gpt1.png, gpt1.svg for documentation and presentations
- Updated gpt.ipynb notebook to reference new materials and possibly add explanations of layers/logic
- New assets help to clarify model structure and training flow for both contributors and external users
2025-10-24 17:42:11 +03:00
Sergey Penkovsky
cfb4b6dfb1 feat(gemma): initial implementation of Gemma model and configs
- Add core Gemma model (architecture, attention, GeGLU, RoPE, RMSNorm, etc)
- Add configs for training and generation: gemma_train.json, gemma_generate.json
- Add Gemma notebook for exploratory analysis and demonstration
- Add __init__.py for Gemma submodule
- Update run_llm_experiment.py to support Gemma experiment configs

test(gemma): add comprehensive unit tests for Gemma

- Test forward pass (with/without cache)
- Test autoregressive generation (greedy, top-k, top-p)
- Test shape correctness and max sequence length errors
- Test multi-layer stack and token embeddings

docs: add documentation notebook for Gemma usage and analysis

Closes: #issue (if applicable)
2025-10-21 01:02:15 +03:00
Sergey Penkovsky
b1737bbce2 feat(mixtral): initial implementation of Mixtral MoE model, configs, and tests
- Add Mixtral architecture implementation with MoE support (llm/src/llm/models/mixtral/mixtral.py)
- Introduce generic Mixture-of-Experts (MoE) block (llm/src/llm/core/moe.py)
- Create dedicated configuration files for Mixtral training and generation experiments
- Register and test Mixtral support in experiment runner (run_llm_experiment.py)
- Add unit tests for Mixtral API including forward, caching, and generation modes
- Include Jupyter notebook mixstral.ipynb for architectural exploration and research
- Ensure correct handling of torch bool masks in sampling (top-k, top-p) during generation

BREAKING CHANGE: Adds new model code and test coverage, modifying experiment runner logic to register Mixtral.
2025-10-20 08:12:11 +03:00
Sergey Penkovsky
ec0d2bd8d0 feat(mistral): add Mistral model implementation and configs
- implement Mistral model in llm/models/mistral/mistral.py with GroupedQueryAttention, SwiGLU, RoPE, sliding window attention
- add __init__.py for module export
- add config files for mistral training and generation
- update universal experiment runner to support Mistral model
- add notebook for Mistral experiments
2025-10-14 14:53:45 +03:00
Sergey Penkovsky
e5706a690d fix(rope, attention): корректное позиционирование RoPE при генерации с кэшем
- Исправлена ошибка расчёта позиции для RoPE (Rotary Positional Embeddings) при автодополнении с использованием кэша.
- В HeadAttention теперь передаётся start_pos в RoPE, вычисляемый из длины кэша.
- Обновлена сигнатура и логика метода RoPE.forward.
- Обновлен ноутбук llama.ipynb под новые интерфейсы и выводы.

BREAKING CHANGE: переопределён метод forward у RoPE, требуется обновить код, если RoPE использовался вручную.
2025-10-14 12:03:20 +03:00
Sergey Penkovsky
9898e8ee83 feat: add RoPE positional embeddings implementation in llama.ipynb
- Implement Rotary Positional Embeddings (RoPE) with separate cosine/sine components
- Add vectorized computation of inverse frequencies for RoPE
- Include tensor slicing utilities for even/odd column separation
- Update dependencies in pyproject.toml and uv.lock
2025-10-06 12:52:59 +03:00
Sergey Penkovsky
b6f56a2640 fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update 2025-10-05 23:01:58 +03:00
Sergey Penkovsky
aa408e941a docs: add GPT-2 analysis notebook
- Add gpt2.ipynb with GPT-2 model experiments and comparisons
2025-10-05 12:48:32 +03:00
Sergey Penkovsky
da1cf3fb55 fix: rename notebook 2025-10-05 12:46:17 +03:00
Sergey Penkovsky
1f9a4d2fa9 chore: add ipykernel dependency and update notebooks
- Add ipykernel to project dependencies for Jupyter notebook support
- Update BPE and GPT analysis notebooks with latest experiments
2025-10-05 11:59:24 +03:00
Sergey Penkovsky
f060497eb1 docs: add analysis notebooks for BPE and GPT
- Add bpe.ipynb with Byte Pair Encoding implementation analysis
- Update gpt_analysis.ipynb with GPT model experiments and visualizations
2025-10-05 08:23:09 +03:00
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00