Commit Graph

35 Commits

Author SHA1 Message Date
Sergey Penkovsky
e6ca8dee6f docs(core): add comprehensive docstrings and unit tests for GroupedQueryAttention (GQA)
- docs: Rewrite and expand docstrings for the GroupedQueryAttention class and all main methods (__init__, forward, _repeat_kv_heads, _create_sliding_window_mask):
    - explained GQA architecture and motivation
    - included mathematical formulas, step-by-step algorithms, usage examples
    - added references to relevant scientific papers (Mistral, Llama 2, etc.)
- test: Add dedicated unit tests for GQA (output shape correctness, mask/window logic, KV head replication, RoPE processing, error and edge-cases)
- docs/test: Documentation and tests now fully reflect modern GQA usage and best practices for LLM architectures

This commit makes the implementation, usage, and theoretical underpinnings of GQA transparent and reproducible for researchers and engineers.
2025-10-15 17:27:55 +03:00
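The KV-head replication this commit documents (`_repeat_kv_heads`) can be sketched as follows — a minimal illustration of the GQA idea, not the repository's actual implementation:

```python
import torch

def repeat_kv_heads(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand KV heads so each group of query heads sees its shared KV head.

    x: (batch, n_kv_heads, seq_len, head_dim)
    returns: (batch, n_kv_heads * n_rep, seq_len, head_dim)
    """
    if n_rep == 1:
        return x
    b, n_kv, s, d = x.shape
    # insert a repeat dimension and flatten it into the head dimension
    return x[:, :, None, :, :].expand(b, n_kv, n_rep, s, d).reshape(b, n_kv * n_rep, s, d)

kv = torch.randn(2, 4, 8, 16)    # 4 KV heads
out = repeat_kv_heads(kv, 4)     # replicated to match 16 query heads
```

Each KV head is repeated contiguously, so query heads 0–3 all attend through KV head 0, and so on — the memory saving of GQA comes from storing only the 4 KV heads in the cache.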
Sergey Penkovsky
2e72dbaf07 test(llama): add unit tests for generation, cache, and edge cases
- Covers inference with and without the KV cache, and with sampling (top-k, top-p)
- Includes test for max sequence length (should raise ValueError)
- Verifies output shape and absence of dtype errors for the mask logic
- Minimal config and random data ensure tests are fast and robust

Motivation: Regression and integration protection for Llama decoding and sampling logic.
2025-10-15 14:37:35 +03:00
Sergey Penkovsky
dc440a3938 test(gpt2): add unit tests for generation, cache behavior, and error conditions
- Covers forward pass with and without KV-cache
- Verifies correct sequence generation for greedy, top-k, and top-p sampling
- Adds ValueError test for exceeding max sequence length
- Uses small random toy config and minimal setup for fast test feedback

Motivation: Prevent regressions in decoding, sampling, and KV-cache logic in GPT2 implementation.
2025-10-15 14:36:32 +03:00
Sergey Penkovsky
50d7593023 fix(gpt2, llama): proper top-k/top-p mask handling in sampling for PyTorch compatibility (bool/uint8)
- Refactored the token-selection logic in the sampling methods of the GPT2 and Llama classes.
- Masks are now created with dtype=torch.bool (or torch.uint8 on legacy PyTorch).
- True/False is used for mask/scatter instead of 1/0, ensuring correctness across PyTorch versions.
- Fixes "RuntimeError: masked_fill_ only supports boolean masks", previously raised by uint8 masks on newer PyTorch.
- Backward compatibility maintained: the code works on PyTorch >=1.2 and on old clusters (via the else branch).

Motivation: Fixes sampling errors for all modern PyTorch users while keeping research code usable on old infra.
2025-10-15 14:35:10 +03:00
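The boolean-mask pattern this fix describes can be sketched as follows — a hypothetical top-k filter for illustration, not the repository's actual sampling code:

```python
import torch

def top_k_filter(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest logits per row; mask the rest to -inf.

    masked_fill requires a bool mask in modern PyTorch; a uint8 mask
    raises "RuntimeError: masked_fill_ only supports boolean masks".
    """
    _, topk_idx = torch.topk(logits, k, dim=-1)
    # start with everything masked (True = "remove"), then unmask the top-k
    mask = torch.ones_like(logits, dtype=torch.bool)
    mask.scatter_(-1, topk_idx, False)
    return logits.masked_fill(mask, float("-inf"))

logits = torch.tensor([[1.0, 5.0, 3.0, 2.0]])
filtered = top_k_filter(logits, k=2)
probs = torch.softmax(filtered, dim=-1)   # masked tokens get probability 0
```

Scattering the Python booleans `True`/`False` (rather than `1`/`0`) keeps the mask dtype boolean, which is the portability point the commit makes.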
Sergey Penkovsky
38682e8c9d test(mistral): add unit tests for model generation and cache 2025-10-15 13:20:50 +03:00
Sergey Penkovsky
e791f7cd93 fix(mistral): fix top-k/top-p mask handling for PyTorch >=1.2 2025-10-15 13:20:30 +03:00
Sergey Penkovsky
d10044e4a7 refactor(core): refactor RoPE and MultiHeadAttention, add math-rich docs, expand tests, remove unused head_attention
- refactor: improved and unified the RoPE implementation, which now enforces strict checks on input dimensionality; made improvements and structural changes to MultiHeadAttention (clearer logic, strict input/output specification)
- docs: completely rewrote the docstrings for RoPE and MultiHeadAttention — added mathematical formulas, references to scientific papers, detailed explanations of the algorithm, the input format, constraints, and usage examples
- test: added dedicated unit tests for RoPE (shape correctness, errors on invalid input dimensions, norm preservation, backward/gradients, handling of the start_pos parameter and batches)
- chore: removed the unused core/head_attention.py module
- fix: an AssertionError is now raised on invalid RoPE input dimensions, enabling full test coverage of the error cases

This commit aligns the base attention implementation with modern LLM practice, strengthens the documentation for engineers and researchers, and improves the reliability of the library's automated tests.
2025-10-15 11:04:07 +03:00
Sergey Penkovsky
ec0d2bd8d0 feat(mistral): add Mistral model implementation and configs
- implement Mistral model in llm/models/mistral/mistral.py with GroupedQueryAttention, SwiGLU, RoPE, sliding window attention
- add __init__.py for module export
- add config files for mistral training and generation
- update universal experiment runner to support Mistral model
- add notebook for Mistral experiments
2025-10-14 14:53:45 +03:00
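The sliding window attention added here restricts causal attention to the last `window` tokens. A minimal sketch of such a mask (an illustration, not the repository's `_create_sliding_window_mask`):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True means attention is allowed.

    Position i may attend to positions j with i - window < j <= i,
    i.e. causal attention limited to the most recent `window` tokens.
    """
    i = torch.arange(seq_len).unsqueeze(1)   # query positions, column
    j = torch.arange(seq_len).unsqueeze(0)   # key positions, row
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(6, window=3)
```

With window=3, token 5 attends only to tokens 3, 4, and 5; attention cost per token becomes O(window) instead of O(seq_len).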
Sergey Penkovsky
e5706a690d fix(rope, attention): correct RoPE positioning during generation with a cache
- Fixed the position calculation for RoPE (Rotary Positional Embeddings) during autoregressive generation with a cache.
- HeadAttention now passes start_pos to RoPE, computed from the cache length.
- Updated the signature and logic of the RoPE.forward method.
- Updated the llama.ipynb notebook for the new interfaces and outputs.

BREAKING CHANGE: the RoPE forward method was redefined; code that used RoPE directly must be updated.
2025-10-14 12:03:20 +03:00
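The bug this commit fixes is subtle: with a KV cache, only the new tokens pass through attention, so their rotary angles must start at the cache length, not at zero. A minimal sketch of RoPE with a `start_pos` offset (an illustration under assumed shapes, not the repository's RoPE.forward):

```python
import torch

def rope(x: torch.Tensor, start_pos: int = 0, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to x of shape (batch, seq, dim), dim even.

    start_pos shifts the absolute positions, which is required when the
    earlier tokens are already in the KV cache.
    """
    b, s, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    pos = torch.arange(start_pos, start_pos + s).float()   # absolute positions
    angles = torch.outer(pos, inv_freq)                    # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                   # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(2, 4, 8)
y = rope(x, start_pos=0)        # first chunk: positions 0..3
y_cached = rope(x, start_pos=3) # continuing after a 3-token cache
```

Because each pair of features is rotated, the per-token norm is preserved — one of the properties the RoPE unit tests in this history check.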
Sergey Penkovsky
3e4815fcc6 refactor(experiments): migrate to universal runner + config structure, remove legacy scripts
- add universal runner run_llm_experiment.py with JSON-config driven LLM training / generation
- add configs for gpt, gpt2, llama (training/generation)
- remove individual train/generate scripts for each model
- update README with simple how-to for experiments block

BREAKING CHANGE: all llm_only experiments now run only through run_llm_experiment.py; legacy scripts removed
2025-10-14 11:57:23 +03:00
Sergey Penkovsky
0cc7850848 fix: format code 2025-10-06 23:03:01 +03:00
Sergey Penkovsky
237b86421e doc: update docstring 2025-10-06 23:02:03 +03:00
Sergey Penkovsky
712278e33c Refactoring: consistent code formatting (whitespace, quotes, blank lines) across the entire project, with no logic changes. 2025-10-06 22:57:19 +03:00
Sergey Penkovsky
332cad6159 Merge pull request #2 from pese-git/feature/llama
Feature/llama
2025-10-06 22:05:45 +03:00
Sergey Penkovsky
2434d34188 docs: scientific and practical documentation for all key LLM modules
- Improved and extended the docstrings of the base components (decoder, cached_decoder, multi_head_attention, head_attention, feed_forward, token_embeddings, positional_embeddings, gelu, silu, swi_glu, rope, rms_norm)
- Written in Russian: explains the architectures and algorithms, with formulas and references to papers
- Added detailed descriptions of classes, forward/generate methods, and input/output formats for all models (GPT, GPT2, LLaMA)
- Added usage examples to every key class
- Described the scientific concepts, architectural differences, and rationale behind design choices
2025-10-06 21:59:55 +03:00
Sergey Penkovsky
73ee3e16ec docs: update and enhance documentation for all core components and models
- Added detailed documentation for GPT, GPT2 and LLaMA models
- Enhanced docstrings in base_model.py, rope.py, rms_norm.py, swi_glu.py
- Updated README with architectural differences and usage examples
- Added scientific references and mathematical foundations
- Improved type hints and parameter descriptions
2025-10-06 20:34:02 +03:00
Sergey Penkovsky
3bc2848cf0 refactor: unify CachedDecoder implementation across models
- Completely removed duplicate CachedDecoder from llama.py
- Modified core CachedDecoder to support dependency injection:
  - Added feed_forward_layer parameter (required)
  - Added norm_layer parameter with LayerNorm default
  - Added rope parameter for RoPE support
  - Removed unused activation parameter
- Updated GPT2 to use new CachedDecoder with FeedForward
- Updated LLaMA to use new CachedDecoder with SwiGLU and RMSNorm
- Fixed parameter order in constructor to follow Python syntax rules

This eliminates all code duplication while maintaining architectural specificities through dependency injection.
2025-10-06 14:57:29 +03:00
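The dependency-injection design described above can be sketched as follows — a simplified stand-in (hypothetical names and shapes, no cache handling), not the repository's actual CachedDecoder:

```python
import torch
import torch.nn as nn

class CachedDecoder(nn.Module):
    """One shared decoder block; model-specific parts are injected.

    GPT2 would inject FeedForward with the default LayerNorm; LLaMA
    would inject SwiGLU and pass an RMSNorm class as norm_layer.
    """
    def __init__(self, attention: nn.Module, feed_forward_layer: nn.Module,
                 norm_layer: type = nn.LayerNorm, dim: int = 512):
        super().__init__()
        self.attn = attention
        self.ff = feed_forward_layer
        self.norm1 = norm_layer(dim)
        self.norm2 = norm_layer(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x + self.attn(self.norm1(x))    # pre-norm residual attention
        return h + self.ff(self.norm2(h))   # pre-norm residual feed-forward

# placeholder sub-modules just to exercise the wiring
block = CachedDecoder(attention=nn.Linear(8, 8),
                      feed_forward_layer=nn.Linear(8, 8),
                      norm_layer=nn.LayerNorm, dim=8)
y = block(torch.randn(2, 4, 8))
```

Injecting the feed-forward and normalization layers is what lets one decoder class serve architectures that differ only in those components.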
Sergey Penkovsky
d99d605b35 refactor: partial removal of duplicate code by using core modules
- Removed duplicate HeadAttention and MultiHeadAttention implementations from llama.py
- Now importing MultiHeadAttention from core module
- Added RoPE support parameter to core HeadAttention constructor
- Kept LLaMA-specific CachedDecoder implementation (uses SwiGLU and RMSNorm)
- Core CachedDecoder uses different components (FeedForward and LayerNorm)
- Improved code reuse for attention components while maintaining LLaMA-specific decoder

This is a partial refactor - attention components are now shared, but decoder remains LLaMA-specific due to different normalization and activation requirements.
2025-10-06 14:26:32 +03:00
Sergey Penkovsky
211adf574c refactor: extract LLaMA components to separate modules in core directory
- Moved GELU, RMSNorm, RoPE, SiLU, and SwiGLU implementations from llama.py to dedicated files in core/
- Updated feed_forward.py to use new modular components
- Modified llama.py to import components from core modules instead of local definitions
- Improved code organization and reusability of activation functions and normalization layers

This refactor enables better code reuse across different model architectures and follows the single responsibility principle.
2025-10-06 14:09:19 +03:00
Sergey Penkovsky
f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding
- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE) for better positional encoding
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization

The implementation includes:
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
2025-10-06 13:26:20 +03:00
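Of the LLaMA components listed above, RMSNorm is the simplest to show. A minimal sketch consistent with the usual formulation (an illustration, not the repository's rms_norm module):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # scale each feature vector to unit RMS, then apply a learned gain
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

norm = RMSNorm(16)
y = norm(torch.randn(2, 4, 16))
```

Dropping the mean-centering and bias of LayerNorm makes RMSNorm cheaper per token, which is why LLaMA-family models use it.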
Sergey Penkovsky
9898e8ee83 feat: add RoPE positional embeddings implementation in llama.ipynb
- Implement Rotary Positional Embeddings (RoPE) with separate cosine/sine components
- Add vectorized computation of inverse frequencies for RoPE
- Include tensor slicing utilities for even/odd column separation
- Update dependencies in pyproject.toml and uv.lock
2025-10-06 12:52:59 +03:00
Sergey Penkovsky
b6f56a2640 fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update 2025-10-05 23:01:58 +03:00
Sergey Penkovsky
e5b5a97811 Merge pull request #1 from pese-git/feature/gpt2
Feature/gpt2
2025-10-05 21:30:33 +03:00
Sergey Penkovsky
b9d9bdcc71 docs(readme): add explicit support notice for GPT-2 architecture and usage examples 2025-10-05 21:29:38 +03:00
Sergey Penkovsky
c31eed8551 fix(hf-integration): handle logits as tuple in hf_adapter, convert torch.Tensor to list in hf_tokenizer.decode for decoding compatibility 2025-10-05 20:47:36 +03:00
Sergey Penkovsky
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs 2025-10-05 19:26:18 +03:00
Sergey Penkovsky
c39e68d71a feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE. 2025-10-05 19:11:20 +03:00
Sergey Penkovsky
f866ed7ac7 fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility) 2025-10-05 15:52:21 +03:00
Sergey Penkovsky
aa408e941a docs: add GPT-2 analysis notebook
- Add gpt2.ipynb with GPT-2 model experiments and comparisons
2025-10-05 12:48:32 +03:00
Sergey Penkovsky
da1cf3fb55 fix: rename notebook 2025-10-05 12:46:17 +03:00
Sergey Penkovsky
1f9a4d2fa9 chore: add ipykernel dependency and update notebooks
- Add ipykernel to project dependencies for Jupyter notebook support
- Update BPE and GPT analysis notebooks with latest experiments
2025-10-05 11:59:24 +03:00
Sergey Penkovsky
f060497eb1 docs: add analysis notebooks for BPE and GPT
- Add bpe.ipynb with Byte Pair Encoding implementation analysis
- Update gpt_analysis.ipynb with GPT model experiments and visualizations
2025-10-05 08:23:09 +03:00
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00
Sergey Penkovsky
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization
- Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
- Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
- Update dependencies in pyproject.toml
2025-10-05 08:09:30 +03:00
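The mask-dtype migration this commit describes — replacing deprecated `.byte()`/`torch.uint8` masks with `torch.bool` — can be sketched on a causal mask (an illustration, not the repository's GPT.generate code):

```python
import torch

seq_len = 5
# old, deprecated style: torch.tril(torch.ones(seq_len, seq_len)).byte()
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)
# bool mask works directly with masked_fill; True in ~causal = "block"
masked = scores.masked_fill(~causal, float("-inf"))
```

On recent PyTorch, indexing or `masked_fill` with a uint8 mask warns or errors, so building the mask as `torch.bool` from the start avoids both the warning and the conversion.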
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00