21 Commits

Author SHA1 Message Date
Sergey Penkovsky
25caf69ced refactor(gpt1): migrate Decoder to GptDecoder, unify API, and update tests
- Renamed Decoder (and decoder.py) to GptDecoder (gpt_decoder.py) for clarity in GPT1
- Implemented support for cache and use_cache parameters in GptDecoder.forward (API unification)
- Adapted all usages in GPT model to use new decoder structure and handle tuple output
- Refactored core tests (test_gpt.py, test_gpt_decoder.py, test_basic.py) to expect either a tuple or plain logits, and verified that shape/device checks work as before
- Improved clarity and future extensibility for autoregressive generation and benchmarking
- No changes to architectural details or training loop; pure API and test modernization
2025-10-22 16:27:08 +03:00
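For reference, a minimal sketch of the unified cache/use_cache contract described above: forward returns a (hidden, new_cache) tuple that callers must unpack. The block below is a hypothetical stand-in, not the repository's GptDecoder; only the tuple-returning API follows the commit.

    import torch
    import torch.nn as nn

    class TinyCachedBlock(nn.Module):
        """Mirrors forward(x, cache=None, use_cache=False) -> (hidden, cache)."""

        def __init__(self, d_model: int = 16):
            super().__init__()
            self.kv_proj = nn.Linear(d_model, 2 * d_model)

        def forward(self, x, cache=None, use_cache=False):
            k, v = self.kv_proj(x).chunk(2, dim=-1)
            if cache is not None:                      # extend the cached prefix
                past_k, past_v = cache
                k = torch.cat([past_k, k], dim=1)
                v = torch.cat([past_v, v], dim=1)
            hidden = x                                 # a real block would attend over (k, v)
            return hidden, ((k, v) if use_cache else None)

    x = torch.randn(1, 4, 16)
    block = TinyCachedBlock()
    hidden, cache = block(x, use_cache=True)                   # prefill
    hidden, cache = block(torch.randn(1, 1, 16), cache=cache, use_cache=True)
    assert cache[0].shape[1] == 5                              # 4 prefill + 1 decode step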
Sergey Penkovsky
ea932a36f3 feat(gemma): document and test GeGLU, MultiQueryAttention, GemmaDecoder, update Gemma model docs
- Add new core modules: GeGLU (Gated GELU Linear Unit), GemmaDecoder, MultiQueryAttention; all with highly detailed scientific (RU) docstrings: theory, usage, formulas, references
- Major doc improvements in Gemma model: class, __init__, forward, generate now have full educational/engineering docstrings, use-case samples, and literature links
- Add comprehensive unit tests:
    * tests/core/test_geglu.py: GeGLU coverage (shape, gradients, edge cases, repeatability, float16 with skip)
    * tests/core/test_gemma_decoder.py: GemmaDecoder coverage (shape, mask, cache, repeatability, errors)
    * tests/core/test_multi_query_attention.py: MQA coverage (shape, cache, gradients, masking, dropout, error raising)
- All modules and tests follow strict quality/documentation standards; the code is now robust for research and production use
2025-10-21 15:12:45 +03:00
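As a reference point, a minimal sketch of the GeGLU gating documented above, GeGLU(x) = GELU(xW) ⊙ (xV); the class and projection names here are illustrative assumptions, not the llm/core implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GeGLUSketch(nn.Module):
        """GeGLU(x) = GELU(x @ W) * (x @ V): a gated GELU feed-forward unit."""

        def __init__(self, d_model: int, d_ff: int):
            super().__init__()
            self.gate_proj = nn.Linear(d_model, d_ff, bias=False)  # W (gate branch)
            self.up_proj = nn.Linear(d_model, d_ff, bias=False)    # V (linear branch)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return F.gelu(self.gate_proj(x)) * self.up_proj(x)

    x = torch.randn(2, 8, 32)
    assert GeGLUSketch(32, 64)(x).shape == (2, 8, 64)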
Sergey Penkovsky
cfb4b6dfb1 feat(gemma): initial implementation of Gemma model and configs
- Add core Gemma model (architecture, attention, GeGLU, RoPE, RMSNorm, etc.)
- Add configs for training and generation: gemma_train.json, gemma_generate.json
- Add Gemma notebook for exploratory analysis and demonstration
- Add __init__.py for Gemma submodule
- Update run_llm_experiment.py to support Gemma experiment configs

test(gemma): add comprehensive unit tests for Gemma

- Test forward pass (with/without cache)
- Test autoregressive generation (greedy, top-k, top-p)
- Test shape correctness and max sequence length errors
- Test multi-layer stack and token embeddings

docs: add documentation notebook for Gemma usage and analysis

2025-10-21 01:02:15 +03:00
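The greedy path of the generation tests above, reduced to its core loop; the model here is any callable returning logits of shape (batch, seq, vocab), a deliberately hypothetical interface.

    import torch

    @torch.no_grad()
    def greedy_generate(model, input_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
        """Append the argmax token one step at a time (no KV cache, for clarity)."""
        for _ in range(max_new_tokens):
            logits = model(input_ids)                                  # (batch, seq, vocab)
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_token], dim=1)
        return input_ids

    dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 100)   # stand-in model
    out = greedy_generate(dummy, torch.zeros(1, 3, dtype=torch.long), 5)
    assert out.shape == (1, 8)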
Sergey Penkovsky
c9da4c841b feat(mixtral): add MixtralDecoder, enhance MoE and Mixtral model docs, add unit tests
- Implement new core module: MixtralDecoder (llm/core/mixtral_decoder.py) with full Russian scientific docstrings, formal math, and usage examples
- Improve MoE: add Russian docstrings for class, __init__, forward; validate top_k_experts; explain theory and components
- Refactor Mixtral model: switch stack to MixtralDecoder, add comprehensive documentation for class, constructor and forward, clarify config usage and architecture
- Add thorough unit tests:
   * tests/core/test_mixtral_decoder.py: checks shapes, errors, masking, dropout, gradients, etc.
   * tests/core/test_moe.py: covers normal and edge-case logic, gradients, shapes, and parameter checks
- All code and tests comply with current scientific and engineering standards.
2025-10-20 16:07:51 +03:00
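For orientation, a top-k expert-routing sketch in the spirit of the MoE block above, including a top_k_experts-style validation like the one the commit mentions; the gate design, expert shape, and renormalization are assumptions.

    import torch
    import torch.nn as nn

    class TopKMoESketch(nn.Module):
        def __init__(self, d_model: int, num_experts: int = 4, top_k: int = 2):
            super().__init__()
            if not 1 <= top_k <= num_experts:          # validate, as in the MoE commit
                raise ValueError("top_k must be in [1, num_experts]")
            self.gate = nn.Linear(d_model, num_experts, bias=False)
            self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_experts))
            self.top_k = top_k

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            weights, idx = self.gate(x).topk(self.top_k, dim=-1)
            weights = weights.softmax(dim=-1)          # renormalize over the chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    routed = (idx[..., slot] == e).unsqueeze(-1)     # tokens sent to expert e
                    out = out + routed * weights[..., slot:slot + 1] * expert(x)
            return out

    x = torch.randn(2, 5, 16)
    assert TopKMoESketch(16)(x).shape == x.shape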
Sergey Penkovsky
b1737bbce2 feat(mixtral): initial implementation of Mixtral MoE model, configs, and tests
- Add Mixtral architecture implementation with MoE support (llm/src/llm/models/mixtral/mixtral.py)
- Introduce generic Mixture-of-Experts (MoE) block (llm/src/llm/core/moe.py)
- Create dedicated configuration files for Mixtral training and generation experiments
- Register and test Mixtral support in experiment runner (run_llm_experiment.py)
- Add unit tests for Mixtral API including forward, caching, and generation modes
- Include Jupyter notebook mixstral.ipynb for architectural exploration and research
- Ensure correct handling of torch bool masks in sampling (top-k, top-p) during generation

BREAKING CHANGE: Adds new model code and test coverage, modifying experiment runner logic to register Mixtral.
2025-10-20 08:12:11 +03:00
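The bool-mask bullet above refers to masking during sampling; a small nucleus (top-p) filtering sketch using a torch bool mask follows, with one common cutoff convention, not necessarily the repository's.

    import torch

    def top_p_filter(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
        """Drop tokens outside the smallest set whose cumulative probability >= top_p."""
        sorted_logits, sorted_idx = logits.sort(dim=-1, descending=True)
        cum_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)
        sorted_remove = cum_probs > top_p                              # torch bool mask
        sorted_remove[..., 1:] = sorted_remove[..., :-1].clone()       # keep the boundary token
        sorted_remove[..., 0] = False                                  # always keep the top token
        remove = sorted_remove.scatter(-1, sorted_idx, sorted_remove)  # back to vocab order
        return logits.masked_fill(remove, float("-inf"))

    logits = torch.randn(2, 50)
    probs = top_p_filter(logits).softmax(dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)               # (2, 1)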
Sergey Penkovsky
d947b7beb3 docs(training): update and expand scientific docstrings for optimizer, scheduler, trainer
- Expanded module-level and function/class docstrings in optimizer.py, scheduler.py, and trainer.py
- Described mathematical foundations, theoretical motivations, and provided detailed usage examples for students
- All docstrings in Russian, clear scientific style

test(training): add comprehensive tests for optimizer, scheduler, and trainer modules

- Added new test files for get_optimizer, get_linear_schedule_with_warmup, and Trainer
- Tests cover parameter handling, edge cases, and expected learning dynamics (lr schedules and loss behavior)
- Trainer now logs average epoch losses to self.loss_history for testability and analysis

refactor(training/trainer): log epoch loss to loss_history for downstream analysis and tests

BREAKING CHANGE: Trainer.loss_history is a new attribute consolidating average losses per epoch, enabling robust learning dynamics assertions in tests
2025-10-17 16:25:39 +03:00
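For reference, a minimal warmup-then-linear-decay schedule of the kind get_linear_schedule_with_warmup provides, built on torch.optim.lr_scheduler.LambdaLR; the signature and step convention here are assumptions and may differ from the repository's.

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    def linear_schedule_with_warmup(optimizer, num_warmup_steps: int, num_training_steps: int):
        """lr ramps 0 -> base over warmup, then decays linearly back to 0."""
        def lr_lambda(step: int) -> float:
            if step < num_warmup_steps:
                return step / max(1, num_warmup_steps)
            remaining = num_training_steps - step
            return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
        return LambdaLR(optimizer, lr_lambda)

    model = torch.nn.Linear(4, 4)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    sched = linear_schedule_with_warmup(opt, num_warmup_steps=10, num_training_steps=100)
    lrs = []
    for _ in range(100):
        opt.step()
        sched.step()
        lrs.append(opt.param_groups[0]["lr"])
    assert max(lrs) <= 1e-3 and lrs[-1] < lrs[49]   # warmup peak, then decay toward zero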
Sergey Penkovsky
613d784565 docs(datasets): update docstrings and tests 2025-10-17 10:49:45 +03:00
Sergey Penkovsky
516f9580fb docs(core): add docstrings and unit tests for SwiGLU block
- docs: rewrite and expand docstrings for SwiGLU class and forward method (motivation, math, architecture, usage, references to LLaMA/Mistral/PaLM)
- test: add unit tests for SwiGLU (shape, dtype, gradients, output range, fp16 support, reproducibility)
- strictly docs/tests; no logic or API changes

This improves transparency and reliability for gated FFN blocks in transformer architectures.
2025-10-16 15:09:09 +03:00
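A minimal sketch of the SwiGLU gating documented above, SwiGLU(x) = SiLU(xW) ⊙ (xV), as used in LLaMA/PaLM-style FFNs; names and sizes are illustrative. The only difference from the GeGLU sketch earlier in this log is the SiLU gate in place of GELU.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLUSketch(nn.Module):
        def __init__(self, d_model: int, d_ff: int):
            super().__init__()
            self.gate_proj = nn.Linear(d_model, d_ff, bias=False)   # W
            self.up_proj = nn.Linear(d_model, d_ff, bias=False)     # V

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return F.silu(self.gate_proj(x)) * self.up_proj(x)      # SiLU gate * linear branch

    x = torch.randn(2, 8, 32)
    assert SwiGLUSketch(32, 64)(x).shape == (2, 8, 64)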
Sergey Penkovsky
64d33783e0 docs(core): add docstrings and unit tests for SiLU activation
- docs: expand and clarify docstrings for SiLU class and its method (mathematical formula, motivation, properties vs ReLU/GELU, usage, and references to Swish/LLM papers)
- test: add unit tests for SiLU (shape/dtype, behavior on large/small values, PyTorch reference, gradients, broadcast)
- no logic/API changes

This update improves reliability and usability of the SiLU activation module.
2025-10-16 14:48:50 +03:00
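The formula behind those docstrings is silu(x) = x · σ(x); a two-line sketch checked against PyTorch's reference, much as the tests above do.

    import torch

    def silu(x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)     # smooth, non-monotonic below zero, ~x for large x

    x = torch.linspace(-6, 6, steps=101)
    assert torch.allclose(silu(x), torch.nn.functional.silu(x), atol=1e-6)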
Sergey Penkovsky
6efc946027 docs(core): expand docstrings and add unit tests for RMSNorm
- docs: update and expand docstring detail for RMSNorm class and methods (motivation, formula, architecture, usage, references to LLaMA/PaLM/GPT)
- test: add comprehensive unit tests for RMSNorm (shape/type preservation, rms scaling, gradients for input and weights, fp16, large eps stability)

No code/API changes beyond docs and new tests.
2025-10-16 14:37:25 +03:00
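A compact sketch of the RMSNorm formula those docstrings cover, y = x / RMS(x) · g with RMS(x) = sqrt(mean(x²) + ε); dimensions and eps are illustrative.

    import torch
    import torch.nn as nn

    class RMSNormSketch(nn.Module):
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))   # learnable gain g
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
            return x / rms * self.weight

    x = torch.randn(2, 4, 8)
    assert RMSNormSketch(8)(x).shape == x.shape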
Sergey Penkovsky
0832d78acf docs(core): improve docstrings and add unit tests for GELU activation
- docs: rewrite and expand docstrings for GELU class and method (motivation, math formula, smoother ReLU for Transformers, usage, references)
- test: add dedicated tests for GELU (output shape, dtype, comparison with torch GELU, monotonicity, gradients, large/small value behavior)
- fix: align numerical test to tolerate the minor approximation difference vs PyTorch's gelu

This update makes the GELU module more transparent and robust for deep learning practitioners and researchers.
2025-10-16 13:59:38 +03:00
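The tolerance fix concerns the tanh approximation of GELU, 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))), which deviates from the exact erf form by well under 1e-3; a sketch of that comparison (the atol is illustrative).

    import math
    import torch

    def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
        return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))

    x = torch.linspace(-4, 4, steps=81)
    exact = torch.nn.functional.gelu(x)                     # erf-based reference
    assert torch.allclose(gelu_tanh(x), exact, atol=1e-3)   # loose atol absorbs the gap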
Sergey Penkovsky
923aa51e2a docs(core): add docstrings and unit tests for CachedDecoder module
- docs: Add detailed docstrings for CachedDecoder class and its methods (__init__, forward); explain autoregressive caching, architecture, math, usage, and links to GPT-2/LLM references
- test: Add comprehensive unit tests for CachedDecoder (initialization, forward with and without cache, cache chaining, output shape, error on long input, backward pass)
- These changes improve code clarity, reliability, and testing for decoder blocks with KV cache.
2025-10-16 12:30:53 +03:00
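The cache chaining those tests verify, in isolation: each call concatenates the new keys/values onto the cached prefix, so step t attends over all earlier positions. The (batch, heads, seq, head_dim) layout is an assumption.

    import torch

    def chain_kv(cache, k_new, v_new):
        """Append new K/V onto the cached prefix along the sequence axis."""
        if cache is None:
            return k_new, v_new
        k_past, v_past = cache
        return torch.cat([k_past, k_new], dim=2), torch.cat([v_past, v_new], dim=2)

    cache = None
    for _ in range(3):                                  # three single-token decode steps
        cache = chain_kv(cache, torch.randn(1, 2, 1, 8), torch.randn(1, 2, 1, 8))
    assert cache[0].shape[2] == 3                       # three cached positions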
Sergey Penkovsky
ba3b04cec2 docs(core): add docstrings and unit tests for MistralDecoder
- docs: expanded docstrings for MistralDecoder class and methods (__init__, forward); explained architecture, key parameters, usage, and links to relevant papers (Mistral, Llama 2)
- test: add comprehensive unit tests for MistralDecoder (init, forward, cache handling, output shape, shape errors, backward)
- These changes improve explainability, reliability, and test coverage for the decoder module.
2025-10-15 18:07:11 +03:00
Sergey Penkovsky
e6ca8dee6f docs(core): add comprehensive docstrings and unit tests for GroupedQueryAttention (GQA)
- docs: Rewrite and expand docstrings for the GroupedQueryAttention class and all main methods (__init__, forward, _repeat_kv_heads, _create_sliding_window_mask):
    - explained GQA architecture and motivation
    - included mathematical formulas, step-by-step algorithms, usage examples
    - added references to relevant scientific papers (Mistral, Llama 2, etc.)
- test: Add dedicated unit tests for GQA (output shape correctness, mask/window logic, KV head replication, RoPE processing, error and edge-cases)
- docs/test: Documentation and tests now fully reflect modern GQA usage and best practices for LLM architectures

This commit makes the implementation, usage, and theoretical underpinnings of GQA transparent and reproducible for researchers and engineers.
2025-10-15 17:27:55 +03:00
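The _repeat_kv_heads step expands n_kv_heads up to n_heads so that grouped query heads share K/V; a one-liner sketch with repeat_interleave, assuming a (batch, n_kv_heads, seq, head_dim) layout.

    import torch

    def repeat_kv_heads(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
        """(batch, n_kv_heads, seq, hd) -> (batch, n_kv_heads * n_rep, seq, hd)."""
        return kv.repeat_interleave(n_rep, dim=1)

    kv = torch.randn(2, 2, 5, 8)                        # 2 KV heads
    expanded = repeat_kv_heads(kv, 4)                   # serve 8 query heads (group size 4)
    assert expanded.shape == (2, 8, 5, 8)
    assert torch.equal(expanded[:, 0], expanded[:, 3])  # copies within a group are identical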
Sergey Penkovsky
2e72dbaf07 test(llama): add unit tests for generation, cache, and edge cases
- Covers inference with and without cache and with sampling (top-k, top-p)
- Includes test for max sequence length (should raise ValueError)
- Verifies output shape and absence of dtype errors for the mask logic
- Minimal config and random data ensure tests are fast and robust

Motivation: Regression and integration protection for Llama decoding and sampling logic.
2025-10-15 14:37:35 +03:00
Sergey Penkovsky
dc440a3938 test(gpt2): add unit tests for generation, cache behavior, and error conditions
- Covers forward pass with and without KV-cache
- Verifies correct sequence generation for greedy, top-k, and top-p sampling
- Adds ValueError test for exceeding max sequence length
- Uses small random toy config and minimal setup for fast test feedback

Motivation: Prevent regressions in decoding, sampling, and KV-cache logic in GPT2 implementation.
2025-10-15 14:36:32 +03:00
Sergey Penkovsky
38682e8c9d test(mistral): add unit tests for model generation and cache 2025-10-15 13:20:50 +03:00
Sergey Penkovsky
d10044e4a7 refactor(core): refactor RoPE and MultiHeadAttention, add math-rich docs, expand tests, remove unused head_attention
- refactor: improved and unified the RoPE implementation, which now enforces strict input-dimension checks; improved and restructured MultiHeadAttention (clearer logic, strict input/output specification)
- docs: completely rewrote the docstrings for RoPE and MultiHeadAttention — added mathematical formulas, references to scientific papers, detailed explanations of the algorithm, input format, and limitations, plus usage examples
- test: added dedicated unit tests for RoPE (shape correctness, errors on invalid input dimensionality, norm preservation, backward/gradients, handling of the start_pos parameter and batching)
- chore: removed the unused module core/head_attention.py
- fix: RoPE now raises an AssertionError on input of the wrong dimensionality, which made it possible to fully cover the error test cases

This commit brings the base attention implementation in line with modern LLM practice, strengthens the documentation for engineers and researchers, and makes the library's automated tests more reliable.
2025-10-15 11:04:07 +03:00
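A minimal RoPE sketch consistent with the rewritten docs: channel pairs are rotated by position-dependent angles θ_i = 10000^(−2i/d), the input-dimension assert mirrors the AssertionError behavior above, and norms are preserved as the tests check. The split-halves pairing is one common convention, not necessarily the repository's.

    import torch

    def apply_rope(x: torch.Tensor, start_pos: int = 0) -> torch.Tensor:
        """x: (batch, seq, dim) with even dim; rotate channel pairs by position angle."""
        assert x.dim() == 3 and x.size(-1) % 2 == 0, "expected (batch, seq, even_dim)"
        _, seq, dim = x.shape
        half = dim // 2
        freqs = 10000.0 ** (-torch.arange(half, dtype=x.dtype) / half)   # theta_i
        pos = torch.arange(start_pos, start_pos + seq, dtype=x.dtype)
        angles = pos[:, None] * freqs[None, :]                           # (seq, half)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., :half], x[..., half:]                            # split-halves pairing
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

    x = torch.randn(1, 6, 16)
    y = apply_rope(x, start_pos=2)
    assert torch.allclose(y.norm(dim=-1), x.norm(dim=-1), atol=1e-5)     # rotations preserve norm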
Sergey Penkovsky
712278e33c refactor: unify code style across the project (whitespace, quotes, blank lines), with no logic changes. 2025-10-06 22:57:19 +03:00
Sergey Penkovsky
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs 2025-10-05 19:26:18 +03:00
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00