- docs: update/increase docstring detail for RMSNorm class and methods (motivation, formula, architecture, usage, references to LLaMA/PaLM/GPT)
- test: add comprehensive unit tests for RMSNorm (shape/type preservation, rms scaling, gradients for input and weights, fp16, large eps stability)
No code/API changes beyond docs and new tests.
- docs: rewrite and expand docstrings for GELU class and method (motivation, math formula, smoother ReLU for Transformers, usage, references)
- test: add dedicated tests for GELU (output shape, dtype, comparison with torch GELU, monotonicity, gradients, large/small value behavior)
- fix: align numerical test to allow for minor approximation difference vs PyTorch gelu
This update makes the GELU module more transparent and robust for deep learning practitioners and researchers.
- docs: Add detailed docstrings for CachedDecoder class and its methods (__init__, forward); explain autoregressive caching, architecture, math, usage, and links to GPT-2/LLM references
- test: Add comprehensive unit tests for CachedDecoder (initialization, forward with and without cache, cache chaining, output shape, error on long input, backward pass)
- These changes improve code clarity, reliability, and testing for decoder blocks with KV cache.
- docs: expanded docstrings for MistralDecoder class and methods (__init__, forward); explained architecture, key parameters, usage, and links to relevant papers (Mistral, Llama 2)
- test: add comprehensive unit tests for MistralDecoder (init, forward, cache handling, output shape, shape errors, backward)
- These changes improve explainability, reliability, and test coverage for the decoder module.
- docs: Rewrite and expand docstrings for the GroupedQueryAttention class and all main methods (__init__, forward, _repeat_kv_heads, _create_sliding_window_mask):
- explained GQA architecture and motivation
- included mathematical formulas, step-by-step algorithms, usage examples
- added references to relevant scientific papers (Mistral, Llama 2, etc.)
- test: Add dedicated unit tests for GQA (output shape correctness, mask/window logic, KV head replication, RoPE processing, error and edge-cases)
- docs/test: Documentation and tests now fully reflect modern GQA usage and best practices for LLM architectures
This commit makes the implementation, usage, and theoretical underpinnings of GQA transparent and reproducible for researchers and engineers.
- Covers inference with and without cache and with sampling (top-k, top-p)
- Includes test for max sequence length (should raise ValueError)
- Verifies output shape and absence of dtype errors for the mask logic
- Minimal config and random data ensure tests are fast and robust
Motivation: Regression and integration protection for Llama decoding and sampling logic.
- Covers forward pass with and without KV-cache
- Verifies correct sequence generation for greedy, top-k, and top-p sampling
- Adds ValueError test for exceeding max sequence length
- Uses small random toy config and minimal setup for fast test feedback
Motivation: Prevent regressions in decoding, sampling, and KV-cache logic in GPT2 implementation.
- refactor: улучшена и унифицирована реализация RoPE, теперь поддерживаются строгие проверки размерности входа; внесены улучшения и структурные изменения в MultiHeadAttention (более понятная логика, строгая спецификация входов/выходов)
- docs: полностью переписаны docstrings для RoPE и MultiHeadAttention — включены математические формулы, ссылки на научные статьи, подробные пояснения по алгоритму, формату входных данных, ограничениям, примеры использования
- test: добавлены отдельные unit-тесты для RoPE (корректность формы, ошибки на неверную размерность, сохранение нормы, backward/градиенты, работу с параметрами start_pos и батчами)
- chore: удалён неиспользуемый модуль core/head_attention.py
- fix: теперь выбрасывается AssertionError при неправильной размерности входа RoPE; это позволило полностью покрыть тест-кейсы на ошибки
Этот коммит синхронизирует логику реализации базового внимания с современной практикой LLM, укрепляет документацию для инженеров и исследователей, а также расширяет надежность автотестирования библиотеки.