Mirror of https://github.com/pese-git/llm-arch-research.git (synced 2026-01-23 21:10:54 +00:00)
- Add new core modules: GeGLU (Gated GELU Linear Unit), GemmaDecoder, MultiQueryAttention; each with highly detailed scientific (RU) docstrings covering theory, usage, formulas, and references (minimal illustrative sketches of GeGLU and MultiQueryAttention follow below this list)
- Major documentation improvements in the Gemma model: the class itself and its __init__, forward, and generate methods now have full educational/engineering docstrings, usage examples, and literature references
- Add comprehensive unit tests (an example test in this style is sketched below the list):
  * tests/core/test_geglu.py: GeGLU coverage (output shape, gradients, edge cases, repeatability, float16/skip)
  * tests/core/test_gemma_decoder.py: GemmaDecoder coverage (output shape, masking, caching, repeatability, error handling)
  * tests/core/test_multi_query_attention.py: MQA coverage (output shape, caching, gradients, masking, dropout, expected exceptions)
- All modules and tests follow strict quality and documentation standards; the code is now robust for both research and production use
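
For context, here is a minimal sketch of the GeGLU idea (the gated feed-forward variant from Shazeer, "GLU Variants Improve Transformer", 2020), written against a PyTorch-style nn.Module interface. The class name, attribute names, and dimensions are illustrative assumptions, not necessarily those of the repository's GeGLU module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeGLUSketch(nn.Module):
    """Gated GELU feed-forward block: (GELU(x W_gate) * (x W_up)) W_down. Illustrative only."""

    def __init__(self, d_model: int, d_ff: int) -> None:
        super().__init__()
        # Two parallel projections: one gated through GELU, one passed straight through.
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        # Projection back to the model dimension.
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> (batch, seq_len, d_model)
        return self.w_down(F.gelu(self.w_gate(x)) * self.w_up(x))
```

A common implementation variant fuses the gate and up projections into a single linear layer and splits the result along the feature dimension; both forms compute the same function.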
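Likewise, a minimal sketch of multi-query attention (Shazeer, "Fast Transformer Decoding: One Write-Head Is All You Need", 2019), where all query heads share a single key/value head so the KV cache stays small. Names and the interface are again assumptions rather than the repository's API; KV caching and dropout from the real module are omitted for brevity:

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiQueryAttentionSketch(nn.Module):
    """Attention with n_heads query heads but a single shared key/value head. Illustrative only."""

    def __init__(self, d_model: int, n_heads: int) -> None:
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)        # all query heads
        self.k_proj = nn.Linear(d_model, self.head_dim, bias=False)  # one shared key head
        self.v_proj = nn.Linear(d_model, self.head_dim, bias=False)  # one shared value head
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        b, t, _ = x.shape
        # Queries: (b, n_heads, t, head_dim); keys/values: (b, 1, t, head_dim).
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        # The single K/V head is broadcast across all query heads.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (b, n_heads, t, t)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```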
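Finally, a hypothetical pytest-style shape and gradient check in the spirit of the listed test files, written against the GeGLUSketch above; the actual tests/core/test_geglu.py targets the repository's real GeGLU class:

```python
import pytest
import torch

# Assumes GeGLUSketch from the sketch above is importable from a hypothetical
# module; this is illustrative, not the repository's actual test code.
from geglu_sketch import GeGLUSketch


@pytest.mark.parametrize("batch,seq_len,d_model,d_ff", [(2, 8, 16, 64), (1, 1, 4, 8)])
def test_geglu_preserves_shape_and_gradients(batch, seq_len, d_model, d_ff):
    layer = GeGLUSketch(d_model, d_ff)
    x = torch.randn(batch, seq_len, d_model, requires_grad=True)
    out = layer(x)
    # The block maps (batch, seq_len, d_model) back to the same shape.
    assert out.shape == (batch, seq_len, d_model)
    # Gradients flow back to the input and stay finite.
    out.sum().backward()
    assert x.grad is not None and torch.isfinite(x.grad).all()
```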