Commit Graph

35 Commits

Author SHA1 Message Date
Sergey Penkovsky
e6ca8dee6f docs(core): add comprehensive docstrings and unit tests for GroupedQueryAttention (GQA)
- docs: Rewrite and expand docstrings for the GroupedQueryAttention class and all main methods (__init__, forward, _repeat_kv_heads, _create_sliding_window_mask):
    - explained GQA architecture and motivation
    - included mathematical formulas, step-by-step algorithms, usage examples
    - added references to relevant scientific papers (Mistral, Llama 2, etc.)
- test: Add dedicated unit tests for GQA (output shape correctness, mask/window logic, KV head replication, RoPE processing, error and edge-cases)
- docs/test: Documentation and tests now fully reflect modern GQA usage and best practices for LLM architectures

This commit makes the implementation, usage, and theoretical underpinnings of GQA transparent and reproducible for researchers and engineers.
2025-10-15 17:27:55 +03:00
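The KV-head replication this commit documents (`_repeat_kv_heads`) can be sketched as follows — a minimal illustration of the GQA idea, not the repository's actual implementation:

```python
import torch

def repeat_kv_heads(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand KV heads so each group of query heads sees its shared KV head.

    x: (batch, n_kv_heads, seq_len, head_dim)
    returns: (batch, n_kv_heads * n_rep, seq_len, head_dim)
    """
    if n_rep == 1:
        return x
    b, n_kv, s, d = x.shape
    # insert a repeat dimension and flatten it into the head dimension
    return x[:, :, None, :, :].expand(b, n_kv, n_rep, s, d).reshape(b, n_kv * n_rep, s, d)

kv = torch.randn(2, 4, 8, 16)    # 4 KV heads
out = repeat_kv_heads(kv, 4)     # replicated to match 16 query heads
```

Each KV head is repeated contiguously, so query heads 0–3 all attend through KV head 0, and so on — the memory saving of GQA comes from storing only the 4 KV heads in the cache.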
Sergey Penkovsky
2e72dbaf07 test(llama): add unit tests for generation, cache, and edge cases
- Covers inference with and without the KV cache, and with sampling (top-k, top-p)
- Includes test for max sequence length (should raise ValueError)
- Verifies output shape and absence of dtype errors for the mask logic
- Minimal config and random data ensure tests are fast and robust

Motivation: Regression and integration protection for Llama decoding and sampling logic.
2025-10-15 14:37:35 +03:00
Sergey Penkovsky
dc440a3938 test(gpt2): add unit tests for generation, cache behavior, and error conditions
- Covers forward pass with and without KV-cache
- Verifies correct sequence generation for greedy, top-k, and top-p sampling
- Adds ValueError test for exceeding max sequence length
- Uses small random toy config and minimal setup for fast test feedback

Motivation: Prevent regressions in decoding, sampling, and KV-cache logic in GPT2 implementation.
2025-10-15 14:36:32 +03:00
Sergey Penkovsky
50d7593023 fix(gpt2, llama): proper top-k/top-p mask handling in sampling for PyTorch compatibility (bool/uint8)
- Refactored the token-selection logic in the sampling methods of the GPT2 and Llama classes.
- Masks are now created with dtype=torch.bool (or torch.uint8 on legacy PyTorch).
- True/False is used for mask/scatter instead of 1/0, ensuring correctness across PyTorch versions.
- Fixes "RuntimeError: masked_fill_ only supports boolean masks", previously raised by uint8 masks on newer PyTorch.
- Backward compatibility maintained: the code works on PyTorch >=1.2 and on old clusters (via the else branch).

Motivation: Fixes sampling errors for all modern PyTorch users while keeping research code usable on old infra.
2025-10-15 14:35:10 +03:00
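The boolean-mask pattern this fix describes can be sketched as follows — a hypothetical top-k filter for illustration, not the repository's actual sampling code:

```python
import torch

def top_k_filter(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest logits per row; mask the rest to -inf.

    masked_fill requires a bool mask in modern PyTorch; a uint8 mask
    raises "RuntimeError: masked_fill_ only supports boolean masks".
    """
    _, topk_idx = torch.topk(logits, k, dim=-1)
    # start with everything masked (True = "remove"), then unmask the top-k
    mask = torch.ones_like(logits, dtype=torch.bool)
    mask.scatter_(-1, topk_idx, False)
    return logits.masked_fill(mask, float("-inf"))

logits = torch.tensor([[1.0, 5.0, 3.0, 2.0]])
filtered = top_k_filter(logits, k=2)
probs = torch.softmax(filtered, dim=-1)   # masked tokens get probability 0
```

Scattering the Python booleans `True`/`False` (rather than `1`/`0`) keeps the mask dtype boolean, which is the portability point the commit makes.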
Sergey Penkovsky
38682e8c9d test(mistral): add unit tests for model generation and cache 2025-10-15 13:20:50 +03:00
Sergey Penkovsky
e791f7cd93 fix(mistral): fix top-k/top-p mask handling for PyTorch >=1.2 2025-10-15 13:20:30 +03:00
Sergey Penkovsky
d10044e4a7 refactor(core): refactor RoPE and MultiHeadAttention, add math-rich docs, expand tests, remove unused head_attention
- refactor: improved and unified the RoPE implementation, which now enforces strict checks on input dimensionality; made improvements and structural changes to MultiHeadAttention (clearer logic, strict input/output specification)
- docs: completely rewrote the docstrings for RoPE and MultiHeadAttention — added mathematical formulas, references to scientific papers, detailed explanations of the algorithm, the input format, constraints, and usage examples
- test: added dedicated unit tests for RoPE (shape correctness, errors on invalid input dimensions, norm preservation, backward/gradients, handling of the start_pos parameter and batches)
- chore: removed the unused core/head_attention.py module
- fix: an AssertionError is now raised on invalid RoPE input dimensions, enabling full test coverage of the error cases

This commit aligns the base attention implementation with modern LLM practice, strengthens the documentation for engineers and researchers, and improves the reliability of the library's automated tests.
2025-10-15 11:04:07 +03:00
Sergey Penkovsky
ec0d2bd8d0 feat(mistral): add Mistral model implementation and configs
- implement Mistral model in llm/models/mistral/mistral.py with GroupedQueryAttention, SwiGLU, RoPE, sliding window attention
- add __init__.py for module export
- add config files for mistral training and generation
- update universal experiment runner to support Mistral model
- add notebook for Mistral experiments
2025-10-14 14:53:45 +03:00
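The sliding window attention added here restricts causal attention to the last `window` tokens. A minimal sketch of such a mask (an illustration, not the repository's `_create_sliding_window_mask`):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True means attention is allowed.

    Position i may attend to positions j with i - window < j <= i,
    i.e. causal attention limited to the most recent `window` tokens.
    """
    i = torch.arange(seq_len).unsqueeze(1)   # query positions, column
    j = torch.arange(seq_len).unsqueeze(0)   # key positions, row
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(6, window=3)
```

With window=3, token 5 attends only to tokens 3, 4, and 5; attention cost per token becomes O(window) instead of O(seq_len).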
Sergey Penkovsky
e5706a690d fix(rope, attention): correct RoPE positioning during generation with a cache
- Fixed the position calculation for RoPE (Rotary Positional Embeddings) during autoregressive generation with a cache.
- HeadAttention now passes start_pos to RoPE, computed from the cache length.
- Updated the signature and logic of the RoPE.forward method.
- Updated the llama.ipynb notebook for the new interfaces and outputs.

BREAKING CHANGE: the RoPE forward method was redefined; code that used RoPE directly must be updated.
2025-10-14 12:03:20 +03:00
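The bug this commit fixes is subtle: with a KV cache, only the new tokens pass through attention, so their rotary angles must start at the cache length, not at zero. A minimal sketch of RoPE with a `start_pos` offset (an illustration under assumed shapes, not the repository's RoPE.forward):

```python
import torch

def rope(x: torch.Tensor, start_pos: int = 0, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to x of shape (batch, seq, dim), dim even.

    start_pos shifts the absolute positions, which is required when the
    earlier tokens are already in the KV cache.
    """
    b, s, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    pos = torch.arange(start_pos, start_pos + s).float()   # absolute positions
    angles = torch.outer(pos, inv_freq)                    # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                   # 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(2, 4, 8)
y = rope(x, start_pos=0)        # first chunk: positions 0..3
y_cached = rope(x, start_pos=3) # continuing after a 3-token cache
```

Because each pair of features is rotated, the per-token norm is preserved — one of the properties the RoPE unit tests in this history check.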
Sergey Penkovsky
3e4815fcc6 refactor(experiments): migrate to universal runner + config structure, remove legacy scripts
- add universal runner run_llm_experiment.py with JSON-config driven LLM training / generation
- add configs for gpt, gpt2, llama (training/generation)
- remove individual train/generate scripts for each model
- update README with simple how-to for experiments block

BREAKING CHANGE: all llm_only experiments now run only through run_llm_experiment.py; legacy scripts removed
2025-10-14 11:57:23 +03:00
Sergey Penkovsky
0cc7850848 fix: format code 2025-10-06 23:03:01 +03:00
Sergey Penkovsky
237b86421e doc: update docstring 2025-10-06 23:02:03 +03:00
Sergey Penkovsky
712278e33c Refactoring: consistent code formatting (whitespace, quotes, blank lines) across the entire project, with no logic changes. 2025-10-06 22:57:19 +03:00
Sergey Penkovsky
332cad6159 Merge pull request #2 from pese-git/feature/llama
Feature/llama
2025-10-06 22:05:45 +03:00
Sergey Penkovsky
2434d34188 docs: scientific and practical documentation for all key LLM modules
- Improved and extended the docstrings of the base components (decoder, cached_decoder, multi_head_attention, head_attention, feed_forward, token_embeddings, positional_embeddings, gelu, silu, swi_glu, rope, rms_norm)
- Written in Russian: explains the architectures and algorithms, with formulas and references to papers
- Added detailed descriptions of classes, forward/generate methods, and input/output formats for all models (GPT, GPT2, LLaMA)
- Added usage examples to every key class
- Described the scientific concepts, architectural differences, and rationale behind design choices
2025-10-06 21:59:55 +03:00
Sergey Penkovsky
73ee3e16ec docs: update and enhance documentation for all core components and models
- Added detailed documentation for GPT, GPT2 and LLaMA models
- Enhanced docstrings in base_model.py, rope.py, rms_norm.py, swi_glu.py
- Updated README with architectural differences and usage examples
- Added scientific references and mathematical foundations
- Improved type hints and parameter descriptions
2025-10-06 20:34:02 +03:00
Sergey Penkovsky
3bc2848cf0 refactor: unify CachedDecoder implementation across models
- Completely removed duplicate CachedDecoder from llama.py
- Modified core CachedDecoder to support dependency injection:
  - Added feed_forward_layer parameter (required)
  - Added norm_layer parameter with LayerNorm default
  - Added rope parameter for RoPE support
  - Removed unused activation parameter
- Updated GPT2 to use new CachedDecoder with FeedForward
- Updated LLaMA to use new CachedDecoder with SwiGLU and RMSNorm
- Fixed parameter order in constructor to follow Python syntax rules

This eliminates all code duplication while maintaining architectural specificities through dependency injection.
2025-10-06 14:57:29 +03:00
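The dependency-injection design described above can be sketched as follows — a simplified stand-in (hypothetical names and shapes, no cache handling), not the repository's actual CachedDecoder:

```python
import torch
import torch.nn as nn

class CachedDecoder(nn.Module):
    """One shared decoder block; model-specific parts are injected.

    GPT2 would inject FeedForward with the default LayerNorm; LLaMA
    would inject SwiGLU and pass an RMSNorm class as norm_layer.
    """
    def __init__(self, attention: nn.Module, feed_forward_layer: nn.Module,
                 norm_layer: type = nn.LayerNorm, dim: int = 512):
        super().__init__()
        self.attn = attention
        self.ff = feed_forward_layer
        self.norm1 = norm_layer(dim)
        self.norm2 = norm_layer(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x + self.attn(self.norm1(x))    # pre-norm residual attention
        return h + self.ff(self.norm2(h))   # pre-norm residual feed-forward

# placeholder sub-modules just to exercise the wiring
block = CachedDecoder(attention=nn.Linear(8, 8),
                      feed_forward_layer=nn.Linear(8, 8),
                      norm_layer=nn.LayerNorm, dim=8)
y = block(torch.randn(2, 4, 8))
```

Injecting the feed-forward and normalization layers is what lets one decoder class serve architectures that differ only in those components.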
Sergey Penkovsky
d99d605b35 refactor: partial removal of duplicate code by using core modules
- Removed duplicate HeadAttention and MultiHeadAttention implementations from llama.py
- Now importing MultiHeadAttention from core module
- Added RoPE support parameter to core HeadAttention constructor
- Kept LLaMA-specific CachedDecoder implementation (uses SwiGLU and RMSNorm)
- Core CachedDecoder uses different components (FeedForward and LayerNorm)
- Improved code reuse for attention components while maintaining LLaMA-specific decoder

This is a partial refactor - attention components are now shared, but decoder remains LLaMA-specific due to different normalization and activation requirements.
2025-10-06 14:26:32 +03:00
Sergey Penkovsky
211adf574c refactor: extract LLaMA components to separate modules in core directory
- Moved GELU, RMSNorm, RoPE, SiLU, and SwiGLU implementations from llama.py to dedicated files in core/
- Updated feed_forward.py to use new modular components
- Modified llama.py to import components from core modules instead of local definitions
- Improved code organization and reusability of activation functions and normalization layers

This refactor enables better code reuse across different model architectures and follows the single responsibility principle.
2025-10-06 14:09:19 +03:00
Sergey Penkovsky
f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding
- Added LLaMA model architecture with RMSNorm and SwiGLU activation
- Implemented Rotary Positional Embeddings (RoPE) for better positional encoding
- Created training script for LLaMA with BPE tokenizer
- Fixed matplotlib dependency version in uv.lock
- Added LLaMA module initialization

The implementation includes:
- TokenEmbeddings, HeadAttention, MultiHeadAttention with RoPE support
- RMSNorm normalization layer
- SwiGLU feed-forward activation
- Cached decoder implementation for efficient generation
2025-10-06 13:26:20 +03:00
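Of the LLaMA components listed above, RMSNorm is the simplest to show. A minimal sketch consistent with the usual formulation (an illustration, not the repository's rms_norm module):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # scale each feature vector to unit RMS, then apply a learned gain
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

norm = RMSNorm(16)
y = norm(torch.randn(2, 4, 16))
```

Dropping the mean-centering and bias of LayerNorm makes RMSNorm cheaper per token, which is why LLaMA-family models use it.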
Sergey Penkovsky
9898e8ee83 feat: add RoPE positional embeddings implementation in llama.ipynb
- Implement Rotary Positional Embeddings (RoPE) with separate cosine/sine components
- Add vectorized computation of inverse frequencies for RoPE
- Include tensor slicing utilities for even/odd column separation
- Update dependencies in pyproject.toml and uv.lock
2025-10-06 12:52:59 +03:00
Sergey Penkovsky
b6f56a2640 fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update 2025-10-05 23:01:58 +03:00
Sergey Penkovsky
e5b5a97811 Merge pull request #1 from pese-git/feature/gpt2
Feature/gpt2
2025-10-05 21:30:33 +03:00
Sergey Penkovsky
b9d9bdcc71 docs(readme): add explicit support notice for GPT-2 architecture and usage examples 2025-10-05 21:29:38 +03:00
Sergey Penkovsky
c31eed8551 fix(hf-integration): handle logits as tuple in hf_adapter, convert torch.Tensor to list in hf_tokenizer.decode for decoding compatibility 2025-10-05 20:47:36 +03:00
Sergey Penkovsky
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs 2025-10-05 19:26:18 +03:00
Sergey Penkovsky
c39e68d71a feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE. 2025-10-05 19:11:20 +03:00
Sergey Penkovsky
f866ed7ac7 fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility) 2025-10-05 15:52:21 +03:00
Sergey Penkovsky
aa408e941a docs: add GPT-2 analysis notebook
- Add gpt2.ipynb with GPT-2 model experiments and comparisons
2025-10-05 12:48:32 +03:00
Sergey Penkovsky
da1cf3fb55 fix: rename notebook 2025-10-05 12:46:17 +03:00
Sergey Penkovsky
1f9a4d2fa9 chore: add ipykernel dependency and update notebooks
- Add ipykernel to project dependencies for Jupyter notebook support
- Update BPE and GPT analysis notebooks with latest experiments
2025-10-05 11:59:24 +03:00
Sergey Penkovsky
f060497eb1 docs: add analysis notebooks for BPE and GPT
- Add bpe.ipynb with Byte Pair Encoding implementation analysis
- Update gpt_analysis.ipynb with GPT model experiments and visualizations
2025-10-05 08:23:09 +03:00
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00
Sergey Penkovsky
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization
- Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
- Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
- Update dependencies in pyproject.toml
2025-10-05 08:09:30 +03:00
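The mask-dtype migration this commit describes — replacing deprecated `.byte()`/`torch.uint8` masks with `torch.bool` — can be sketched on a causal mask (an illustration, not the repository's GPT.generate code):

```python
import torch

seq_len = 5
# old, deprecated style: torch.tril(torch.ones(seq_len, seq_len)).byte()
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)
# bool mask works directly with masked_fill; True in ~causal = "block"
masked = scores.masked_fill(~causal, float("-inf"))
```

On recent PyTorch, indexing or `masked_fill` with a uint8 mask warns or errors, so building the mask as `torch.bool` from the start avoids both the warning and the conversion.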
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00