llm-arch-research

mirror of https://github.com/pese-git/llm-arch-research.git synced 2026-01-23 21:10:54 +00:00

Author	SHA1	Message	Date
Sergey Penkovsky	cfb4b6dfb1	feat(gemma): initial implementation of Gemma model and configs - Add core Gemma model (architecture, attention, GeGLU, RoPE, RMSNorm, etc) - Add configs for training and generation: gemma_train.json, gemma_generate.json - Add Gemma notebook for exploratory analysis and demonstration - Add __init__.py for Gemma submodule - Update run_llm_experiment.py to support Gemma experiment configs test(gemma): add comprehensive unit tests for Gemma - Test forward pass (with/without cache) - Test autoregressive generation (greedy, top-k, top-p) - Test shape correctness and max sequence length errors - Test multi-layer stack and token embeddings docs: add documentation notebook for Gemma usage and analysis Closes: #issue (if applicable)	2025-10-21 01:02:15 +03:00
Sergey Penkovsky	b1737bbce2	feat(mixtral): initial implementation of Mixtral MoE model, configs, and tests - Add Mixtral architecture implementation with MoE support (llm/src/llm/models/mixtral/mixtral.py) - Introduce generic Mixture-of-Experts (MoE) block (llm/src/llm/core/moe.py) - Create dedicated configuration files for Mixtral training and generation experiments - Register and test Mixtral support in experiment runner (run_llm_experiment.py) - Add unit tests for Mixtral API including forward, caching, and generation modes - Include Jupyter notebook mixstral.ipynb for architectural exploration and research - Ensure correct handling of torch bool masks in sampling (top-k, top-p) during generation BREAKING CHANGE: Adds new model code and test coverage, modifying experiment runner logic to register Mixtral.	2025-10-20 08:12:11 +03:00
Sergey Penkovsky	613d784565	doc(datasets): update docstrings and tests	2025-10-17 10:49:45 +03:00
Sergey Penkovsky	ec0d2bd8d0	feat(mistral): add Mistral model implementation and configs - implement Mistral model in llm/models/mistral/mistral.py with GroupedQueryAttention, SwiGLU, RoPE, sliding window attention - add __init__.py for module export - add config files for mistral training and generation - update universal experiment runner to support Mistral model - add notebook for Mistral experiments	2025-10-14 14:53:45 +03:00
Sergey Penkovsky	3e4815fcc6	refactor(experiments): migrate to universal runner + config structure, remove legacy scripts - add universal runner run_llm_experiment.py with JSON-config driven LLM training / generation - add configs for gpt, gpt2, llama (training/generation) - remove individual train/generate scripts for each model - update README with simple how-to for experiments block BREAKING CHANGE: all llm_only experiments now run only through run_llm_experiment.py; legacy scripts removed	2025-10-14 11:57:23 +03:00

5 Commits