llm-arch-research/experiments/llm_only/configs/mixtral_generate.json

feat(mixtral): initial implementation of Mixtral MoE model, configs, and tests

- Add Mixtral architecture implementation with MoE support (llm/src/llm/models/mixtral/mixtral.py)
- Introduce generic Mixture-of-Experts (MoE) block (llm/src/llm/core/moe.py)
- Create dedicated configuration files for Mixtral training and generation experiments
- Register and test Mixtral support in experiment runner (run_llm_experiment.py)
- Add unit tests for Mixtral API including forward, caching, and generation modes
- Include Jupyter notebook mixstral.ipynb for architectural exploration and research
- Ensure correct handling of torch bool masks in sampling (top-k, top-p) during generation

BREAKING CHANGE: Adds new model code and test coverage, modifying experiment runner logic to register Mixtral.
2025-10-20 08:12:11 +03:00
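
The last bullet of the commit message concerns handling of torch bool masks in top-k/top-p sampling. The snippet below is a minimal, generic sketch of that filtering pattern in PyTorch, not the project's actual code: logits outside the top-k set or the top-p nucleus are masked to -inf with a boolean mask before softmax and multinomial sampling.

import torch

def filter_logits(logits, top_k=None, top_p=None):
    """Mask logits outside the top-k / top-p (nucleus) set with -inf.

    `logits` has shape (batch, vocab_size); masked positions get zero
    probability after softmax. Generic sketch, not the project's code.
    """
    if top_k is not None:
        # Keep only the k largest logits per row.
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        remove = logits < kth                              # bool mask, True = drop
        logits = logits.masked_fill(remove, float("-inf"))

    if top_p is not None:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        # Drop tokens once cumulative probability exceeds top_p, shifting
        # by one position so the token that crosses the threshold is kept.
        sorted_remove = cum_probs > top_p
        sorted_remove[..., 1:] = sorted_remove[..., :-1].clone()
        sorted_remove[..., 0] = False
        remove = sorted_remove.scatter(-1, sorted_idx, sorted_remove)  # back to vocab order
        logits = logits.masked_fill(remove, float("-inf"))

    return logits

# Example: sample one next token at temperature 0.8 with top-k filtering.
logits = torch.randn(1, 32000)
probs = torch.softmax(filter_logits(logits / 0.8, top_k=50), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)

The generation config itself follows: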
{
  "bpe_tokenizer": "checkpoints/bpe_tokenizer.json",
  "test_prompts": [
    "Open weights",
    "The Llama model is",
    "Efficient transformers"
  ],
  "model_config_path": "checkpoints/mixtral-bpe/config.json",
  "model_weights": "checkpoints/mixtral-bpe/model.pt",
  "generation": {
    "max_new_tokens": 40,
    "temperature": 0.8,
    "do_sample": true,
    "top_k": null,
    "top_p": null
  },
  "log_path": "checkpoints/mixtral_only_generation_logs.json"
}
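
For orientation, here is a hypothetical sketch of how a runner might consume this config. The helpers load_tokenizer and load_model, and the model.generate signature, are placeholders for illustration only; the project's actual entry point is run_llm_experiment.py (per the commit above), whose API is not reproduced here.

# Hypothetical sketch of consuming the config above. load_tokenizer(),
# load_model(), and model.generate() are placeholders, NOT the project's API.
import json

with open("experiments/llm_only/configs/mixtral_generate.json") as f:
    cfg = json.load(f)

tokenizer = load_tokenizer(cfg["bpe_tokenizer"])                    # placeholder
model = load_model(cfg["model_config_path"], cfg["model_weights"])  # placeholder

gen = cfg["generation"]
logs = []
for prompt in cfg["test_prompts"]:
    token_ids = model.generate(                                     # placeholder signature
        tokenizer.encode(prompt),
        max_new_tokens=gen["max_new_tokens"],
        temperature=gen["temperature"],
        do_sample=gen["do_sample"],
        top_k=gen["top_k"],   # JSON null -> Python None: top-k filtering disabled
        top_p=gen["top_p"],   # JSON null -> Python None: nucleus filtering disabled
    )
    logs.append({"prompt": prompt, "completion": tokenizer.decode(token_ids)})

with open(cfg["log_path"], "w") as f:
    json.dump(logs, f, ensure_ascii=False, indent=2)

With do_sample set to true, temperature 0.8, and both top_k and top_p left null, sampling in this config is plain temperature sampling over the full vocabulary; the mask-based filtering sketched earlier only engages when top_k or top_p is set.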