llm-arch-research

penkovsky_sa/llm-arch-research

mirror of https://github.com/pese-git/llm-arch-research.git synced 2026-01-23 13:00:54 +00:00

Commit Graph

Select branches

feature/gemma

feature/gpt2

feature/llama

feature/mistral

feature/mixtral

master

ref/gpt1

ref/gpt2

#1

#2

#3

#4

#5

#6

db0ab511d1 feat(gpt2): add Gpt2Decoder module, refactor model and add tests ref/gpt2 Sergey Penkovsky 2025-10-31 15:35:54 +03:00
7744658716 Merge pull request #6 from pese-git/ref/gpt1 master Sergey Penkovsky 2025-10-31 09:15:54 +03:00
21cfd79c19 refactor(assets): update and reorganize GPT-1 architecture diagrams ref/gpt1 Sergey Penkovsky 2025-10-30 14:40:31 +03:00
9e2796e6be docs(gpt1): add architecture diagrams and notebook updates Sergey Penkovsky 2025-10-24 17:42:11 +03:00
25caf69ced refactor(gpt1): migrate Decoder to GptDecoder, unify API, and update tests Sergey Penkovsky 2025-10-22 16:27:08 +03:00
ddc4924a37 refactor(models): unify generate() signatures across all LLM architectures Sergey Penkovsky 2025-10-22 11:57:26 +03:00
    - Unified method signature: (x, max_new_tokens, do_sample, temperature, top_k, top_p, use_cache, attention_mask, **kwargs)
    - Added del attention_mask, kwargs in every generate() for compatibility and clean API
    - Prepared for drop-in replacement and ease of future batching/serving
    No changes to core model logic or sampling algorithms.
92a34551b8 Merge pull request #5 from pese-git/feature/gemma Sergey Penkovsky 2025-10-21 17:53:55 +03:00
ea932a36f3 feat(gemma): document and test GeGLU, MultiQueryAttention, GemmaDecoder, update Gemma model docs feature/gemma Sergey Penkovsky 2025-10-21 15:12:45 +03:00
cfb4b6dfb1 feat(gemma): initial implementation of Gemma model and configs Sergey Penkovsky 2025-10-21 01:02:15 +03:00
58c4a00b48 Merge pull request #4 from pese-git/feature/mixtral Sergey Penkovsky 2025-10-20 16:36:39 +03:00
c9da4c841b feat(mixtral): add MixtralDecoder, enhance MoE and Mixtral model docs, add unit tests feature/mixtral Sergey Penkovsky 2025-10-20 16:07:51 +03:00
b1737bbce2 feat(mixtral): initial implementation of Mixtral MoE model, configs, and tests Sergey Penkovsky 2025-10-20 08:12:11 +03:00
1aba02cab9 Merge pull request #3 from pese-git/feature/mistral Sergey Penkovsky 2025-10-17 20:45:20 +03:00
9794db3e18 docs(readme): update project documentation for LLaMA, Mistral, HF integration feature/mistral Sergey Penkovsky 2025-10-17 20:18:57 +03:00
d947b7beb3 update and expand scientific docstrings for optimizer, scheduler, trainer Sergey Penkovsky 2025-10-17 16:23:43 +03:00
613d784565 doc(datasets): update docstrings and tests Sergey Penkovsky 2025-10-17 10:49:45 +03:00
38c271ca3c docs(models): update and expand docstrings for Mistral and its methods Sergey Penkovsky 2025-10-16 17:01:57 +03:00
aec3c8adb6 docs(models): update and expand docstrings for LLaMA and generate method Sergey Penkovsky 2025-10-16 16:55:14 +03:00
90eb2f4467 docs(models): expand docstring for generate method in GPT2 Sergey Penkovsky 2025-10-16 16:43:27 +03:00
a3415d404a docs(models): update References in GPT docstring for vanilla implementation Sergey Penkovsky 2025-10-16 16:33:53 +03:00
9837ea3c3d docs(tokenizer): expand docstrings for BpeTokenizer Sergey Penkovsky 2025-10-16 15:26:17 +03:00
baafca0546 docs(core): update docstrings for TokenEmbeddings Sergey Penkovsky 2025-10-16 15:14:53 +03:00
516f9580fb docs(core): add docstrings and unit tests for SwiGLU block Sergey Penkovsky 2025-10-16 15:09:09 +03:00
64d33783e0 docs(core): add docstrings and unit tests for SiLU activation Sergey Penkovsky 2025-10-16 14:48:50 +03:00
6efc946027 docs(core): expand docstrings and add unit tests for RMSNorm Sergey Penkovsky 2025-10-16 14:37:25 +03:00
8018efae2a docs(core): expand docstrings for PositionalEmbeddings module Sergey Penkovsky 2025-10-16 14:09:05 +03:00
0832d78acf docs(core): improve docstrings and add unit tests for GELU activation Sergey Penkovsky 2025-10-16 13:59:38 +03:00
c338556cfe docs(core): improve and expand docstrings for FeedForward module Sergey Penkovsky 2025-10-16 12:47:47 +03:00
3a356f5d79 docs(core): improve and expand docstrings for Decoder module Sergey Penkovsky 2025-10-16 12:40:46 +03:00
923aa51e2a docs(core): add docstrings and unit tests for CachedDecoder module Sergey Penkovsky 2025-10-16 12:30:53 +03:00
ba3b04cec2 docs(core): add docstrings and unit tests for MistralDecoder Sergey Penkovsky 2025-10-15 18:07:11 +03:00
e6ca8dee6f docs(core): add comprehensive docstrings and unit tests for GroupedQueryAttention (GQA) Sergey Penkovsky 2025-10-15 17:27:55 +03:00
2e72dbaf07 test(llama): add unit tests for generation, cache, and edge cases Sergey Penkovsky 2025-10-15 14:37:35 +03:00
dc440a3938 test(gpt2): add unit tests for generation, cache behavior, and error conditions Sergey Penkovsky 2025-10-15 14:36:32 +03:00
50d7593023 fix(gpt2, llama): proper top-k/top-p mask handling in sampling for PyTorch compatibility (bool/uint8) Sergey Penkovsky 2025-10-15 14:35:10 +03:00
38682e8c9d test(mistral): add unit tests for model generation and cache Sergey Penkovsky 2025-10-15 13:20:50 +03:00
e791f7cd93 fix(mistral): fix top-k/top-p mask handling for PyTorch >=1.2 Sergey Penkovsky 2025-10-15 13:20:30 +03:00
d10044e4a7 refactor(core): refactor RoPE and MultiHeadAttention, add math-rich docs, expand tests, remove unused head_attention Sergey Penkovsky 2025-10-15 10:59:56 +03:00
ec0d2bd8d0 feat(mistral): add Mistral model implementation and configs Sergey Penkovsky 2025-10-14 14:53:45 +03:00
e5706a690d fix(rope, attention): correct RoPE positioning during generation with cache Sergey Penkovsky 2025-10-14 12:03:20 +03:00
3e4815fcc6 refactor(experiments): migrate to universal runner + config structure, remove legacy scripts Sergey Penkovsky 2025-10-14 11:57:23 +03:00
0cc7850848 fix: format code Sergey Penkovsky 2025-10-06 23:03:01 +03:00
237b86421e doc: update docstring Sergey Penkovsky 2025-10-06 23:02:03 +03:00
712278e33c Refactoring: consistent code formatting (whitespace, quotes, blank lines) across the whole project, with no logic changes. Sergey Penkovsky 2025-10-06 22:57:19 +03:00
332cad6159 Merge pull request #2 from pese-git/feature/llama Sergey Penkovsky 2025-10-06 22:05:45 +03:00
2434d34188 docs: scientific and practical documentation for all key LLM modules feature/llama Sergey Penkovsky 2025-10-06 21:59:55 +03:00
73ee3e16ec docs: update and enhance documentation for all core components and models Sergey Penkovsky 2025-10-06 20:34:02 +03:00
3bc2848cf0 refactor: unify CachedDecoder implementation across models Sergey Penkovsky 2025-10-06 14:57:29 +03:00
d99d605b35 refactor: partial removal of duplicate code by using core modules Sergey Penkovsky 2025-10-06 14:22:44 +03:00
211adf574c refactor: extract LLaMA components to separate modules in core directory Sergey Penkovsky 2025-10-06 14:09:19 +03:00
f30cd530a9 feat: add LLaMA model implementation with RoPE positional encoding Sergey Penkovsky 2025-10-06 13:26:20 +03:00
9898e8ee83 feat: add RoPE positional embeddings implementation in llama.ipynb Sergey Penkovsky 2025-10-06 12:52:59 +03:00
b6f56a2640 fix: typo in activation attribute for SwiGLU (rename _actvation to _activation) and minor index update Sergey Penkovsky 2025-10-05 23:01:58 +03:00
e5b5a97811 Merge pull request #1 from pese-git/feature/gpt2 Sergey Penkovsky 2025-10-05 21:30:33 +03:00
b9d9bdcc71 docs(readme): add explicit support notice for GPT-2 architecture and usage examples feature/gpt2 Sergey Penkovsky 2025-10-05 21:29:38 +03:00
c31eed8551 fix(hf-integration): handle logits as tuple in hf_adapter, convert torch.Tensor to list in hf_tokenizer.decode for decoding compatibility Sergey Penkovsky 2025-10-05 20:47:36 +03:00
3843e64098 test(core): fix FeedForward and MultiHeadAttention tests for unified interface and tuple outputs Sergey Penkovsky 2025-10-05 19:26:18 +03:00
c39e68d71a feat(gpt2): add GPT2 architecture with universal FeedForward, CachedDecoder, and refactored components. Core modules now shared; add train and generate scripts for GPT2-BPE. Sergey Penkovsky 2025-10-05 19:11:20 +03:00
f866ed7ac7 fix: universal logits extraction for tuple/model output in Trainer (GPT/GPT2 compatibility) Sergey Penkovsky 2025-10-05 15:52:21 +03:00
aa408e941a docs: add GPT-2 analysis notebook Sergey Penkovsky 2025-10-05 12:48:32 +03:00
da1cf3fb55 fix: rename notebook Sergey Penkovsky 2025-10-05 12:46:17 +03:00
1f9a4d2fa9 chore: add ipykernel dependency and update notebooks Sergey Penkovsky 2025-10-05 11:59:24 +03:00
f060497eb1 docs: add analysis notebooks for BPE and GPT Sergey Penkovsky 2025-10-05 08:23:09 +03:00
fb74dc7c17 test: add comprehensive test suite for LLM components Sergey Penkovsky 2025-10-05 08:11:18 +03:00
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization Sergey Penkovsky 2025-10-05 08:09:30 +03:00
ec07546ea8 feat: initial project setup with LLM architecture and HF integration Sergey Penkovsky 2025-10-04 22:40:21 +03:00
