llm-arch-research/llm
commit d99d605b35
Author: Sergey Penkovsky
Date:   2025-10-06 14:26:32 +03:00

refactor: partial removal of duplicate code by using core modules
- Removed the duplicate HeadAttention and MultiHeadAttention implementations from llama.py
- MultiHeadAttention is now imported from the core module
- Added a RoPE support parameter to the core HeadAttention constructor
- Kept the LLaMA-specific CachedDecoder implementation (it uses SwiGLU and RMSNorm)
- The core CachedDecoder uses different components (FeedForward and LayerNorm)
- Improved code reuse for the attention components while keeping the decoder LLaMA-specific

This is a partial refactor: the attention components are now shared, but the decoder remains LLaMA-specific due to its different normalization and activation requirements (a rough sketch of the resulting layout follows below).
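Below is a minimal sketch of the layout this refactor implies, assuming a PyTorch codebase. Only the class names HeadAttention, MultiHeadAttention, CachedDecoder, RMSNorm, and SwiGLU come from the commit message; everything else (the use_rope constructor flag, the apply_rope helper, all signatures, and the omission of KV caching and causal masking) is an illustrative assumption, not the repository's actual API.

# Hypothetical sketch of the refactor described above (PyTorch assumed).
# Signatures and the `use_rope` flag are illustrative, not the repo's real API.
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x: torch.Tensor) -> torch.Tensor:
    # Assumed helper: rotary position embeddings, rotating channel pairs
    # by position-dependent angles over the last dimension.
    seq, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = 1.0 / (10000 ** (torch.arange(half, device=x.device) / half))
    angles = torch.arange(seq, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class HeadAttention(nn.Module):
    # Core single-head attention; the commit adds a RoPE parameter to this
    # constructor (named `use_rope` here as an assumption).
    def __init__(self, embed_dim: int, head_dim: int, use_rope: bool = False):
        super().__init__()
        self.q = nn.Linear(embed_dim, head_dim, bias=False)
        self.k = nn.Linear(embed_dim, head_dim, bias=False)
        self.v = nn.Linear(embed_dim, head_dim, bias=False)
        self.use_rope = use_rope

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        if self.use_rope:
            q, k = apply_rope(q), apply_rope(k)
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        return scores.softmax(dim=-1) @ v  # causal mask omitted for brevity

class MultiHeadAttention(nn.Module):
    # Shared core module; llama.py now imports this instead of defining its own.
    def __init__(self, embed_dim: int, n_heads: int, use_rope: bool = False):
        super().__init__()
        head_dim = embed_dim // n_heads
        self.heads = nn.ModuleList(
            HeadAttention(embed_dim, head_dim, use_rope) for _ in range(n_heads)
        )
        self.proj = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

class RMSNorm(nn.Module):
    # Scale-only normalization by the root-mean-square of each vector.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    # Gated feed-forward: silu(gate(x)) * up(x), projected back down.
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

class CachedDecoder(nn.Module):
    # The LLaMA-specific block kept in llama.py: RMSNorm + SwiGLU, versus the
    # core decoder's LayerNorm + FeedForward. KV-cache plumbing is omitted here.
    def __init__(self, embed_dim: int, n_heads: int, hidden_dim: int):
        super().__init__()
        self.attn_norm = RMSNorm(embed_dim)
        self.attn = MultiHeadAttention(embed_dim, n_heads, use_rope=True)
        self.ffn_norm = RMSNorm(embed_dim)
        self.ffn = SwiGLU(embed_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.attn_norm(x))  # pre-norm attention residual
        return x + self.ffn(self.ffn_norm(x))  # pre-norm feed-forward residual

block = CachedDecoder(embed_dim=64, n_heads=4, hidden_dim=256)
out = block(torch.randn(2, 16, 64))  # (batch, seq, embed_dim) -> same shape

The split mirrors the commit's reasoning: the attention stack can live in one shared place because a single constructor flag covers the RoPE difference, while the decoder block stays in llama.py because swapping LayerNorm for RMSNorm and FeedForward for SwiGLU changes its internal structure.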