Commit Graph

4 Commits

Author SHA1 Message Date
Sergey Penkovsky
f060497eb1 docs: add analysis notebooks for BPE and GPT
- Add bpe.ipynb with Byte Pair Encoding implementation analysis
- Update gpt_analysis.ipynb with GPT model experiments and visualizations
2025-10-05 08:23:09 +03:00
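For readers unfamiliar with the algorithm that bpe.ipynb analyzes, a single Byte Pair Encoding training step can be sketched roughly as below. This is a minimal illustration, not the notebook's or the library's actual code: the function names `most_frequent_pair` and `merge_pair` are hypothetical, and the sketch works on a plain list of string tokens.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` (scanning left to right) with `new_token`."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# One training step on a toy sequence (hypothetical data).
tokens = list("aaabdaaabac")
pair = most_frequent_pair(tokens)
tokens_after = merge_pair(tokens, pair, "".join(pair))
```

Repeating this step until the vocabulary reaches a target size yields the ordered merge list that a BPE tokenizer applies at encoding time.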
Sergey Penkovsky
fb74dc7c17 test: add comprehensive test suite for LLM components
- Add pytest configuration and fixtures
- Add tests for core modules: decoder, feed_forward, multi_head_attention
- Add tests for positional and token embeddings
- Add tests for GPT model
- Add tests for tokenizers (base and BPE)
- Add basic integration tests
2025-10-05 08:11:18 +03:00
Sergey Penkovsky
f4bdc81829 fix: update PyTorch mask types and BPE tokenizer serialization
- Replace deprecated torch.uint8 and .byte() with torch.bool in GPT.generate
- Add save/load methods to BPETokenizer for proper merges and vocab_list serialization
- Update dependencies in pyproject.toml
2025-10-05 08:09:30 +03:00
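One reason merges need dedicated save/load handling is that JSON cannot use tuples as object keys, so a merge table keyed by token pairs has to be flattened before writing. The sketch below shows one way this might look. It is an assumption for illustration only: the helper names `save_tokenizer_state` and `load_tokenizer_state`, the file layout, and the shape of `merges` (a dict mapping a pair of tokens to the merged token) are hypothetical and may differ from the actual BPETokenizer methods.

```python
import json


def save_tokenizer_state(path, merges, vocab_list):
    """Write merges and vocab_list to a JSON file (hypothetical layout)."""
    payload = {
        # JSON has no tuple keys, so each merge is stored as [left, right, merged].
        "merges": [[left, right, merged] for (left, right), merged in merges.items()],
        "vocab_list": vocab_list,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)


def load_tokenizer_state(path):
    """Read the file back and rebuild the tuple-keyed merges dict."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    merges = {(left, right): merged for left, right, merged in payload["merges"]}
    return merges, payload["vocab_list"]
```

Storing merges as a list keeps their order, which matters because BPE applies merges in the order they were learned.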
Sergey Penkovsky
ec07546ea8 feat: initial project setup with LLM architecture and HF integration
- Add LLM library with GPT model implementation
- Add hf-proxy for HuggingFace integration
- Add experiments for training and generation
- Add comprehensive documentation and examples
- Configure uv workspace with proper dependencies
2025-10-04 22:40:21 +03:00