mirror of
https://github.com/pese-git/llm-arch-research.git
synced 2026-01-24 05:21:16 +00:00
- Covers inference with and without KV cache, and with sampling (top-k, top-p)
- Includes a test for exceeding the max sequence length (should raise ValueError)
- Verifies output shape and absence of dtype errors in the mask logic
- Minimal config and random data keep the tests fast and robust

Motivation: regression and integration protection for Llama decoding and sampling logic.
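The checks above can be sketched roughly as follows. This is a hypothetical, self-contained stand-in: the real repository's model class, config, and test names are not shown here, so a toy decoder with made-up names (`ToyDecoder`, `top_k_filter`, `top_p_filter`) is used to illustrate the shape, sampling-filter, and max-length assertions.

```python
# Hypothetical sketch of the tests described above; the repo's actual
# Llama model and config are assumptions, so a toy stand-in is used.
import numpy as np

def top_k_filter(logits, k):
    # Keep only the k largest logits; mask the rest to -inf.
    out = np.full_like(logits, -np.inf)
    idx = np.argpartition(logits, -k)[-k:]
    out[idx] = logits[idx]
    return out

def top_p_filter(logits, p):
    # Nucleus (top-p) filtering: keep the smallest set of tokens
    # whose cumulative probability exceeds p.
    order = np.argsort(logits)[::-1]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cum = np.cumsum(probs[order])
    keep = order[: int(np.searchsorted(cum, p)) + 1]
    out = np.full_like(logits, -np.inf)
    out[keep] = logits[keep]
    return out

class ToyDecoder:
    # Minimal stand-in for the model under test (hypothetical API).
    def __init__(self, vocab_size=16, max_seq_len=8):
        self.vocab_size = vocab_size
        self.max_seq_len = max_seq_len
        self.rng = np.random.default_rng(0)

    def forward(self, tokens):
        if len(tokens) > self.max_seq_len:
            raise ValueError("sequence exceeds max_seq_len")
        # Random logits, one row per position (shape check only).
        return self.rng.normal(size=(len(tokens), self.vocab_size))

# Regression-style checks mirroring the commit description.
model = ToyDecoder()
logits = model.forward([1, 2, 3])
assert logits.shape == (3, model.vocab_size)

filtered_k = top_k_filter(logits[-1], k=4)
assert np.isfinite(filtered_k).sum() == 4

filtered_p = top_p_filter(logits[-1], p=0.9)
assert np.isfinite(filtered_p).any()

try:
    model.forward(list(range(9)))  # 9 tokens > max_seq_len of 8
    raised = False
except ValueError:
    raised = True
assert raised
```

The actual tests presumably exercise the real model with and without its KV cache; the toy above only demonstrates the assertion pattern, not the cached-vs-uncached equivalence check.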