- Covers forward pass with and without KV-cache
- Verifies correct sequence generation for greedy, top-k, and top-p sampling
- Adds ValueError test for exceeding max sequence length
- Uses small random toy config and minimal setup for fast test feedback
Motivation: Prevent regressions in decoding, sampling, and KV-cache logic in GPT2 implementation.