
KVInfer Studio

152M params · GPT-2-style decoder-only model · Custom C++ inference engine with AVX2 SIMD kernels, OpenMP parallelism, and a persistent per-session KV cache.

152M params · AVX2 SIMD · OpenMP · KV Cache · Streaming

📊 Quick Benchmark

Runs 5 built-in prompts and measures decode throughput, time to first token (TTFT), and mean per-token latency.