Dynamic AI Workstation Planner v6.21

Inference planning for unified-memory boxes and multi-GPU stacks · APEX + TurboQuant optimization toggles

System

Model calculator

System VRAM used

Optimization toggles

APEX-Quant
APEX I-Quality mixed-precision quantization (mudler/apex-quant). Improves quality for MoE models by assigning higher precision to critical layers. May use more or less VRAM than standard quants. No effect on dense models.
TurboQuant
KV cache compression (TheTom/llama-cpp-turboquant). Compresses context memory 3-5x. Requires flash attention (-fa on).
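The APEX-Quant toggle above describes importance-based mixed precision. The actual per-layer scheme used by mudler/apex-quant is not documented here, so this is a hedged, generic sketch: rank layers by an importance score and give the top fraction a higher-precision GGUF quant type. The threshold, score source, and type names (Q6_K/Q4_K) are illustrative assumptions.

```python
def assign_quant_types(importance, budget_fraction=0.25):
    """Illustrative mixed-precision assignment: the top `budget_fraction`
    most important layers get a higher-precision type (Q6_K here),
    the rest a smaller one (Q4_K). Not apex-quant's actual algorithm."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    n_hi = max(1, int(len(importance) * budget_fraction))
    hi = set(ranked[:n_hi])
    return ["Q6_K" if i in hi else "Q4_K" for i in range(len(importance))]

# With 4 layers and a 25% budget, only the highest-scoring layer is upgraded:
print(assign_quant_types([0.9, 0.1, 0.5, 0.8]))
# → ['Q6_K', 'Q4_K', 'Q4_K', 'Q4_K']
```

This also explains the "may use more or less VRAM" caveat: the total depends on how many layers clear the importance threshold, not on a fixed per-weight budget.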

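To see why a KV-cache compressor matters at large context windows, here is a minimal sketch of the standard KV cache size formula (2 tensors, K and V, per layer per token), with a compression factor applied. The model shape (32 layers, 8 KV heads, head dim 128) and the 4x factor are illustrative assumptions; 4x sits inside the 3-5x range stated above.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens,
                   bytes_per_elem=2.0, compression=1.0):
    """Estimate KV cache size in bytes.

    2 accounts for the K and V tensors; bytes_per_elem=2.0 assumes an
    fp16 cache; `compression` divides the total (e.g. 4.0 for a 4x
    compressor). All figures are planning estimates, not measurements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens / compression

# Illustrative 8B-class config at the planner's 262K context:
full = kv_cache_bytes(32, 8, 128, 262_144)
squeezed = kv_cache_bytes(32, 8, 128, 262_144, compression=4.0)
print(f"{full / 2**30:.1f} GiB -> {squeezed / 2**30:.1f} GiB")
# → 32.0 GiB -> 8.0 GiB
```

At full context the uncompressed cache alone can rival the weights' VRAM footprint, which is why the planner treats context length as a first-class budget item.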
Context window: 262K tokens

All models

Views: Table · VRAM breakdown · Speed vs context

Table columns: Model · Type · Params · GGUF · Fits? · t/s · Context
v6.21 · Data: inline + optional models.js · Updated 2026-04-19
Built by Kahalewai · Claude did all the heavy lifting