Dynamic AI Workstation Planner v6.21

Inference planning for unified-memory boxes and multi-GPU stacks · APEX + TurboQuant optimization toggles

System

Hardware

Model calculator

Model

System VRAM

Used

Optimization toggles

APEX-Quant

APEX I-Quality mixed-precision quantization (mudler/apex-quant). Improves quality on MoE models by assigning higher precision to critical layers. VRAM use may be higher or lower than with standard quants. No effect on dense models.

TurboQuant

KV cache compression (TheTom/llama-cpp-turboquant). Shrinks KV cache (context) memory by 3-5x. Requires flash attention (-fa on).
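To get a feel for what a 3-5x KV cache reduction means at long context, here is a rough sketch of the standard KV cache size estimate for a plain transformer. It is not the planner's own formula. The model shape (48 layers, 8 KV heads, head dimension 128), the FP16 baseline, and reading "262K" as 262,144 tokens are all assumptions chosen for illustration, and the function names are hypothetical. Models with sliding-window or hybrid attention will use less than this.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens,
                   bytes_per_element=2.0):
    """Uncompressed KV cache size for a plain transformer.

    The leading 2 accounts for one K and one V tensor per layer.
    bytes_per_element=2.0 assumes an FP16 cache (an assumption).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


def compressed_range_gib(raw_bytes, low_ratio=3.0, high_ratio=5.0):
    """Return (best case, worst case) cache size in GiB for a 3-5x ratio."""
    return raw_bytes / high_ratio / GIB, raw_bytes / low_ratio / GIB


# Hypothetical model shape, not taken from the planner's data.
raw = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                     context_tokens=262_144)
best, worst = compressed_range_gib(raw)

print(f"raw: {raw / GIB:.1f} GiB, compressed: {best:.1f} to {worst:.1f} GiB")
# prints "raw: 48.0 GiB, compressed: 9.6 to 16.0 GiB"
```

Under these assumptions the cache alone drops from 48 GiB to roughly 10-16 GiB, which is often the difference between a long context fitting alongside the weights or not.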

Context window

Context 262K

All models

Table

VRAM breakdown

Speed vs context

Model	Type	Params	GGUF	Fits?	t/s	Context

v6.21 · Data: inline + optional models.js · Updated 2026-04-19
Built by Kahalewai · Claude did all the heavy lifting