Model: paste model name here
Quantization:
FP32 — 4 bytes/param
BF16 — 2 bytes/param
FP16 — 2 bytes/param
FP8 (E4M3/E5M2) — 1 byte/param
INT8 — 1 byte/param
Q4_K_M (llama.cpp) — ~0.55 bytes/param
INT4 — 0.5 bytes/param
Q3_K_M (llama.cpp) — ~0.45 bytes/param
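These bytes-per-param figures convert directly into weight memory. A minimal sketch of the conversion (the helper name and the 70B example are illustrative assumptions; the Q4_K_M/Q3_K_M values are the approximate averages quoted above):

```python
# Hypothetical lookup mirroring the quantization options above.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "BF16": 2.0,
    "FP16": 2.0,
    "FP8": 1.0,       # E4M3 or E5M2
    "INT8": 1.0,
    "Q4_K_M": 0.55,   # llama.cpp k-quant, approximate average
    "INT4": 0.5,
    "Q3_K_M": 0.45,   # llama.cpp k-quant, approximate average
}

def weight_gib(n_params: float, quant: str) -> float:
    """Weight memory in GiB: parameter count times bytes per parameter."""
    return n_params * BYTES_PER_PARAM[quant] / 2**30

# Illustrative: a 70B-parameter model in Q4_K_M needs ~35.9 GiB of weights.
print(f"{weight_gib(70e9, 'Q4_K_M'):.1f} GiB")
```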
Context length:
Enable extended context (256k / 512k / 1M)
Batch size: concurrent sequences; the KV cache scales linearly with batch size. Use 1 for single-user inference.
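Because the KV cache grows linearly in both context length and batch size, its footprint follows from the attention shape alone. A sketch, assuming a transformer whose layer count, KV-head count, and head dimension are known (the 7B-scale shapes in the example are illustrative):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, batch_size: int = 1,
                 bytes_per_elem: int = 2) -> float:
    """KV cache in GiB: two tensors (K and V) per layer per token,
    each n_kv_heads * head_dim wide, stored at bytes_per_elem
    (2 for FP16/BF16)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * batch_size / 2**30

# Illustrative: 32 layers, 32 KV heads, head_dim 128, 8k context, batch 1.
print(f"{kv_cache_gib(32, 32, 128, 8192):.2f} GiB")  # 4.00 GiB
```

Doubling either the context length or the batch size doubles this number, which is why the extended-context options above come to dominate the budget long before 1M tokens.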
LoRA fine-tuning (rank 16, adds adapter overhead)
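The rank-16 adapter overhead is small next to the base weights: each adapted matrix of shape d_out x d_in gains two low-rank factors, A (r x d_in) and B (d_out x r). A sketch under the assumption that only the four attention projections are adapted (the 7B-scale shapes are illustrative):

```python
def lora_params(d_in: int, d_out: int, rank: int = 16) -> int:
    """Adapter params for one matrix: A (rank x d_in) + B (d_out x rank)."""
    return rank * d_in + d_out * rank

# Illustrative: q/k/v/o projections at hidden size 4096 across 32 layers.
hidden, layers = 4096, 32
per_layer = 4 * lora_params(hidden, hidden)   # four 4096x4096 matrices
total = per_layer * layers
print(f"{total / 1e6:.1f}M adapter params")   # ~16.8M, ~34 MB in BF16
```

Note that during fine-tuning the adapters also need gradients and optimizer states, and activations must be kept for the backward pass, so the training-time overhead is larger than the raw adapter size.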
Total usage ≈ —
Weights: —
KV cache: —
Available: —
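The total line is just these pieces summed and compared against device memory. A minimal self-contained sketch (the ~10% framework/activation overhead factor is an assumption, and the input numbers come from the illustrative examples above):

```python
def total_usage_gib(weights_gib: float, kv_gib: float,
                    overhead_frac: float = 0.10) -> float:
    """Weights + KV cache, padded by an assumed ~10% framework overhead."""
    return (weights_gib + kv_gib) * (1 + overhead_frac)

# Illustrative: 70B Q4_K_M weights (~35.9 GiB) plus an 8k-context KV cache (~4 GiB).
total = total_usage_gib(35.9, 4.0)
device = 48.0  # hypothetical 48 GiB card
print(f"total ≈ {total:.1f} GiB, available: {device - total:.1f} GiB")
```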
MoE: GPU fit is computed against total weights; every expert must be resident even though only a few activate per token.
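The distinction matters because active parameters set compute per token while total parameters set memory. A sketch with roughly Mixtral-8x7B-scale numbers (illustrative; about 47B total, 13B active):

```python
# MoE fit is computed against TOTAL parameters, not active ones.
total_params  = 46.7e9   # all experts resident in memory (illustrative)
active_params = 12.9e9   # experts actually used per token (e.g. 2 of 8)
bytes_per_param = 0.55   # Q4_K_M, from the table above

weights_gib = total_params * bytes_per_param / 2**30
print(f"weights: {weights_gib:.1f} GiB; "
      f"only {active_params / total_params:.0%} of params run per token")
```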
Hardware: NVIDIA Enterprise / NVIDIA Consumer / Apple
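The hardware presets reduce to a memory-capacity lookup against the total above. A sketch with a few well-known capacities (the preset keys and the flat-lookup structure are assumptions about how the selector works):

```python
# Assumed preset table: device -> memory in GiB.
HARDWARE_GIB = {
    "NVIDIA Enterprise / H100": 80,
    "NVIDIA Enterprise / A100": 80,   # also sold as a 40 GiB variant
    "NVIDIA Consumer / RTX 4090": 24,
    "NVIDIA Consumer / RTX 3090": 24,
    "Apple / M2 Ultra": 192,          # unified memory, shared with the OS
}

def headroom_gib(device: str, total_usage: float) -> float:
    """Memory left after loading the model; negative means it won't fit."""
    return HARDWARE_GIB[device] - total_usage

# Illustrative: the 43.9 GiB total from above on a 24 GiB card.
print(f"{headroom_gib('NVIDIA Consumer / RTX 4090', 43.9):.1f}")  # -19.9: won't fit
```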