#0843
`Qwen3.6 27B` pure `Q4_K_M` GGUF fits in **16GB VRAM**
Qwen3.6: open LLM with an active GGUF ecosystem for local inference
Pure quantization trims enough size to keep the whole model on a consumer GPU. Useful for local agent tests, but quality loss is real and benchmark depth is thin.
- The Q4_K_M MTP build is 15.4GB and the non-MTP build is 15.1GB; comparable builds listed at 16.5-18GB often spill past 16GB cards.
- MTP reaches 40 tok/s generation (tg) but only 195 tok/s prompt processing (pp); non-MTP flips the trade-off at 715 tok/s pp and 24 tok/s tg.
- Perplexity delta is larger than Unsloth's quant: +0.1707 vs +0.0553 on MTP, so the size win buys speed/fit at some quality cost.
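
The speed trade-off between the two builds depends on how prompt-heavy a workload is. The sketch below is a rough estimate only: it assumes wall time is prompt tokens divided by pp speed plus generated tokens divided by tg speed, with no load time, batching, or cache effects. The function names and the example token counts are illustrative and do not come from the post; only the four throughput figures do.

```python
# Throughput figures quoted in the post, in tokens per second.
MTP = {"pp": 195.0, "tg": 40.0}
NON_MTP = {"pp": 715.0, "tg": 24.0}


def request_seconds(prompt_tokens, gen_tokens, build):
    """Estimated wall time: prompt processing plus generation (assumed model)."""
    return prompt_tokens / build["pp"] + gen_tokens / build["tg"]


def break_even_ratio(a, b):
    """Generated-to-prompt token ratio at which builds a and b take equal time."""
    return (1 / a["pp"] - 1 / b["pp"]) / (1 / b["tg"] - 1 / a["tg"])


ratio = break_even_ratio(MTP, NON_MTP)
print(f"MTP wins once generated tokens exceed {ratio:.3f} x prompt tokens")

# Hypothetical workloads: a long-context agent step and a chat-style reply.
for prompt, gen in [(8000, 500), (1000, 1000)]:
    t_mtp = request_seconds(prompt, gen, MTP)
    t_non = request_seconds(prompt, gen, NON_MTP)
    print(f"prompt={prompt} gen={gen}: MTP {t_mtp:.1f}s, non-MTP {t_non:.1f}s")
```

Under this simple model the break-even ratio is 32/143, about 0.22 generated tokens per prompt token. Prompt-heavy agent loops with short outputs would favor the non-MTP build, while generation-heavy use would favor MTP.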
Source: www.reddit.com/r/LocalLLaMA/comments/1tkzk9e/qwen36_27b_