Compile and run llama.cpp for local inference — GGUF quantization, context sizing, and GPU offloading. Use when building llama.cpp from source, converting models to GGUF, configuring n_gpu_layers for partial offload, or tuning context size and batch parameters. Do not use for vLLM/TGI serving (prefer inference-serving) or model selection (prefer model-selection).
Compile, quantize, and run models with llama.cpp for efficient local inference on CPU, GPU, or hybrid setups.
Covers build configuration, GGUF conversion, quantization levels, and GPU offloading (`-ngl` / `n_gpu_layers`) for hybrid CPU/GPU setups. Related skills: `inference-serving` (vLLM), `model-selection`, `agent-memory`.

## Quick start

1. Clone and enter the repo: `git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp`.
2. Configure the build for your backend: `cmake -B build -DGGML_CUDA=ON` (NVIDIA), `-DGGML_METAL=ON` (Apple), or `-DGGML_BLAS=ON` (CPU).
3. Convert a Hugging Face model to GGUF: `python convert_hf_to_gguf.py <model-dir> --outfile model.gguf --outtype f16`.
4. Quantize: `./llama-quantize model.gguf model-Q4_K_M.gguf Q4_K_M`. Choose the quant level by quality/size tradeoff.
5. Run interactively: `./llama-cli -m model-Q4_K_M.gguf -p "prompt" -n 256 -ngl 35 --ctx-size 4096`.
6. Serve an OpenAI-compatible API: `./llama-server -m model.gguf --port 8080 -ngl 35 --ctx-size 8192`.
7. Tune with `-t` (threads), `-b` (batch size), `-ngl` (GPU layers), and `--ctx-size`.
8. Benchmark with `./llama-bench -m model.gguf -ngl 35` to measure tokens/sec.

## Quantization levels

| Quant | Bits | Size (7B) | Quality | Use case |
|---|---|---|---|---|
| Q2_K | 2-3 | ~2.7 GB | Low | Extreme compression |
| Q4_K_M | 4 | ~4.1 GB | Good | Best balance |
| Q5_K_M | 5 | ~4.8 GB | Very good | Quality priority |
| Q6_K | 6 | ~5.5 GB | Near-fp16 | Max local quality |
| Q8_0 | 8 | ~7.2 GB | Excellent | If VRAM allows |
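The sizes in the table follow roughly from parameter count times effective bits per weight. A minimal sketch of that estimate, assuming ~4.5 effective bits/weight for Q4_K_M (K-quants mix block types, so this is an approximation, and file sizes run slightly higher due to metadata and non-quantized tensors):

```shell
#!/bin/sh
# Rough GGUF size estimate: params * effective_bits / 8.
# 4.5 bpw for Q4_K_M is an assumed average, not an exact figure;
# treat the result as a floor (real files carry extra tensors/metadata).
params=7000000000   # 7B model
bpw=4.5             # assumed effective bits/weight for Q4_K_M
awk -v p="$params" -v b="$bpw" \
  'BEGIN { printf "%.1f GB\n", p * b / 8 / 1e9 }'   # prints 3.9 GB
```

The ~0.2 GB gap to the table's ~4.1 GB figure is the embedding/output tensors and metadata that quantization does not shrink as aggressively.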
## GPU offloading

```bash
# Full GPU offload (all layers on GPU)
./llama-cli -m model.gguf -ngl 99 --ctx-size 4096

# Partial offload (first 20 layers on GPU, rest on CPU)
./llama-cli -m model.gguf -ngl 20 --ctx-size 4096

# CPU only
./llama-cli -m model.gguf -ngl 0 -t 8 --ctx-size 2048
```
## Best practices

- Use Q4_K_M as the default quantization: the best quality/size tradeoff.
- Set `-ngl` to as many layers as GPU VRAM allows; even partial offload helps significantly.
- Budget memory for context: `--ctx-size 4096` needs ~2 GB extra for a 7B model.
- Set `-t` equal to the physical core count (not hyperthreads) for CPU inference.

## Related skills

- `inference-serving`: production GPU serving with vLLM
- `offline-cpu-inference`: CPU-only optimization strategies
- `model-selection`: choosing which model to quantize
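The ~2 GB context figure follows from the KV-cache formula: 2 (K and V) × layers × context × KV heads × head dim × bytes per element. A sketch assuming a Llama-2-7B-style architecture (32 layers, 32 KV heads, head dim 128) with an fp16 cache; models using grouped-query attention have fewer KV heads and a proportionally smaller cache:

```shell
#!/bin/sh
# KV-cache size = 2 * layers * ctx * kv_heads * head_dim * bytes_per_elem.
# Architecture numbers below assume Llama-2-7B with fp16 cache;
# GQA models (fewer kv_heads) need proportionally less.
layers=32; ctx=4096; kv_heads=32; head_dim=128; bytes=2
awk -v l="$layers" -v c="$ctx" -v k="$kv_heads" -v d="$head_dim" -v b="$bytes" \
  'BEGIN { printf "%.1f GiB\n", 2 * l * c * k * d * b / 1073741824 }'
```

This yields 2.0 GiB at 4096 context, matching the rule of thumb above; doubling `--ctx-size` doubles the cache.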