local-bench

Open weights. Local hardware.

Best variant per model

Quality vs the VRAM to run it

Each point is a model at its best-scoring quant. Up = smarter; left = fits a smaller card. The dotted line is the point-estimate efficiency frontier — no measured model is both higher-scoring and smaller on current point estimates. Hover any point for details.

Only 0 models measured so far — the efficiency frontier is preliminary and firms up as more variants land.

0 models
10075502502GB4GB6GB8GB12GB16GB24GB32GB48GB64GB96GB128GB192GB512GBNo ranked five-axis local rows yet; partial diagnostics stay off this frontier.effective VRAM to run (GB, log scale)Local Intelligence Index
vertical guides = common VRAM tiers; horizontal guides = API model ceilings

Leaderboard summary

No ranked variants yet

Partial benchmark profiles are available on model pages, but the Local Intelligence Index ranks only rows with Agentic, Knowledge, Instruction, Tool calling, and Coding all measured under the standard capped-thinking lane.

Benchmark a model · preview

Get the recipe to benchmark a model

Local Intelligence Index · v2.1 | 50/15/15/10/10

Pick your VRAM and a model to get the exact benchmark command. The board ranks Qwen3 and Gemma families today; the v1 suite the recipe needs, and one-step submission, ship with v2.

Most-downloaded board-rankable models that fit · catalog snapshot, not an endorsement.

localbench does not download or run the model. First start a local server, then localbench sends the benchmark to that endpoint.

Board-comparable · capped-thinking · qwen3 · suite/v1

Step 1 · start the model (leave running)
llama-server -hf MaziyarPanahi/Qwen3-0.6B-GGUF:Q8_0 --port 8080
Step 2 · benchmark it (second terminal)
localbench run --endpoint http://localhost:8080/v1 --model MaziyarPanahi/Qwen3-0.6B-GGUF:Q8_0 --hf-model-id Qwen/Qwen3-0.6B --suite-dir suite/v1 --lane capped-thinking --reasoning-activation qwen3 --tier standard --out my-run.json

Do not change sampling, context, or prompt-template settings unless the recipe says so. VRAM tiers are recommendations, not guaranteed fits · close other GPU workloads.

Preview of the contribution flow · the recipe is the real command, pinned to the v1 board. Obtaining the suite and one-step submission land with v2 · for now it produces a local my-run.json.Just exploring? See the board →
View full leaderboard →