Select the best generation model, embedding model, inference backend, and quantization level based on memory budget and current leaderboard rankings. Requires memory budget from memory-budget skill. Use after memory budget calculation.
Select optimal models and backend using leaderboard rankings filtered by hardware constraints.
Read results/phase1/memory_budget.json and results/phase1/hardware_profile.json.
Also read config/decisions.json for pre-made preferences.
Check current rankings. Look up the latest open-weight model rankings on the leaderboards (e.g., LMArena and Artificial Analysis).
Filter to feasible models. From the memory budget, determine which models fit; account for the quantization level and, for MoE models, the active-parameter count.
Select the top-ranked feasible model. If LMArena and Artificial Analysis disagree, use average rank to break ties.
Check GGUF availability. The selected model must be available in GGUF format for llama.cpp. Search HuggingFace for GGUF quantized versions. If no GGUF exists, move to the next-ranked model.
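The selection steps above can be sketched as a small filter-and-rank routine. This is a hypothetical illustration, not part of the skill: the candidate list, field names, and rank values are made up, and a real run would pull ranks from the leaderboards and GGUF availability from HuggingFace.

```python
def select_model(candidates, budget_gb):
    """Pick the best-ranked candidate that fits the memory budget and has
    a GGUF build, breaking leaderboard disagreement by average rank."""
    feasible = [c for c in candidates if c["memory_required_gb"] <= budget_gb]
    # Lower average of LMArena + Artificial Analysis rank wins.
    feasible.sort(key=lambda c: (c["rank"]["lmarena"]
                                 + c["rank"]["artificial_analysis"]) / 2)
    for c in feasible:
        if c["has_gguf"]:   # no GGUF build: fall through to next-ranked model
            return c
    return None             # nothing feasible: revisit the memory budget

candidates = [
    {"name": "model-a", "memory_required_gb": 40, "has_gguf": True,
     "rank": {"lmarena": 3, "artificial_analysis": 5}},
    {"name": "model-b", "memory_required_gb": 12, "has_gguf": False,
     "rank": {"lmarena": 6, "artificial_analysis": 4}},
    {"name": "model-c", "memory_required_gb": 10, "has_gguf": True,
     "rank": {"lmarena": 8, "artificial_analysis": 8}},
]
# With a 16 GB budget, model-b outranks model-c but lacks GGUF, so model-c wins.
print(select_model(candidates, budget_gb=16)["name"])  # model-c
```

The GGUF fallback lives inside the ranked loop so the next-ranked model is tried automatically, matching the "move to the next-ranked model" rule.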
Consult config/decisions.json for candidate preferences.

Deterministic rules for the backend:
- llama.cpp with CUDA (preferred for GGUF)
- llama.cpp CPU-only (GGUF format)
- vLLM or Ollama if the model is in a supported format

Save the complete selection to results/phase1/selected_config.json:
{
"generation_model": {
"name": "...",
"parameters": "...",
"quantization": "...",
"format": "GGUF",
"gguf_source": "HuggingFace URL",
"memory_required_gb": N,
"moe": true|false,
"active_parameters": "...",
"leaderboard_rank": { "lmarena": N, "artificial_analysis": N }
},
"embedding_model": {
"name": "...",
"memory_required_gb": N,
"mteb_retrieval_rank": N
},
"backend": {
"name": "llama.cpp",
"cuda": true|false,
"build_flags": "..."
},
"memory_summary": {
"total_gb": N,
"generation_model_gb": N,
"embedding_model_gb": N,
"os_reserve_gb": N,
"headroom_gb": N
}
}
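A minimal sketch of writing the file in the shape above. Every value here is a placeholder (the model names, ranks, sizes, and the build flag are assumptions for illustration, not a real selection):

```python
import json
import pathlib

config = {
    "generation_model": {
        "name": "example-model", "parameters": "8B", "quantization": "Q4_K_M",
        "format": "GGUF", "gguf_source": "https://huggingface.co/...",
        "memory_required_gb": 6, "moe": False, "active_parameters": "8B",
        "leaderboard_rank": {"lmarena": 1, "artificial_analysis": 1},
    },
    "embedding_model": {
        "name": "example-embed", "memory_required_gb": 1,
        "mteb_retrieval_rank": 1,
    },
    "backend": {
        # Flag assumed for a CUDA llama.cpp build; verify against its docs.
        "name": "llama.cpp", "cuda": True, "build_flags": "-DGGML_CUDA=ON",
    },
    "memory_summary": {
        "total_gb": 16, "generation_model_gb": 6, "embedding_model_gb": 1,
        "os_reserve_gb": 4, "headroom_gb": 5,
    },
}

out = pathlib.Path("results/phase1/selected_config.json")
out.parent.mkdir(parents=True, exist_ok=True)  # create results/phase1 if absent
out.write_text(json.dumps(config, indent=2))
```

Writing all four top-level keys in one file keeps the downstream phases to a single read.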
IMPORTANT: After saving, STOP and print the full selection summary. Wait for human approval before downloading models or proceeding to Phase 2.