Run Model Evaluation

Run the full evaluation pipeline for a model and update all artifacts.

The argument should be an OpenRouter model ID (e.g., minimax/minimax-m2.7). The user may also provide a short log directory name (e.g., minimax-m2.7); if not, derive one from the model ID by taking the last path segment.
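The fallback rule for the log directory name can be sketched as follows. This is a minimal illustration only; the function name `derive_log_dir_name` is hypothetical and not part of the project.

```python
def derive_log_dir_name(model_id: str) -> str:
    """Return the last path segment of an OpenRouter model ID."""
    # rstrip guards against an accidental trailing slash in the argument.
    return model_id.rstrip("/").rsplit("/", 1)[-1]


print(derive_log_dir_name("minimax/minimax-m2.7"))  # minimax-m2.7
```

An ID without any slash is returned unchanged, so a bare short name also works as input.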

Steps

1. Run the evaluation

uv run inspect eval-set specieval/speciesism specieval/sentience specieval/attitude_meat specieval/attitude_seafood --model openrouter/$ARGUMENTS --log-dir logs/<log-dir-name>

This will take a while. Wait for it to complete and verify it succeeded (check for errors in the output).
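If the command is assembled programmatically, something like the sketch below could build the argument list from the model ID and log directory name. The helper name `build_eval_command` is an assumption for illustration; the task names and flags are copied from the command above.

```python
# Task names as they appear in the command above.
TASKS = [
    "specieval/speciesism",
    "specieval/sentience",
    "specieval/attitude_meat",
    "specieval/attitude_seafood",
]


def build_eval_command(model_id: str, log_dir_name: str) -> list[str]:
    """Assemble the eval-set invocation as an argument list (hypothetical helper)."""
    return [
        "uv", "run", "inspect", "eval-set",
        *TASKS,
        "--model", f"openrouter/{model_id}",
        "--log-dir", f"logs/{log_dir_name}",
    ]


cmd = build_eval_command("minimax/minimax-m2.7", "minimax-m2.7")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would raise on a non-zero exit code, but a zero exit code alone may not prove every task succeeded, so the output should still be read for errors.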

2. Add model to allowed_models.json

Add the model's short name (the value that appears in the model field of log results, typically the last segment of the OpenRouter model ID) to allowed_models.json, keeping the list in alphabetical order.
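A minimal sketch of this step, assuming allowed_models.json is a flat JSON array of strings (the actual file layout is not shown here and may differ). The helper name `add_allowed_model` and the placeholder entries `alpha-model` and `zeta-model` are hypothetical; the demo writes to a temporary directory rather than the real file.

```python
import json
import tempfile
from pathlib import Path


def add_allowed_model(path: Path, short_name: str) -> list[str]:
    """Insert short_name into a JSON array file, keeping it sorted and unique."""
    models = json.loads(path.read_text(encoding="utf-8"))
    if short_name not in models:
        models.append(short_name)
    # Plain sort is case-sensitive; adjust if the real list mixes cases.
    models.sort()
    path.write_text(json.dumps(models, indent=2) + "\n", encoding="utf-8")
    return models


# Demo against a throwaway file with placeholder entries.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "allowed_models.json"
    demo_path.write_text(json.dumps(["alpha-model", "zeta-model"]), encoding="utf-8")
    result = add_allowed_model(demo_path, "minimax-m2.7")
    on_disk = json.loads(demo_path.read_text(encoding="utf-8"))

print(result)
```

Running the helper twice with the same name leaves the list unchanged, which makes the step safe to repeat.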

3. Generate updated analysis

4. Update README.md

5. Suggest a commit
