Name: Remote Evaluator
Author: ltouati

Remote Checkpoint Evaluation

This skill lets you evaluate the latest TinyLLM checkpoint on your local machine while the model is still training in the cloud, without pausing the pipeline.

How to use this skill

Run the Retrieval & Evaluation:
- The helper script scripts/run_remote_eval.sh handles the end-to-end process.
- Example: bash .agent/skills/remote_evaluation/scripts/run_remote_eval.sh
What it does:
- The script connects to the a100-trainer Compute Engine instance over SSH.
- It recursively searches the live /opt/tiny-llm/checkpoints_burn/ directory for *.safetensors files and selects the one with the newest timestamp.
- It uses gcloud compute scp to download only that one large snapshot to the local project root.
- It then hands the checkpoint off to the Evaluator skill to compile native bindings, memory-map the new weights onto the local GPU, and benchmark the model against the 10,042 HellaSwag questions.
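
The retrieval steps above can be sketched in Python as follows. This is a minimal illustration, not the contents of run_remote_eval.sh: it assumes "newest timestamp" means the file modification time, that the remote host has GNU find, and that the gcloud zone and project come from your default configuration. The function names and the step_*.safetensors file names are hypothetical.

```python
import shlex

INSTANCE = "a100-trainer"
REMOTE_DIR = "/opt/tiny-llm/checkpoints_burn/"


def find_command(remote_dir=REMOTE_DIR):
    """Remote command that prints '<mtime epoch> <path>' per checkpoint.

    Assumes GNU find on the remote host (for -printf).
    """
    return (
        f"find {shlex.quote(remote_dir)} -type f "
        "-name '*.safetensors' -printf '%T@ %p\\n'"
    )


def newest_checkpoint(listing):
    """Return the path with the largest mtime, or None if nothing is listed."""
    best = None
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, path = line.split(" ", 1)
        entry = (float(stamp), path)
        if best is None or entry > best:
            best = entry
    return None if best is None else best[1]


def ssh_argv(instance, remote_command):
    """Argument list that runs one command on the instance over SSH."""
    return ["gcloud", "compute", "ssh", instance, "--command", remote_command]


def scp_argv(instance, remote_path, local_dir="."):
    """Argument list that copies a single remote file to local_dir."""
    return ["gcloud", "compute", "scp", f"{instance}:{remote_path}", local_dir]


# Usage sketch (not executed here; requires an authenticated gcloud CLI):
#   import subprocess
#   out = subprocess.run(ssh_argv(INSTANCE, find_command()),
#                        capture_output=True, text=True, check=True).stdout
#   path = newest_checkpoint(out)
#   if path is not None:
#       subprocess.run(scp_argv(INSTANCE, path), check=True)
```

Selecting the file on the local side keeps the remote command to a single read-only find, so the training process is not touched. A checkpoint that is still being written could be copied half-finished; this sketch does not guard against that.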

