Connects to an actively running GCP training VM, downloads the latest checkpoint snapshot to the local environment, and runs the HellaSwag evaluation suite on it.
This skill lets you evaluate the TinyLLM on your local machine while it continues training in the cloud, without pausing the pipeline.
Run the Retrieval & Evaluation:
`scripts/run_remote_eval.sh` handles the end-to-end process.

```bash
bash .agent/skills/remote_evaluation/scripts/run_remote_eval.sh
```

What it Does:
1. Connects to the `a100-trainer` Compute Engine instance over SSH.
2. Lists the `*.safetensors` files inside the running `/opt/tiny-llm/checkpoints_burn/` directory and targets the newest timestamp.
3. Uses `gcloud compute scp` to explicitly download that single large snapshot down to the local project root.
4. Runs the `Evaluator` skill via `.agent/skills/evaluation/scripts/run_eval.sh` to compile native bindings, memory-map the new weights onto the local GPU, and benchmark it against the 10,042 HellaSwag questions.
5. Runs the `Generator` skill via `.agent/skills/generation/scripts/test_generation.sh` to output an AI text completion sample, verifying qualitative reasoning gains.
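
The retrieval part of the process (connect over SSH, find the newest checkpoint, copy it down) might look roughly like the sketch below. It is not the contents of `run_remote_eval.sh`. The instance name and checkpoint directory come from the description above. The `ZONE` default, the function names, the `--run` guard, and the choice to rank checkpoints by file modification time (`ls -t`) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real run_remote_eval.sh may differ.
set -euo pipefail

INSTANCE="a100-trainer"                      # from the description above
REMOTE_DIR="/opt/tiny-llm/checkpoints_burn"  # from the description above
ZONE="${ZONE:-us-central1-a}"                # hypothetical zone; set ZONE to override

# Print the path of the newest *.safetensors file on the VM.
# Assumption: "newest" means most recently modified (ls -t).
newest_remote_checkpoint() {
  gcloud compute ssh "$INSTANCE" --zone "$ZONE" \
    --command "ls -t ${REMOTE_DIR}/*.safetensors | head -n 1"
}

# Copy a single remote file into the current directory (the project root).
fetch_checkpoint() {
  local remote_path="$1"
  gcloud compute scp "${INSTANCE}:${remote_path}" . --zone "$ZONE"
}

# Only touch the VM when explicitly asked to (hypothetical --run flag).
if [[ "${1:-}" == "--run" ]]; then
  ckpt="$(newest_remote_checkpoint)"
  echo "Downloading ${ckpt}"
  fetch_checkpoint "$ckpt"
fi
```

Copying one file rather than the whole directory keeps the transfer small, and a read-only `ls` plus `scp` leaves the training process on the VM undisturbed.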