End-to-end model deployment for Jemma: Ollama local, HuggingFace Hub, Google Cloud Run, GGUF export. Use when: deploying, publishing, exporting, packaging, or releasing a trained model.
```bash
# Import GGUF into Ollama
python -u toolbox/import_gguf_to_ollama.py <path_to_gguf>

# Verify the model is registered
ollama list

# Smoke test
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "gemma4-e4b-it:q8_0",
  "messages": [{"role": "user", "content": "Hello"}]
}'
```
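The curl smoke test above can also be scripted. A minimal Python sketch using only the standard library; it assumes the Ollama server is running on the default port and that the model name matches what `ollama list` reports:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a non-streaming Ollama /api/chat payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a JSON-lines stream
    }

def smoke_test(model: str = "gemma4-e4b-it:q8_0", prompt: str = "Hello") -> str:
    """Send one chat turn and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["message"]["content"]
```

Setting `"stream": False` avoids having to reassemble the default streamed response chunks before checking the output.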
```bash
# Validate the HuggingFace token
python -u -W ignore demos/validate_hf_token.py

# Publish (includes model card, NOTICE, demos)
python -u -W ignore toolbox/publish_to_hf.py --demos
```
Checklist before publishing:

- Model card (`toolbox/hf_model_card.md`) has current benchmark scores
- `base_model` metadata points to `google/gemma-4-E4B-it`
- `base_model_relation` is `finetune`
- `gemma-4-good-hackathon` tag is present
- Target repo: `soumitty/jemma-safebrain-gemma-4-e4b-it`

```bash
# Generate Docker bundle from GGUF
python -u toolbox/prepare_ollama_cloud_bundle.py
```

- Review the generated artifacts (Dockerfile, Modelfile, deploy script)
- Follow docs/google-cloud-ollama-deployment.md for the cloud deploy
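After the cloud deploy, readiness can be checked programmatically instead of eyeballing the console. A sketch that shells out to the `gcloud` CLI (assumed installed and authenticated; service and region names are placeholders you supply):

```python
import json
import subprocess

def parse_ready(service_json: dict) -> bool:
    """True when the Knative 'Ready' condition reports status 'True'."""
    conditions = service_json.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )

def cloud_run_ready(service: str, region: str) -> bool:
    """Describe the Cloud Run service as JSON and check readiness."""
    out = subprocess.run(
        ["gcloud", "run", "services", "describe", service,
         "--region", region, "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ready(json.loads(out))
```

This matches the `gcloud run services describe` row in the verification table: a READY service has a `Ready` condition with status `True`.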
From an Unsloth checkpoint: load with `FastLanguageModel`, then export to GGUF for `llama.cpp` or import the result into Ollama.

| Target | Verify Command | Success Criteria |
|---|---|---|
| Ollama | `ollama list` | Model name appears in list |
| Ollama | `curl .../api/chat` | Valid JSON response |
| HuggingFace | `python toolbox/publish_to_hf.py` | Exit code 0, URL printed |
| Cloud Run | `gcloud run services describe` | Service status: READY |
| GGUF | `llama.cpp/main -m <file>` | Model loads, generates text |
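The Unsloth-to-GGUF path above can be sketched as follows. `save_pretrained_gguf` and its `quantization_method` argument are Unsloth APIs; the checkpoint and output paths are placeholders, and the export itself needs `unsloth` installed with a GPU environment:

```python
# Common llama.cpp quantization method names (not an exhaustive list).
VALID_QUANTS = {"q8_0", "q4_k_m", "q5_k_m", "f16"}

def check_quant(method: str) -> str:
    """Fail fast on a typo'd quantization method name."""
    if method not in VALID_QUANTS:
        raise ValueError(f"unknown quantization method: {method}")
    return method

def export_gguf(checkpoint_dir: str, out_dir: str, quant: str = "q8_0") -> None:
    """Load an Unsloth checkpoint and write a GGUF file to out_dir."""
    from unsloth import FastLanguageModel  # deferred: heavy import, needs GPU

    model, tokenizer = FastLanguageModel.from_pretrained(checkpoint_dir)
    # Unsloth drives llama.cpp's converter under the hood to write the GGUF.
    model.save_pretrained_gguf(
        out_dir, tokenizer, quantization_method=check_quant(quant)
    )
```

The resulting file is what the `llama.cpp/main -m <file>` row verifies, and what `toolbox/import_gguf_to_ollama.py` takes as input.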