Create or reuse Hugging Face dataset PRs for `harborframework/parity-experiments` and upload Harbor parity/oracle result folders efficiently with sparse checkout, raw git pushes, and Git LFS.
Use this skill to publish Harbor parity experiment outputs to the shared Hugging Face dataset and capture the resulting discussion URL for the adapter's parity_pr field.
hf upload-large-folder can be slow or unreliable for large parity bundles because it pushes through the Hub API commit loop. A full clone of harborframework/parity-experiments is too expensive because the dataset is very large. This skill avoids the full clone by fetching only the target PR ref with --depth 1 --filter=blob:none and checking out only the paths needed for the current adapter.
You need a Hugging Face write token, or a fine-grained token with global discussion.write enabled, created at https://huggingface.co/settings/tokens. A read-only or narrowly-scoped token will cause create_pr.py to fail with HTTP 403. Target harborframework/parity-experiments unless the user explicitly asks for another repo. Upload artifacts with raw git push, and record the resulting discussion URL in the adapter's parity_pr field. For large parity bundles, prefer raw git over hf upload-large-folder. The raw git path is materially faster and more reliable because it avoids the API-side commit loop and does not require cloning the entire parity dataset.
If the user already has a parity PR number, reuse it.
Otherwise, create one with the bundled helper:
uv run python scripts/create_pr.py create-pr \
--title "Add parity experiments for <adapter_name>" \
--description-file /path/to/pr-description.md
The script prints JSON including:
pr_number, discussion_url, and repo_id.

Set up a sparse checkout of the PR ref:

mkdir -p /tmp/parity-experiments-pr<number>
cd /tmp/parity-experiments-pr<number>
git init
git remote add origin git@hf.co:datasets/harborframework/parity-experiments
git config core.sparseCheckout true
git sparse-checkout init --cone
git sparse-checkout set adapters/<adapter_name>
git fetch --depth 1 --filter=blob:none origin refs/pr/<number>:pr/<number>
git checkout pr/<number>
This fetches only the PR ref and the requested paths instead of cloning the full dataset repo.
If the local folder already contains the final repo-root layout, copy it as-is.
If the local folder only contains the adapter subtree, copy it into adapters/<adapter_name>/:
rsync -a --delete \
--exclude '.git' \
--exclude '.cache' \
--exclude '.DS_Store' \
/path/to/local-folder/ \
adapters/<adapter_name>/
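The layout decision above can be sketched as a small check. This is a heuristic sketch, not part of the skill: the helper name is mine, and it assumes a repo-root layout always contains a top-level adapters/ directory.

```python
from pathlib import Path

def needs_subtree_copy(local_folder: str) -> bool:
    """Heuristic (assumption, not part of the skill): if the local folder
    has no top-level adapters/ directory, treat it as an adapter subtree
    that should be copied into adapters/<adapter_name>/ rather than
    copied as-is into the repo root."""
    return not (Path(local_folder) / "adapters").is_dir()
```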
The repo-root .gitattributes already LFS-tracks common binary, model, archive, and media extensions (*.bin, *.parquet, *.safetensors, images, audio, video, *.log, *.txt, etc.). Most parity outputs are covered automatically and need no manual action. Scan the sparse checkout for files larger than 10 MiB to catch anything that slipped through:
python - <<'PY'
from pathlib import Path

for path in sorted(Path(".").rglob("*")):
    if path.is_file() and ".git" not in path.parts and path.stat().st_size > 10 * 1024 * 1024:
        print(path)
PY
If a flagged file is not already covered by the root .gitattributes, write the LFS rule to adapters/<adapter_name>/.gitattributes, never to the repo-root .gitattributes, which is a shared merge-conflict hotspot. Run git lfs track inside the adapter directory so the rule is written with a relative pattern:
(cd adapters/<adapter_name> && git lfs track "<pattern>")
git add adapters/<adapter_name>/.gitattributes
Git LFS honors nested .gitattributes files, so rules added this way apply only to that adapter and never collide with other in-flight parity PRs.
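To decide whether a flagged file is already covered, a rough sketch that matches a filename against the LFS patterns in a .gitattributes file. This is an approximation using fnmatch against basenames only; it ignores full gitattributes pattern semantics such as leading-slash anchoring and directory rules, and both helper names are mine.

```python
from fnmatch import fnmatch
from pathlib import Path

def lfs_patterns(gitattributes_text: str) -> list:
    """Extract the patterns from lines that route files through Git LFS."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if parts and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns

def covered_by_lfs(filename: str, gitattributes_text: str) -> bool:
    """Approximate check: does any LFS pattern match the file's basename?"""
    name = Path(filename).name
    return any(fnmatch(name, pat) for pat in lfs_patterns(gitattributes_text))
```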
find . -name .DS_Store -delete
git add adapters/<adapter_name>
git commit -m "Add parity experiment artifacts for <adapter_name>"
GIT_SSH_COMMAND='ssh -o ServerAliveInterval=30 -o ServerAliveCountMax=10' \
git push origin pr/<number>:refs/pr/<number>
Use this discussion URL in parity_experiment.json:
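A minimal sketch of writing that field. The filename parity_experiment.json and the parity_pr field come from this skill; the assumption that the file is a flat JSON object, and the helper name, are mine.

```python
import json
from pathlib import Path

def record_parity_pr(path: str, discussion_url: str) -> None:
    """Set parity_pr in parity_experiment.json, preserving any other keys.
    Assumes the file is a flat JSON object (schema not confirmed by the skill)."""
    p = Path(path)
    data = json.loads(p.read_text()) if p.exists() else {}
    data["parity_pr"] = discussion_url
    p.write_text(json.dumps(data, indent=2) + "\n")
```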