Recover central-document similarity outputs from precomputed embeddings, including mislabeled or corrupted-looking embedding files, and produce validated JSON results. Use when a task asks for the most central document by average cosine similarity.
Use this skill when:
- The required output contains central_doc_id and average_similarity.
- The input has a .npz extension but may not be a real NPZ archive (e.g., mislabeled text/script content).

Inspect the input file before assuming format.
- ls -lah <path>
- python -c "print(open('<path>','rb').read(64))"
- If the first bytes look like Python source (e.g., import numpy as np), treat the file as mislabeled content, not a real NPZ.

If the file is a generator script, execute it to materialize the real archive.
- python3 /workspace/embeddings/document_embeddings.npz

Load arrays with pickle enabled when doc_ids may be object dtype.
- Use np.load(path, allow_pickle=True).
- Without it, loading fails with ValueError: Object arrays cannot be loaded when allow_pickle=False.

Compute cosine similarities efficiently with matrix multiplication.
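- Normalize each row by its L2 norm: X = E / ||E||.
- Compute the similarity matrix: S = X @ X.T.
- Average over the other documents, excluding self-similarity: avg = (S.sum(axis=1) - np.diag(S)) / (n - 1).
- Select argmax(avg).

Write exact output schema.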
- central_doc_id (string)
- average_similarity (float rounded to 4 decimals)

Common pitfalls:

Trusting file extension alone (.npz)
- np.load failed with unpickling errors because the .npz file actually contained Python source text.

Using allow_pickle=False on object arrays
- Loading doc_ids failed when pickle was disabled.
- When doc_ids is stored as dtype=object, use allow_pickle=True for trusted local task artifacts.

Overcomplicating multi-format loaders too early
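A single archive check is often enough in place of a multi-format loader. The sketch below is illustrative only: load_embeddings is a hypothetical helper name, and it assumes the real archive is a standard NPZ (a zip container) coming from a trusted local source, since allow_pickle=True can execute code from untrusted files. It returns every array by name and makes no assumption about key names.

```python
import zipfile

import numpy as np


def load_embeddings(path):
    """Load every array from an NPZ archive, failing early on mislabeled files.

    Hypothetical helper for illustration; not part of the original task.
    """
    # A real .npz is a zip container, so anything else is mislabeled content.
    if not zipfile.is_zipfile(path):
        with open(path, "rb") as fh:
            head = fh.read(64)
        raise ValueError(f"{path} is not an NPZ archive; first bytes: {head!r}")

    # allow_pickle=True is needed when doc_ids is stored as dtype=object.
    # Only do this for trusted local artifacts.
    with np.load(path, allow_pickle=True) as data:
        return {name: data[name] for name in data.files}
```

If the ValueError fires and the first bytes look like Python source, the file is likely a generator script that has to be run first.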
Marking complete before re-checking output file
- Confirm the result was actually written to /workspace/results/central_document.json.
- cat (or parse) the final JSON before completion.

Run checks aligned with observed test assertions:
File existence
- test -f /workspace/results/central_document.json

Valid JSON and required fields only
- The file parses as JSON and its keys are exactly {"central_doc_id", "average_similarity"}.

Type/range checks
- central_doc_id is a string.
- average_similarity is numeric and in [0, 1] (as expected for this embedding set).

Membership check
- central_doc_id is present in input doc_ids.

Correctness check (most important)
- Recompute independently and confirm central_doc_id equals argmax(avg).

Pipeline: E -> normalize -> S = X @ X.T -> avg excluding diagonal -> argmax
Header check:
python - <<'PY'
print(open('<path>','rb').read(64))
PY
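The compute, write, and verify steps can be sketched together as follows. This is a minimal illustration, not a definitive implementation: the function names (central_document, write_result, check_result) are hypothetical, and the code assumes E is an (n, d) array with n >= 2 and no zero-norm rows, and that doc_ids is aligned row by row with E.

```python
import json

import numpy as np


def central_document(doc_ids, E):
    """Return (doc_id, average cosine similarity to the other docs, 4 decimals)."""
    E = np.asarray(E, dtype=np.float64)
    n = E.shape[0]
    # Row-wise L2 normalization (assumes no zero vectors).
    X = E / np.linalg.norm(E, axis=1, keepdims=True)
    S = X @ X.T  # all pairwise cosine similarities
    # Exclude each document's self-similarity from its average.
    avg = (S.sum(axis=1) - np.diag(S)) / (n - 1)
    best = int(np.argmax(avg))
    return str(doc_ids[best]), round(float(avg[best]), 4)


def write_result(result_path, doc_ids, E):
    """Write the two required fields and nothing else."""
    doc_id, avg = central_document(doc_ids, E)
    with open(result_path, "w") as fh:
        json.dump({"central_doc_id": doc_id, "average_similarity": avg}, fh)


def check_result(result_path, doc_ids, E):
    """Re-read the output file and re-run the schema and correctness checks."""
    with open(result_path) as fh:
        result = json.load(fh)
    assert set(result) == {"central_doc_id", "average_similarity"}
    assert isinstance(result["central_doc_id"], str)
    assert isinstance(result["average_similarity"], (int, float))
    assert result["central_doc_id"] in {str(d) for d in doc_ids}
    expected_id, expected_avg = central_document(doc_ids, E)
    assert result["central_doc_id"] == expected_id
    assert abs(result["average_similarity"] - expected_avg) < 1e-4
```

The [0, 1] range check is left out of check_result because cosine similarity can be negative in general; add it only when it is known to hold for the embedding set at hand.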