Audit a metric or pipeline step with pre-conditions, process, and post-conditions — following the research flywheel protocol
Produce a structured audit document for a metric or pipeline step, following the flywheel protocol established in this project. Covers pre-conditions (data integrity, assumptions), process (algorithmic correctness), and post-conditions (verified output values, safe claims for the paper).
Relationship to /implement: /implement produces audit docs proactively during implementation. /audit is for re-auditing existing code (pre-/implement legacy, or when embeddings/checkpoints change and revalidation is needed).
Templates:

- `report/methodological-preconditions-audit.md` (global)
- `report/sprints/week-15/twonn-audit.md` (per-metric template)

`$ARGUMENTS` should name the metric or pipeline step, e.g.:

- `mle estimator` → audits `compute_intrinsic_dimensionality()` in `analysis/metrics.py`
- `gaussian entropy` → audits `compute_gaussian_entropy()`
- `fisher ratio` → audits the per-family SVD and Fisher discriminant ratio
- `embedding extraction` → audits the full `extract_embeddings.py` pipeline

## Step 1: Read the Code

Read the relevant function(s) in `analysis/metrics.py` (or the named script).
Identify its pre-conditions (data and numerical assumptions), its process steps (algorithmic operations), and its post-conditions (expected output patterns across conditions).
## Step 2: Check Prior Audits

Read `report/methodological-preconditions-audit.md` to check whether this metric was already audited there. If so, note the existing status and focus on what has changed or was deferred.
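Before auditing, it helps to hold a reference implementation of the estimator in mind. For the `mle estimator` example above, a minimal TwoNN-style sketch (Facco et al.'s two-nearest-neighbor ratio method) might look like the following; this is a generic illustration, not the repository's `compute_intrinsic_dimensionality()`:

```python
import numpy as np

def twonn_dimension(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate: d = N / sum(log(r2 / r1)).

    Generic sketch of the estimator family, NOT the project's
    compute_intrinsic_dimensionality() implementation.
    """
    # Pairwise Euclidean distances (fine for small N; use a KD-tree otherwise).
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dist, np.inf)          # exclude self-distances
    nearest = np.sort(dist, axis=1)[:, :2]  # first and second nearest neighbors
    mu = nearest[:, 1] / nearest[:, 0]      # ratio r2 / r1 per point
    return len(X) / np.sum(np.log(mu))
```

An audit of such a function would check, for example, that duplicate points (r1 = 0) are handled and that the estimate is stable across subsamples.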
## Step 3: Verify Embedding Provenance

Before running anything, confirm the embeddings are from best checkpoints:

- Check `results/embedding_analysis/twonn.json` for `checkpoint_path` metadata.
- If `checkpoint_path=N/A` (known gap, issue #117), cross-reference the W15 revalidation (`report/sprints/week-15/embedding-revalidation.md`) to confirm the `.pt` files
were re-extracted from best checkpoints.

## Step 4: Run the Pipeline

Run the metric pipeline against the current embedding files and capture the output.
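Before invoking the pipeline, the best-checkpoint provenance check above can be automated. A minimal sketch, where the JSON layout with a per-condition `checkpoint_path` field is an assumption based on `results/embedding_analysis/twonn.json` and must be adapted to the actual schema:

```python
import json
from pathlib import Path

def flag_unverified_checkpoints(metric_json: Path) -> list[str]:
    """Return conditions whose embeddings lack best-checkpoint provenance.

    Assumes a layout like {"baseline": {"checkpoint_path": "..."}, ...};
    adapt to the actual schema of results/embedding_analysis/twonn.json.
    """
    data = json.loads(metric_json.read_text())
    return [
        cond for cond, entry in data.items()
        if entry.get("checkpoint_path") in (None, "N/A")
    ]
```

Any condition flagged here must be cross-referenced against the W15 revalidation before the audit can mark the provenance pre-condition ✅.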
Use or adapt the pattern from `scripts/compute_twonn.py`:

```bash
cd src/kinship-contrastive
PYTHONPATH=. uv run python scripts/compute_<metric>.py
```
If no dedicated script exists, create one in scripts/ before running — never
audit from notebook output alone (notebooks are for study, pipelines are for
reproducible results).
Capture: output values for all 5 conditions (pretrained, baseline, hcl, random_scl, random_hcl), sample counts, and any diagnostic indicators.
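Such a script typically follows the shape sketched below. The stand-in metric, the loader signature, and the output layout are all assumptions; `scripts/compute_twonn.py` is the real reference:

```python
"""Sketch of a scripts/compute_<metric>.py-style pipeline script."""
import json
from pathlib import Path
import numpy as np

CONDITIONS = ["pretrained", "baseline", "hcl", "random_scl", "random_hcl"]

def compute_metric(emb: np.ndarray) -> float:
    # Stand-in computation; replace with the audited metric.
    return float(np.linalg.norm(emb, axis=1).mean())

def run_pipeline(load_embeddings, out_path: Path) -> dict:
    """load_embeddings(cond) -> (array, checkpoint_path); hypothetical signature."""
    results = {}
    for cond in CONDITIONS:
        emb, ckpt = load_embeddings(cond)
        results[cond] = {
            "value": compute_metric(emb),
            "n_samples": int(emb.shape[0]),  # sample count for the audit
            "checkpoint_path": ckpt,         # provenance metadata
        }
    out_path.write_text(json.dumps(results, indent=2))
    return results
```

Capturing `n_samples` and `checkpoint_path` alongside each value lets the Pre-Conditions and Post-Conditions tables cite a single JSON artifact.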
## Step 5: Cross-Check Reported Values

Search for where this metric appears in reports:

```bash
grep -r "<metric_name>" report/
```
Check if reported values match recomputed values. If they differ, document the discrepancy — do NOT edit the original report. If correction is needed, note it as a finding in the audit.
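The reported-vs-recomputed comparison can be made mechanical. A small sketch, where the flat `{condition: value}` shape and the tolerance are assumptions, and the reported values are transcribed by hand from the grep hits:

```python
def find_discrepancies(reported: dict, recomputed: dict, rtol: float = 1e-6) -> dict:
    """Compare hand-transcribed report values against recomputed ones.

    Returns {condition: (reported, recomputed)} for values that disagree
    beyond a relative tolerance; both inputs are flat {condition: value} maps.
    """
    flagged = {}
    for cond, old in reported.items():
        new = recomputed[cond]
        if abs(new - old) > rtol * max(abs(new), abs(old), 1.0):
            flagged[cond] = (old, new)
    return flagged
```

Flagged conditions become findings in the audit, never silent edits to the original report.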
## Step 6: Write the Audit Document

Write to `report/sprints/week-NN/<metric-slug>-audit.md`, following this template:
# <Metric Name> Audit — Pre-Conditions, Process, Post-Conditions
**Date**: YYYY-MM-DD
**Scope**: `<function_name>()` in `analysis/metrics.py` [or named script]
**Reference**: `report/methodological-preconditions-audit.md` (global),
`report/sprints/week-15/twonn-audit.md` (per-metric template)
**Verified values**: `results/embedding_analysis/<metric>.json` (recomputed YYYY-MM-DD)
---
## Context
[1-2 paragraphs: why this metric exists in the pipeline, what question it answers,
why the audit was triggered now.]
---
## Pre-Conditions
| # | Pre-condition | Status | Evidence |
|---|--------------|--------|----------|
| P1 | Embeddings from best checkpoints | ✅/⚠️/❌ | [source] |
| P2 | [Data assumption] | ✅/⚠️/❌ | [source] |
| P3 | [Numerical assumption] | ✅/⚠️/❌ | [source] |
| ... | | | |
Status key: ✅ Verified | ⚠️ Assumed/partial | ❌ Known gap
---
## Process
| # | Step | Status | Note |
|---|------|--------|------|
| E1 | [Algorithm step] | ✅/⚠️/❌ | [correctness note] |
| E2 | [Edge case handling] | ✅/⚠️/❌ | [note] |
| ... | | | |
---
## Post-Conditions (Verified Results — YYYY-MM-DD)
[Results table: all 5 conditions × relevant output values]
| # | Post-condition | Status | Evidence |
|---|---------------|--------|----------|
| Q1 | [Expected pattern across conditions] | ✅/⚠️/❌ | [value or reference] |
| Q2 | [Consistency check] | ✅/⚠️/❌ | [note] |
| ... | | | |
---
## Additional Diagnostics
[Only present if you ran beyond-template analyses (null models, sensitivity,
confound checks). Open with a scope note: which design principle triggered
each diagnostic; whether it overlaps with deferred diagnostics from prior audits.
If nothing beyond-template was run, omit this section entirely.]
---
## Known Limitations
| # | Limitation | Severity | Action |
|---|-----------|----------|--------|
| L1 | [assumption not tested] | Low/Med/High | [defer/fix/accept] |
| ... | | | |
---
## Safe Claim for SIBGRAPI
> "[One-sentence claim the paper can make, with values.]"
**Do not claim**: [what cannot be asserted and why]
---
## Reproduction
```bash
cd src/kinship-contrastive
PYTHONPATH=. uv run python scripts/compute_<metric>.py
# Output: results/embedding_analysis/<metric>.json
```

**Commit**: `<hash>` (YYYY-MM-DD)
---
## Step 7: Commit and Link
```bash
git add report/sprints/week-NN/<metric>-audit.md
git commit -m "docs: add <metric> audit (WNN)"
```
If the audit reveals a discrepancy with a published report, also create a
GitHub issue with the `investigation` label describing the finding.
If the audit resolves or updates an open item in the global audit
(report/methodological-preconditions-audit.md), update its status row.
Mark resolved items ✅ with a reference to the new audit document.
Notes:

- `results/embedding_analysis/` is the source of truth; the audit certifies it.
- If beyond-template diagnostics were run, include the `## Additional Diagnostics` section after Post-Conditions. Open the section with a scope note: which design principle triggered each diagnostic, and whether any overlaps with deferred diagnostic gaps listed in prior audits (note that they are separate if so).