Pre-submission quality gate for ML competition pipelines. Runs three checks before any result is reported: code review (data leakage, CV contamination, metric errors), submission CSV format validation, and adversarial train/test distribution shift detection. Invoke before reporting any OOF score or submitting predictions.
This skill is a mandatory pre-submission quality gate for tabular ML competition pipelines. It catches the three most costly bugs before they reach the leaderboard: target leakage inflating OOF scores, submission CSV format rejections, and OOF–LB correlation collapse caused by distribution shift.
Three checks, always in this order — skip none:

1. Code review against `references/checklist.md` (data leakage, CV contamination, metric errors).
2. Submission CSV format validation against the sample submission.
3. Adversarial train/test distribution shift detection via `scripts/adversarial_validation.py`.

Critical principle: no OOF score is meaningful until all CRITICAL items in the checklist pass. A leaky 0.95 AUC that fails the leakage checks is worthless — it will not exceed 0.70 on the leaderboard.
When to use: Before reporting any OOF score or submitting predictions — every single iteration, without exception.
Print the final score as `OOF {metric}: {value:.6f}` — the evaluator agent parses this exact format.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# ❌ BAD: Computing the OOF metric as the average of fold scores
fold_scores = []
for tr, val in cv.split(X, y):
    model.fit(X[tr], y[tr])
    fold_scores.append(roc_auc_score(y[val], model.predict_proba(X[val])[:, 1]))
oof_score = np.mean(fold_scores)  # WRONG — biased by unequal fold sizes

# ✅ GOOD: Compute the metric once on the full OOF array
oof = np.zeros(len(y))
for tr, val in cv.split(X, y):
    model.fit(X[tr], y[tr])
    oof[val] = model.predict_proba(X[val])[:, 1]
oof_score = roc_auc_score(y, oof)  # correct
```
```python
import pandas as pd

# ❌ BAD: Submit without checking column names
submission.to_csv("submission.csv", index=False)

# ✅ GOOD: Validate against the sample submission before saving
sample = pd.read_csv("data/sample_submission.csv")
assert list(submission.columns) == list(sample.columns), \
    f"Column mismatch: {submission.columns.tolist()}"
assert len(submission) == len(sample), \
    f"Row count mismatch: {len(submission)} vs {len(sample)}"
assert submission.isnull().sum().sum() == 0, "NaN in submission"
submission.to_csv("artifacts/submission.csv", index=False)
```
The three workflows below map to the three mandatory checks. Run them in order.
Workflow 1: run through `references/checklist.md` on every new script. Non-negotiable items (a skeleton sketch follows the list):

- `train_test_split` NOT used instead of proper k-fold CV.
- Metric matches the competition's definition (`average='macro'` vs `'binary'`, etc.).
- Final score printed as `OOF {metric}: {score:.6f}` — evaluator agent parses this.
- Paths built with `pathlib` and config variables.
- Seeds fixed everywhere (`random_state=42`, `np.random.seed(42)`).
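A minimal sketch of a script skeleton that satisfies the path, seed, and print-format items. The `DATA_DIR` layout and fold count are illustrative assumptions, not part of the checklist:

```python
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

DATA_DIR = Path("data")  # hypothetical layout; point this at the project's data folder
SEED = 42
np.random.seed(SEED)

train = pd.read_csv(DATA_DIR / "train.csv")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

# ... accumulate OOF predictions with the pattern shown above, then:
oof_score = 0.0  # placeholder; replace with the metric computed on the full OOF array
print(f"OOF AUC: {oof_score:.6f}")  # exact format the evaluator parses
```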
Workflow 2: validate the submission file against the sample before reporting any result.

```python
import pandas as pd


def validate_submission(submission_path="artifacts/submission.csv",
                        sample_path="data/sample_submission.csv") -> str:
    submission = pd.read_csv(submission_path)
    sample = pd.read_csv(sample_path)
    errors = []
    if list(submission.columns) != list(sample.columns):
        errors.append(f"Column mismatch — got {submission.columns.tolist()}, expected {sample.columns.tolist()}")
    if len(submission) != len(sample):
        errors.append(f"Row count mismatch — got {len(submission)}, expected {len(sample)}")
    if submission.isnull().any().any():
        errors.append(f"NaN in: {submission.columns[submission.isnull().any()].tolist()}")
    # Check prediction column value ranges
    pred_col = sample.columns[-1]
    vals = submission[pred_col]
    if pd.api.types.is_float_dtype(vals):
        if (vals < 0).any() or (vals > 1.001).any():
            errors.append(f"Values out of [0,1]: min={vals.min():.4f}, max={vals.max():.4f}")
    return "VALID" if not errors else "INVALID: " + "; ".join(errors)


print(validate_submission())
```
Workflow 3: run the adversarial shift detector from the command line:

```bash
uv run python scripts/adversarial_validation.py \
    --train data/train.csv --test data/test.csv --target <target_col>
# Prints: AUC, verdict, top-20 leaking features
# Saves: artifacts/adversarial_weights.npy

# Optional: specify a different classifier
uv run python scripts/adversarial_validation.py \
    --train data/train.csv --test data/test.csv --target <target_col> --clf rf
```
Or inline from Python:
```python
from scripts.adversarial_validation import run_adversarial_validation

result = run_adversarial_validation(
    train_path="data/train.csv",
    test_path="data/test.csv",
    target_col="target",
)
# {"auc": 0.58, "verdict": "⚠️ Mild shift ...", "top_features": {...}}

# Custom classifier (any sklearn-compatible object):
from sklearn.ensemble import RandomForestClassifier

result = run_adversarial_validation(
    train_path="data/train.csv",
    test_path="data/test.csv",
    target_col="target",
    clf=RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1),
)
```
| AUC | Verdict | Action |
|---|---|---|
| 0.50–0.55 | ✅ No shift | Proceed normally |
| 0.55–0.65 | ⚠️ Mild shift | Check top features; monitor LB-OOF gap |
| 0.65–0.80 | ❌ Moderate shift | Drop or transform top leaking features |
| 0.80–1.00 | 🚨 Severe shift | Likely ID/time leak — investigate immediately |
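A sketch of acting on a moderate-or-worse verdict with the `result` dict from `run_adversarial_validation` above. The five-feature cutoff and the `train_df`/`test_df` names are illustrative assumptions, and the sketch assumes `top_features` is ordered most-leaking first:

```python
if result["auc"] >= 0.65:
    # Drop the features that most separate train from test, then re-run the check
    suspects = list(result["top_features"])[:5]
    train_df = train_df.drop(columns=suspects, errors="ignore")
    test_df = test_df.drop(columns=suspects, errors="ignore")
```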
If mild shift remains after cleanup, the saved sample weights can be used to re-weight training toward the test distribution:

```python
import numpy as np

weights = np.load("artifacts/adversarial_weights.npy")
model.fit(X_train, y_train, sample_weight=weights)
```
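Putting the gate together, a minimal sketch that blocks a submission unless checks 2 and 3 pass (check 1 stays a manual review against the checklist; the 0.65 cutoff comes from the table above):

```python
from scripts.adversarial_validation import run_adversarial_validation

status = validate_submission()  # Workflow 2 helper defined above
assert status == "VALID", status

result = run_adversarial_validation(  # Workflow 3
    train_path="data/train.csv",
    test_path="data/test.csv",
    target_col="target",
)
assert result["auc"] < 0.65, f"Distribution shift too large: AUC={result['auc']:.3f}"
```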
OOF AUC is 0.95 but leaderboard gives 0.70. Classic target leakage.
Check: the data leakage section of `references/checklist.md`. Most common culprit: target encoding fit on the full train set rather than per fold.
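A sketch of the fold-safe pattern, assuming a DataFrame `X` with a categorical column `"cat"` and an aligned target Series `y` (the column name is illustrative):

```python
import numpy as np

oof = np.zeros(len(y))
for tr, val in cv.split(X, y):
    # Encoding statistics come from the training fold only, never the full train set
    fold_means = y.iloc[tr].groupby(X["cat"].iloc[tr]).mean()
    prior = y.iloc[tr].mean()
    X_tr_enc = X["cat"].iloc[tr].map(fold_means).fillna(prior)
    X_val_enc = X["cat"].iloc[val].map(fold_means).fillna(prior)
    # fit on X_tr_enc / y.iloc[tr], then predict X_val_enc into oof[val]
```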
Adversarial AUC is 0.52 but LB-OOF gap is still large. Shift exists but in non-numeric columns not included in the adversarial classifier.
Fix: Include string frequency encodings or hash-encoded string features in the adversarial dataset.
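For example, a sketch of frequency-encoding the string columns before re-running the check (file paths follow the layout used above; adapt the column handling to the actual schema):

```python
import pandas as pd

train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")

for col in train.select_dtypes(include="object").columns:
    freq = pd.concat([train[col], test[col]]).value_counts(normalize=True)
    train[f"{col}_freq"] = train[col].map(freq)
    test[f"{col}_freq"] = test[col].map(freq)
# Drop the raw string columns afterwards and re-run adversarial validation on the result
```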
Platform returns 0.0 or error despite local validation passing. Common cause: integer IDs in prediction column instead of floats for probability competitions.
Fix: Always cast with `submission[pred_col] = submission[pred_col].astype(float)` before saving.
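A quick way to confirm the cast survives the CSV round trip (the paths and `pred_col` follow the validator above):

```python
import pandas as pd

submission[pred_col] = submission[pred_col].astype(float)
submission.to_csv("artifacts/submission.csv", index=False)
assert pd.api.types.is_float_dtype(pd.read_csv("artifacts/submission.csv")[pred_col])
```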
| File | What it covers |
|---|---|
| `references/checklist.md` | CRITICAL and Important quality gates — data leakage, metric correctness, submission format, robustness |
| `scripts/adversarial_validation.py` | Distribution shift detector: AUC, top leaking features, sample weights for retraining |
| Skill | Why |
|---|---|
| ml-competition-features | CV split strategy — GroupKFold / TimeSeriesSplit, OOF array accumulation |
| ml-competition-advanced | OOF vs LB divergence diagnosis — run after this checklist passes |
| ml-competition-training | Metric → prediction type table — governs what Workflow 2 checks |
| ml-competition-quality | 16 production bugs — most are variants of the leakage and metric bugs caught here |