Name: Run Cross Cluster Comparison
Author: j-mckerracher

ncores, nhosts, timelimit_sec, runtime_sec, queue_time_sec, runtime_fraction

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# X: feature matrix for the pooled source and target jobs
# y_domain: domain label per row (0=source, 1=target)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
clf = LogisticRegression(random_state=config["random_seed"])
clf.fit(X_scaled, y_domain)

# Propensity score: P(target | x) for every row
propensity = clf.predict_proba(X_scaled)[:, 1]

# Rows whose propensity falls inside the overlap band (inclusive on both ends)
lo, hi = config["overlap_band"]   # e.g., [0.2, 0.8]
in_overlap = (propensity >= lo) & (propensity <= hi)

- Regime definition: cpu_standard (proxy | authoritative)
- Feature set: [list of features]
- Overlap band: [0.2, 0.8]
- Domain classifier AUC: <value>
- Target overlap coverage: <value> (n=<count> of <total>)
- KS statistics: [per-feature table]
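
The reporting fields above can be filled in programmatically so that every run log uses the same layout. The following is a minimal sketch, not the skill's own tooling: the function name `format_overlap_report`, its parameters, and the numbers in the usage lines are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, Sequence, Tuple


def format_overlap_report(
    regime: str,
    regime_source: str,              # "proxy" or "authoritative"
    features: Sequence[str],
    band: Tuple[float, float],
    auc: float,
    n_target_in_overlap: int,
    n_target_total: int,
    ks_by_feature: Dict[str, float],
) -> str:
    """Render the required reporting fields as a bullet list (hypothetical helper)."""
    if regime_source not in ("proxy", "authoritative"):
        raise ValueError("regime_source must be 'proxy' or 'authoritative'")
    if n_target_total <= 0:
        raise ValueError("n_target_total must be positive")

    coverage = n_target_in_overlap / n_target_total
    lines = [
        f"- Regime definition: {regime} ({regime_source})",
        f"- Feature set: {', '.join(features)}",
        f"- Overlap band: [{band[0]}, {band[1]}]",
        f"- Domain classifier AUC: {auc:.2f}",
        f"- Target overlap coverage: {coverage:.2f} "
        f"(n={n_target_in_overlap} of {n_target_total})",
        "- KS statistics:",
    ]
    # One indented line per feature stands in for the per-feature table.
    lines += [f"    {name}: {ks:.3f}" for name, ks in ks_by_feature.items()]
    return "\n".join(lines)


# Placeholder values only; these are not results from any real run.
report = format_overlap_report(
    regime="cpu_standard",
    regime_source="proxy",
    features=["ncores", "nhosts", "timelimit_sec"],
    band=(0.2, 0.8),
    auc=0.75,
    n_target_in_overlap=32,
    n_target_total=100,
    ks_by_feature={"ncores": 0.1, "nhosts": 0.25, "timelimit_sec": 0.05},
)
```

Rejecting any `regime_source` other than `proxy` or `authoritative` keeps the report from silently omitting how the regime was defined.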

EXP	Source→Target	Feature set	Band	AUC	Coverage
EXP-022	Anvil→Conte	alloc-only	[0.2,0.8]	0.80	1.0
EXP-031	Anvil→Conte	alloc+perf (no mem)	[0.2,0.8]	0.99	0.32
EXP-024	Anvil→Conte	alloc+perf (no mem)	[0.1,0.9]	0.99	0.42

Label	Criteria
`cpu_standard`	`gpu_count_per_node == 0` AND `node_memory_gb < threshold`
`cpu_largemem`	`gpu_count_per_node == 0` AND `node_memory_gb >= threshold`
`gpu_standard`	`gpu_count_per_node > 0` AND `node_memory_gb < threshold`
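
The criteria in this table translate directly into a small labeling function. This is a sketch under stated assumptions: the source does not give a value for `threshold`, so the default below is a hypothetical placeholder, and nodes with GPUs and large memory get no label because the table does not define one.

```python
from typing import Optional

# Hypothetical cutoff for illustration only; the source leaves `threshold` unspecified.
LARGEMEM_THRESHOLD_GB = 512.0


def label_regime(
    gpu_count_per_node: int,
    node_memory_gb: float,
    threshold: float = LARGEMEM_THRESHOLD_GB,
) -> Optional[str]:
    """Map node hardware to a regime label following the criteria table."""
    has_gpu = gpu_count_per_node > 0
    is_largemem = node_memory_gb >= threshold

    if not has_gpu:
        return "cpu_largemem" if is_largemem else "cpu_standard"
    if not is_largemem:
        return "gpu_standard"
    # GPU nodes at or above the memory threshold are not covered by the table.
    return None
```

Returning `None` for the uncovered case forces the caller to decide how to handle those nodes rather than folding them into an existing regime.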

Metric	How to compute
Domain classifier AUC	`roc_auc_score(y_domain, propensity)`
Target overlap coverage	`n_target_in_overlap / n_target_total`
KS statistics	`scipy.stats.ks_2samp` per feature
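
The three metrics in this table can be computed together in one pass. The sketch below assumes the same logistic-regression domain classifier shown earlier; the function name `overlap_diagnostics`, its parameters, and the synthetic data are hypothetical and exist only to make the example self-contained.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler


def overlap_diagnostics(X, y_domain, feature_names, band=(0.2, 0.8), seed=0):
    """Return AUC, target overlap coverage, and per-feature KS statistics."""
    X = np.asarray(X, dtype=float)
    y_domain = np.asarray(y_domain)

    # Domain classifier: predict source (0) vs. target (1) from the features.
    X_scaled = StandardScaler().fit_transform(X)
    clf = LogisticRegression(random_state=seed).fit(X_scaled, y_domain)
    propensity = clf.predict_proba(X_scaled)[:, 1]  # P(target | x)

    lo, hi = band
    in_overlap = (propensity >= lo) & (propensity <= hi)
    is_target = y_domain == 1

    n_target_total = int(is_target.sum())
    n_target_in_overlap = int((in_overlap & is_target).sum())

    # Two-sample KS statistic per feature, source rows vs. target rows.
    ks = {
        name: float(ks_2samp(X[~is_target, j], X[is_target, j]).statistic)
        for j, name in enumerate(feature_names)
    }
    return {
        "auc": float(roc_auc_score(y_domain, propensity)),
        "coverage": n_target_in_overlap / n_target_total,
        "n_target_in_overlap": n_target_in_overlap,
        "n_target_total": n_target_total,
        "ks": ks,
        "in_overlap": in_overlap,
    }


# Synthetic stand-in data: two clusters whose features differ by a small shift.
rng = np.random.default_rng(0)
X_source = rng.normal(0.0, 1.0, size=(500, 2))
X_target = rng.normal(0.5, 1.0, size=(500, 2))
X_all = np.vstack([X_source, X_target])
y_all = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

result = overlap_diagnostics(X_all, y_all, ["ncores", "nhosts"])
```

The AUC here is computed on the same rows the classifier was fit on, which matches the snippet in the source; a held-out split would give a less optimistic estimate. The returned `in_overlap` mask can be reused to restrict evaluation to the overlap region.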

Run Cross Cluster Comparison

Non-negotiable rule

Step 1: Define the workload regime

Step 2: Choose a feature set

Step 3: Compute overlap

Step 4: Evaluate the overlap

Step 5: Restrict evaluation to overlap

Step 6: Required reporting in the run log and paper

Reference runs
