Compute derived, aggregated, or transformed tables from existing datasets. Use this skill when the user needs to "compute monthly scores", "aggregate by month", "create a summary table", "derive risk scores", "compute percentile ranks", "roll up events", "create benchmarks from raw data", "add a computed column", or bridge the gap between raw generated tables and downstream analytics. Works on xlsx, csv, or json input. Claude writes the computation logic; the script handles data I/O.
Derive aggregated, scored, or transformed tables from existing datasets. This skill bridges the
gap between raw synthetic data (produced by synthdata-generate) and the derived metrics that
downstream tools expect — monthly rollups, composite scores, percentile ranks, segment summaries.
```
pip install openpyxl pandas numpy --break-system-packages
```
Ask the user:
Run the script in inspect mode to understand what we're working with:
```
python3 scripts/compute.py --inspect --input data.xlsx
```
This prints each table's name, column names with dtypes, row count, and a 3-row sample. Use this to understand the schema before writing computation logic.
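Conceptually, inspect mode boils down to printing a schema plus a small head sample for each table. A minimal sketch of that idea (illustrative only, not the actual `scripts/compute.py` implementation; the function name `describe_tables` is hypothetical):

```python
import pandas as pd

def describe_tables(tables: dict) -> str:
    """Render each table's name, row count, column dtypes, and a 3-row sample."""
    lines = []
    for name, df in tables.items():
        lines.append(f"Table: {name} ({len(df)} rows)")
        for col, dtype in df.dtypes.items():
            lines.append(f"  {col}: {dtype}")
        lines.append(df.head(3).to_string(index=False))
    return "\n".join(lines)
```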
Write a Python file that operates on pre-loaded DataFrames. The script provides:
- `tables` — dict mapping table names to pandas DataFrames (all input tables pre-loaded)
- `pd` — pandas
- `np` — numpy
- `result` — empty dict; assign `{name: DataFrame}` pairs for each output table

Example computation file (`monthly_risk.py`):
```python
import pandas as pd
import numpy as np

users = tables['users']
threats = tables['threat_events']
sims = tables['phishing_sims']
training = tables['training']
dlp = tables['dlp_events']
abuse = tables['abuse_mailbox']

# Extract month from event dates
threats['month'] = pd.to_datetime(threats['event_date']).dt.to_period('M')
sims['month'] = pd.to_datetime(sims['sim_date']).dt.to_period('M') if 'sim_date' in sims else ...

# Group and aggregate per user per month
# ... (Claude writes the full logic based on the user's requirements)

result['monthly_risk'] = monthly_risk_df
```
Save the computation to a .py file, then execute it.
```
python3 scripts/compute.py --input data.xlsx --code monthly_risk.py --output data_with_scores.xlsx
```
CLI flags:
| Flag | Description |
|---|---|
| `--input` | Source dataset (xlsx / csv-dir / json) |
| `--output` | Output path (default: `<input>_computed.<ext>`) |
| `--inspect` | Print table schemas and exit |
| `--code` | Path to a Python file with computation logic |
| `--expr` | Inline Python expression (for simple one-liners) |
| `--append` | Add computed tables to a copy of the input file (xlsx only) |
Inline mode for simple computations:
```
python3 scripts/compute.py --input data.xlsx --output scored.xlsx \
  --expr "result['summary'] = tables['events'].groupby('category').size().reset_index(name='count')"
```
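Both `--code` and `--expr` share the same contract: user code runs with `tables`, `pd`, `np`, and an empty `result` in scope, and anything assigned into `result` becomes an output table. A minimal model of that contract (an assumption about how `scripts/compute.py` behaves, not its actual source; `run_computation` is a hypothetical name):

```python
import pandas as pd
import numpy as np

def run_computation(code: str, tables: dict) -> dict:
    # Execute user code in a namespace pre-loaded with the documented globals.
    namespace = {'tables': tables, 'pd': pd, 'np': np, 'result': {}}
    exec(code, namespace)
    return namespace['result']

events = pd.DataFrame({'category': ['phish', 'phish', 'malware']})
out = run_computation(
    "result['summary'] = tables['events'].groupby('category')"
    ".size().reset_index(name='count')",
    {'events': events},
)
```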
After execution, the script prints:
Review the output with the user before proceeding.
Use these patterns as starting points when writing computation logic.
```python
# Monthly rollup per user
events = tables['threat_events']
events['month'] = pd.to_datetime(events['event_date']).dt.to_period('M').astype(str)
monthly = events.groupby(['user_id', 'month']).agg(
    threat_events=('event_id', 'count'),
    real_clicks=('clicked', 'sum'),
    credentials_entered=('credentials_entered', 'sum'),
).reset_index()
result['monthly_threats'] = monthly
```
```python
# Weighted composite score with tiers and percentile ranks
# Weights must sum to 1.0
WEIGHTS = {'threat': 0.35, 'sim': 0.25, 'dlp': 0.20, 'training': 0.10, 'reporting': 0.10}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

df['composite_risk_score'] = (
    df['threat_score'] * WEIGHTS['threat']
    + df['sim_score'] * WEIGHTS['sim']
    + df['dlp_score'] * WEIGHTS['dlp']
    + df['training_score'] * WEIGHTS['training']
    + df['reporting_score'] * WEIGHTS['reporting']
)

def assign_tier(score):
    if score >= 75: return 'Critical'
    if score >= 50: return 'High'
    if score >= 25: return 'Medium'
    return 'Low'

df['risk_tier'] = df['composite_risk_score'].apply(assign_tier)
df['percentile'] = df['composite_risk_score'].rank(pct=True) * 100
```
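A quick worked check of the weighting and tier cutoffs above, assuming component scores on a 0–100 scale (the scale and the sample values are assumptions for illustration, not stated elsewhere in this skill):

```python
WEIGHTS = {'threat': 0.35, 'sim': 0.25, 'dlp': 0.20, 'training': 0.10, 'reporting': 0.10}

def assign_tier(score):
    if score >= 75: return 'Critical'
    if score >= 50: return 'High'
    if score >= 25: return 'Medium'
    return 'Low'

# Hypothetical per-component scores for one user
row = {'threat': 90, 'sim': 80, 'dlp': 70, 'training': 60, 'reporting': 50}
composite = sum(row[k] * w for k, w in WEIGHTS.items())
# 90*0.35 + 80*0.25 + 70*0.20 + 60*0.10 + 50*0.10 = 76.5, which lands in 'Critical'
```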
```python
# Segment benchmark summaries
peers = tables['peers_detail']
metrics = ['sim_click_rate', 'training_completion', 'composite_risk_score']
rows = []
for segment_col in ['segment_healthcare', 'segment_geo', 'segment_size']:
    cohort = peers[peers[segment_col] == 1]
    for m in metrics:
        rows.append({
            'segment': segment_col, 'metric': m,
            'mean': cohort[m].mean(), 'median': cohort[m].median(),
            'p25': cohort[m].quantile(0.25), 'p75': cohort[m].quantile(0.75),
        })
result['segment_summaries'] = pd.DataFrame(rows)
```
```python
# Department-level rollup via join
users = tables['users']
risk = tables['monthly_risk']
merged = risk.merge(users[['user_id', 'department']], on='user_id')
dept = merged.groupby('department').agg(
    avg_risk=('composite_risk_score', 'mean'),
    critical_count=('risk_tier', lambda x: (x == 'Critical').sum()),
    user_count=('user_id', 'nunique'),
).sort_values('avg_risk', ascending=False).reset_index()
result['department_risk'] = dept
```
| Need | Skill |
|---|---|
| Generate new rows from a schema/template | synthdata-generate |
| Extract Excel → JSON | synthdata-extract |
| Add rows/columns to existing data | synthdata-extend |
| Replace PII with fake values | synthdata-anonymize |
| Aggregate, score, rank, or derive new tables from existing data | synthdata-compute |