Generates realistic synthetic datasets, input files, traces, logs, images, CSV files, graphs, and edge-case inputs for computing labs, projects, and exams. Use when pedagogical artifacts need representative inputs or benchmark data.
Produce synthetic, realistic, and pedagogically appropriate input data for a computing assignment, lab session (TP), exam, or benchmark. The data must be varied enough to exercise normal, boundary, and erroneous cases in student code, and must never expose real personal or sensitive information.
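One way to honor the "no real personal information" rule is to build every identity from a seeded generator instead of sampling real names. The sketch below is only an illustration: the syllable list, the `studentNNNN` identifier scheme and the `fake_person` helper are hypothetical choices, not part of the original specification. The one grounded fact is that `example.org` is reserved for documentation by RFC 2606, so it cannot belong to a real student.

```python
import random

# Hypothetical syllables for pronounceable but made-up surnames.
SYLLABLES = ["ka", "lo", "mi", "ren", "to", "sa", "vel", "dor"]

def fake_person(rng, index):
    """Return a synthetic identity with no link to any real person."""
    surname = "".join(rng.choice(SYLLABLES) for _ in range(3)).capitalize()
    user = f"student{index:04d}"  # hypothetical identifier scheme
    return {
        "id": user,
        "surname": surname,
        # example.org is reserved for documentation (RFC 2606).
        "email": f"{user}@example.org",
    }
```

Passing a `random.Random(seed)` instance rather than using the module-level generator keeps each dataset reproducible on its own.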
Inputs:

- `mission.json`: level, domain, constraints, evaluation_mode
- `statement.md`: the assignment subject; data must be consistent with what it describes
- `rubric.md`: grading criteria; each data category should exercise at least one criterion

Place all generated data under `data/` or `tests/cases/` as appropriate.
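A generator can fail early when `mission.json` lacks a field it depends on. This is a minimal sketch that assumes `mission.json` is a flat JSON object whose top-level keys match the four field names listed above; the value types and the `load_mission` helper name are assumptions.

```python
import json
from pathlib import Path

# Key names follow the input description; their value types are assumed.
REQUIRED_KEYS = ("level", "domain", "constraints", "evaluation_mode")

def load_mission(path):
    """Load mission.json and raise KeyError if an expected key is missing."""
    mission = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in mission]
    if missing:
        raise KeyError(f"mission.json is missing: {', '.join(missing)}")
    return mission
```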

| File | Description |
|---|---|
| `data/README.md` | Describes each dataset: format, generation method, known properties, and intended test coverage |
| `data/<name>.<ext>` | One or more synthetic input files (CSV, JSON, log, binary, etc.) |
| `data/generate.py` (or `.sh`) | Reproducible generation script, so the data can be regenerated from a fixed seed |
| `data/edge-cases/` | Subdirectory of adversarial and boundary inputs |
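A possible shape for `data/generate.py` is sketched below. The CSV schema (`id`, `category`, `value`), the file names `measurements.csv`, `empty.csv`, `single_row.csv` and `extremes.csv`, and the `--seed`/`--out` options are all hypothetical placeholders to adapt to the assignment. What the sketch is meant to show is that every random choice flows from one seeded `random.Random` instance, so rerunning with the same seed reproduces identical files, and that boundary inputs land in `data/edge-cases/`.

```python
"""Sketch of data/generate.py: regenerate every dataset from a fixed seed."""
import argparse
import csv
import random
from pathlib import Path

def make_rows(rng, n):
    # Hypothetical schema: synthetic id, category label, measurement.
    categories = ["alpha", "beta", "gamma"]
    return [
        (f"S{i:04d}", rng.choice(categories), round(rng.uniform(0.0, 100.0), 2))
        for i in range(n)
    ]

def write_csv(path, rows):
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "category", "value"])
        writer.writerows(rows)

def generate(out_dir, seed=42, n=100):
    rng = random.Random(seed)  # private RNG: output depends only on the seed
    out = Path(out_dir)
    write_csv(out / "measurements.csv", make_rows(rng, n))
    # Boundary inputs: header only, a single row, and the range extremes.
    write_csv(out / "edge-cases" / "empty.csv", [])
    write_csv(out / "edge-cases" / "single_row.csv", make_rows(rng, 1))
    extremes = [("S0000", "alpha", 0.0), ("S0001", "gamma", 100.0)]
    write_csv(out / "edge-cases" / "extremes.csv", extremes)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Regenerate synthetic data.")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--out", default="data")
    args = parser.parse_args()
    generate(args.out, seed=args.seed)
```

Running `python data/generate.py --seed 42` twice should produce byte-identical files, which is worth recording in `data/README.md`.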
For each assignment, generate at minimum:
| Domain | Typical data formats | Notes |
|---|---|---|
| Systems / C | Log files, named pipes, binary streams | Use ASCII-safe content; avoid locale-specific characters unless stated |
| Python data | CSV, JSON, plain text | Use `random.seed(42)` for reproducibility |
| SQL | SQL dump or CSV for import | Include NULL values and duplicate keys |
| Networking | PCAP excerpts or HTTP request logs | Use documentation addresses reserved by RFC 5737 (e.g. 192.0.2.0/24) |
| Embedded / real-time | Sensor traces, CAN frames | Include timing jitter and out-of-order frames |
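The networking and embedded rows can be illustrated with two small seeded generators. This is a sketch under stated assumptions: the log line format is a simplified, hypothetical one (not full Common Log Format), and the sampling period, jitter bound, swap probability and `(seq, timestamp_ms, value)` tuple layout are illustrative defaults. Addresses stay inside 192.0.2.0/24 (TEST-NET-1 in RFC 5737). Out-of-order delivery is simulated by swapping adjacent samples, so each sample is displaced by at most one position.

```python
import random

def http_log_lines(rng, n):
    """Request-log-style lines using only 192.0.2.0/24 (RFC 5737 TEST-NET-1)."""
    paths = ["/", "/login", "/api/items", "/missing"]
    statuses = [200, 200, 200, 302, 404, 500]
    lines = []
    for i in range(n):
        ip = f"192.0.2.{rng.randint(1, 254)}"
        # Hypothetical simplified format, not a full Common Log Format line.
        lines.append(f"{ip} GET {rng.choice(paths)} {rng.choice(statuses)} t={i}")
    return lines

def sensor_trace(rng, n, period_ms=10.0, jitter_ms=2.0, swap_prob=0.1):
    """(seq, timestamp_ms, value) samples with timing jitter and some reordering."""
    samples = [
        (seq,
         round(seq * period_ms + rng.uniform(-jitter_ms, jitter_ms), 3),
         round(rng.gauss(20.0, 0.5), 3))
        for seq in range(n)
    ]
    # Simulate out-of-order delivery by swapping some adjacent samples.
    i = 0
    while i < n - 1:
        if rng.random() < swap_prob:
            samples[i], samples[i + 1] = samples[i + 1], samples[i]
            i += 2  # skip ahead so no sample moves more than one slot
        else:
            i += 1
    return samples
```

Student code can then be checked for whether it reorders by `seq` instead of trusting arrival order.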
statement.md.