Name: Dataset And Input Factory
Author: alainlebret

Goal

Produce synthetic, realistic, and pedagogically appropriate input data for a computing assignment, lab session (TP), exam, or benchmark. The data must be varied enough to test student code thoroughly and must never expose real personal or sensitive information.
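A minimal sketch of the "never expose real personal information" rule, assuming student-like records rendered as CSV. Names come from a small pool of obviously synthetic placeholders, e-mail addresses use the reserved `example.com` domain (RFC 2606), and a fixed seed makes the file reproducible. The function name `make_students`, the column layout, and the 0 to 20 grade scale are hypothetical.

```python
import csv
import io
import random

# Obviously synthetic name pools: no real people, easy to recognize as fake.
FIRST_NAMES = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"]
LAST_NAMES = ["Tester", "Sample", "Mockwell", "Placeholder"]


def make_students(n: int, seed: int = 42) -> str:
    """Return n synthetic student rows as CSV text (hypothetical schema)."""
    rng = random.Random(seed)  # local generator: same seed, same output
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "first_name", "last_name", "email", "grade"])
    for i in range(1, n + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # example.com is reserved for documentation, so no real mailbox is exposed
        email = f"{first.lower()}.{last.lower()}{i}@example.com"
        grade = round(rng.uniform(0, 20), 1)  # hypothetical 0-20 grading scale
        writer.writerow([i, first, last, email, grade])
    return buf.getvalue()


if __name__ == "__main__":
    print(make_students(3))
```

A local `random.Random(seed)` instance is used instead of the global generator so that other code calling `random` cannot shift the sequence between runs.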

Inputs

mission.json — level, domain, constraints, evaluation_mode
statement.md — the assignment subject; data must be consistent with what is described
rubric.md — grading criteria; each data category should exercise at least one criterion
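The rule that each data category exercises at least one rubric criterion can be checked mechanically. The sketch below assumes the criteria have already been extracted from `rubric.md` as short identifiers and that each dataset category declares which criteria it targets; the identifiers and the `coverage_report` helper are hypothetical. As an optional extra, it also lists criteria that no category exercises.

```python
from typing import Dict, List, Set, Tuple


def coverage_report(
    categories: Dict[str, Set[str]], criteria: Set[str]
) -> Tuple[List[str], List[str]]:
    """Check dataset categories against rubric criteria.

    Returns (categories that exercise no known criterion,
             criteria that no category exercises).
    """
    uncovered = sorted(
        name for name, targets in categories.items() if not (targets & criteria)
    )
    exercised = set().union(*categories.values())  # empty set if no categories
    untested = sorted(criteria - exercised)
    return uncovered, untested


if __name__ == "__main__":
    criteria = {"C1-correctness", "C2-robustness", "C3-performance"}
    categories = {
        "nominal": {"C1-correctness"},
        "edge_cases": {"C2-robustness"},
        "large_inputs": {"C3-performance"},
        "misc": {"C9-unknown"},  # targets a criterion absent from the rubric
    }
    print(coverage_report(categories, criteria))  # → (['misc'], [])
```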

Output contract

Place all generated data under data/ or tests/cases/ as appropriate.

File	Description
`data/README.md`	Describes each dataset: format, generation method, known properties, intended test coverage
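One way to make sure `data/README.md` always carries the four required fields (format, generation method, known properties, intended test coverage) is to render it from per-dataset metadata. The `DatasetInfo` structure and the Markdown layout below are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class DatasetInfo:
    path: str        # relative path, under data/ or tests/cases/
    fmt: str         # file format, e.g. "CSV"
    generation: str  # how the file was produced (script, seed, parameters)
    properties: List[str] = field(default_factory=list)  # known properties
    coverage: List[str] = field(default_factory=list)    # what it is meant to test


def render_readme(datasets: List[DatasetInfo]) -> str:
    """Render one section per dataset with the four required fields."""
    lines = ["# Datasets", ""]
    for d in datasets:
        lines.append(f"## {d.path}")
        lines.append(f"- Format: {d.fmt}")
        lines.append(f"- Generation method: {d.generation}")
        # Empty lists are made explicit instead of silently omitted
        lines.append("- Known properties: " + ("; ".join(d.properties) or "none recorded"))
        lines.append("- Intended test coverage: " + ("; ".join(d.coverage) or "none recorded"))
        lines.append("")
    return "\n".join(lines)


def write_readme(datasets: List[DatasetInfo], root: Path = Path("data")) -> Path:
    """Write README.md into root (created if missing) and return its path."""
    root.mkdir(parents=True, exist_ok=True)
    target = root / "README.md"
    target.write_text(render_readme(datasets), encoding="utf-8")
    return target
```

Writing "none recorded" rather than dropping an empty field keeps gaps visible to whoever reviews the dataset.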

Domain-specific guidance

Domain	Typical data formats	Notes
Systems / C	Log files, named pipes, binary streams	Use ASCII-safe content; avoid locale-specific characters unless stated
Python data	CSV, JSON, plain text	Use `random.seed(42)` for reproducibility
SQL	SQL dump or CSV for import	Include NULL values and duplicate keys
Networking	PCAP excerpts or HTTP request logs	Use RFC 5737 documentation addresses (e.g. 192.0.2.x)
Embedded / real-time	Sensor traces, CAN frames	Include timing jitter and out-of-order packets
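For the SQL row, a minimal sketch of a dump that deliberately contains NULL values and duplicate keys. The `orders` table, its columns, and the helper names are hypothetical. The table intentionally declares no PRIMARY KEY, so the dump loads cleanly and the duplicates remain for students to detect; with a PRIMARY KEY on `order_id`, the import itself would fail, which tests a different skill.

```python
import random
from typing import List, Tuple, Union

Value = Union[int, float, str, None]


def sql_literal(value: Value) -> str:
    """Render a Python value as a SQL literal; None becomes NULL."""
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # standard SQL quote escaping
    return str(value)


def make_orders_dump(n: int = 20, seed: int = 42, n_nulls: int = 2, n_duplicates: int = 2) -> str:
    """Return a SQL dump with deliberate NULL amounts and duplicated order_id values."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    # Choose which orders get a NULL amount, so the NULL count is guaranteed
    null_ids = set(rng.sample(range(1, n + 1), k=min(n_nulls, n)))
    rows: List[Tuple[Value, ...]] = []
    for order_id in range(1, n + 1):
        customer = f"customer_{rng.randint(1, 20):02d}"
        amount = None if order_id in null_ids else round(rng.uniform(1, 500), 2)
        rows.append((order_id, customer, amount))
    # Re-insert copies of distinct existing rows so some order_id values repeat
    rows.extend(rng.sample(rows, k=min(n_duplicates, n)))
    out = [
        "-- Synthetic data: NULL amounts and duplicate order_id values are intentional.",
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount NUMERIC(8, 2));",
    ]
    for row in rows:
        out.append("INSERT INTO orders VALUES (" + ", ".join(sql_literal(v) for v in row) + ");")
    return "\n".join(out) + "\n"
```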
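For the networking row, a sketch that emits HTTP request log lines in the Apache/NCSA Common Log Format, drawing client addresses only from the three blocks RFC 5737 reserves for documentation (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24). The paths, method and status mix, and start date are hypothetical choices.

```python
import ipaddress
import random
from datetime import datetime, timedelta, timezone
from typing import List

# RFC 5737 documentation blocks: never assigned to real hosts.
DOC_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]
PATHS = ["/", "/index.html", "/api/items", "/api/items/42", "/login"]  # hypothetical
METHODS = ["GET", "GET", "GET", "POST"]  # GET-heavy mix
STATUSES = [200, 200, 200, 201, 304, 404, 500]


def make_access_log(n: int, seed: int = 42) -> List[str]:
    """Return n Common Log Format lines with documentation-only client addresses."""
    rng = random.Random(seed)
    t = datetime(2024, 1, 15, 8, 0, 0, tzinfo=timezone.utc)
    lines = []
    for _ in range(n):
        net = rng.choice(DOC_NETS)
        host = net[rng.randint(1, 254)]  # skip network and broadcast addresses
        t += timedelta(seconds=rng.randint(0, 30))
        stamp = t.strftime("%d/%b/%Y:%H:%M:%S %z")  # e.g. 15/Jan/2024:08:00:12 +0000
        request = f"{rng.choice(METHODS)} {rng.choice(PATHS)} HTTP/1.1"
        status = rng.choice(STATUSES)
        size = "-" if status == 304 else str(rng.randint(100, 5000))  # "-": no body
        lines.append(f'{host} - - [{stamp}] "{request}" {status} {size}')
    return lines
```

The `%b` month abbreviation depends on the process locale; under the default C locale it yields English names, which keeps the output ASCII-only.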
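For the sensor-trace case of the embedded row, a sketch that adds Gaussian timing jitter around a nominal sampling period and then swaps a few adjacent samples to simulate out-of-order arrival. The 10 ms period, 0.5 ms jitter, sinusoidal signal, and `(seq, timestamp_us, value)` layout are assumptions; CAN frames are not covered here.

```python
import math
import random
from typing import List, Tuple

# (sequence number, timestamp in microseconds, value)
Sample = Tuple[int, int, float]


def make_sensor_trace(
    n: int,
    period_us: int = 10_000,
    jitter_us: float = 500.0,
    n_swaps: int = 3,
    seed: int = 42,
) -> List[Sample]:
    """Return n samples in arrival order, with jittered timestamps and a few reorderings."""
    rng = random.Random(seed)
    half = period_us / 2
    samples: List[Sample] = []
    for seq in range(n):
        nominal = (seq + 1) * period_us  # start one period in, so timestamps stay positive
        # Gaussian jitter, clamped below half a period so that timestamps still
        # increase with the sequence number (only arrival order is disturbed)
        jitter = max(-half + 1, min(half - 1, rng.gauss(0.0, jitter_us)))
        # Hypothetical slowly varying signal plus measurement noise
        value = 20.0 + 5.0 * math.sin(2 * math.pi * seq / 50) + rng.gauss(0.0, 0.2)
        samples.append((seq, int(nominal + jitter), round(value, 3)))
    # Out-of-order delivery: swap a few distinct adjacent pairs
    if n >= 2:
        for i in rng.sample(range(n - 1), k=min(n_swaps, n - 1)):
            samples[i], samples[i + 1] = samples[i + 1], samples[i]
    return samples
```

Keeping the sequence number in each sample lets a grader check whether student code reorders by `seq` or by timestamp instead of trusting arrival order.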
