Utilities for converting raw interaction data to the pyKT format and validating dataset integrity. Use them when preparing new datasets for Knowledge Tracing experiments with the pyKT framework.
# Show preprocessing status
python scripts/preprocess.py --status --data-dir ../data
# Single dataset
python scripts/preprocess.py --dataset assist2009 --pykt-path ../pykt-toolkit
# Multiple datasets
python scripts/preprocess.py --dataset assist2009 assist2015 ednet
# All available datasets
python scripts/preprocess.py --all
# Validate entire dataset directory
python scripts/validate_dataset.py --dir ../data/assist2009
# Validate specific file
python scripts/validate_dataset.py --file ../data/assist2009/data.txt --type raw
python scripts/validate_dataset.py --file ../data/assist2009/train_valid_sequences.csv --type csv
preprocess.py: Wrapper for pyKT's preprocessing pipeline with batch support.
| Option | Description |
|---|---|
| --dataset NAME | Dataset(s) to preprocess |
| --all | Process all available datasets |
| --status | Show preprocessing status only |
| --data-dir PATH | Data directory location |
| --pykt-path PATH | pykt-toolkit installation path |
| --min-seq-len N | Minimum sequence length (default: 3) |
| --maxlen N | Maximum sequence length (default: 200) |
| --kfold N | Number of CV folds (default: 5) |
| --list | List supported datasets with download URLs |
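
To make the effect of --min-seq-len, --maxlen, and --kfold concrete, here is a minimal sketch of the kind of sequence handling these options control. It is an illustration only, not the code in preprocess.py or pyKT: the function names, the padding value of -1, and the round-robin fold assignment are assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, List

PAD = -1  # assumed padding value for this sketch


def split_and_pad(seq: List[int], maxlen: int = 200,
                  min_seq_len: int = 3, pad: int = PAD) -> List[List[int]]:
    """Cut one student's sequence into chunks of at most `maxlen` items,
    drop chunks shorter than `min_seq_len`, and right-pad the rest."""
    chunks = []
    for start in range(0, len(seq), maxlen):
        chunk = seq[start:start + maxlen]
        if len(chunk) < min_seq_len:
            continue  # too short to be useful for training
        chunks.append(chunk + [pad] * (maxlen - len(chunk)))
    return chunks


def assign_folds(student_ids: Iterable[Hashable], kfold: int = 5) -> Dict[Hashable, int]:
    """Give every student exactly one fold index in [0, kfold).
    Round-robin over sorted ids keeps the sketch deterministic;
    a real pipeline would normally shuffle first."""
    unique = sorted(set(student_ids))
    return {sid: i % kfold for i, sid in enumerate(unique)}


if __name__ == "__main__":
    print(split_and_pad([1, 0, 1, 1], maxlen=3, min_seq_len=1))
    print(assign_folds(["s1", "s2", "s3"], kfold=2))
```

Assigning folds per student rather than per chunk keeps all of a student's interactions in the same fold, which avoids leaking one student's data between training and validation splits.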
validate_dataset.py: Validate data format and report statistics.
| Option | Description |
|---|---|
| --file PATH | Single file to validate |
| --dir PATH | Dataset directory to validate |
| --type raw/csv | File type (auto-detected) |
| --json | Output results as JSON |
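
As a rough idea of what a check on a --type csv file can look like, the sketch below validates rows of a sequence CSV. It is not the logic of validate_dataset.py: the column names `concepts` and `responses`, the comma-separated sequence encoding, and the allowed response values 0, 1, and -1 (padding) are assumptions for the example, and the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

ALLOWED_RESPONSES = {"0", "1", "-1"}  # assumed: incorrect, correct, padding


def check_row(row: Dict[str, Optional[str]],
              seq_cols=("concepts", "responses")) -> List[str]:
    """Return a list of problems found in one CSV row (empty if none)."""
    problems = []
    lengths = {}
    for col in seq_cols:
        value = row.get(col)
        if value is None:
            problems.append(f"missing column: {col}")
            continue
        lengths[col] = len(value.split(","))
    # All sequence columns of a row should describe the same interactions.
    if len(set(lengths.values())) > 1:
        problems.append(f"sequence lengths differ: {lengths}")
    responses = row.get("responses")
    if responses is not None:
        bad = sorted({v for v in responses.split(",") if v not in ALLOWED_RESPONSES})
        if bad:
            problems.append(f"unexpected response values: {bad}")
    return problems


def validate_csv(text: str) -> Dict[int, List[str]]:
    """Map the 0-based index of each problematic row to its problems."""
    report = {}
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        problems = check_row(row)
        if problems:
            report[i] = problems
    return report


SAMPLE = '''fold,uid,concepts,responses
0,u1,"3,3,5","1,0,1"
1,u2,"4,4","1,2,-1"
'''

if __name__ == "__main__":
    for index, problems in validate_csv(SAMPLE).items():
        print(index, problems)
```

In the sample, the first row passes and the second row is reported twice: its two sequence columns have different lengths, and it contains a response value outside the allowed set.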
| Dataset | Type | Description |
|---|---|---|
| assist2009 | Q+C | ASSISTments 2009-2010 Math |
| assist2012 | Q+C | ASSISTments 2012-2013 |
| assist2015 | C | ASSISTments Skill Builder |
| assist2017 | Q+C | ASSISTments Competition |
| algebra2005 | Q+C | KDD Cup Algebra |
| bridge2algebra2006 | Q+C | KDD Cup Bridge to Algebra |
| statics2011 | C | Engineering Statics course (Carnegie Mellon) |
| nips_task34 | Q+C | Eedi Education Challenge |
| ednet | Q+C | TOEIC English (Riiid) |
| junyi2015 | Q+C | Junyi Academy K-12 Math |
| slepemapy | Q+C | Geography |
| poj | C | Programming Judge |
Type: Q+C = Questions + Concepts, C = Concepts only
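
When scripting over several datasets, it can help to have the Type column available in code, for example to skip question-level models on concept-only data. The lookup below simply restates the table above; the `DATASET_TYPES` name and the helper functions are hypothetical and not part of preprocess.py or pyKT.

```python
from typing import Dict, List

# Restates the "Type" column of the dataset table.
DATASET_TYPES: Dict[str, str] = {
    "assist2009": "Q+C",
    "assist2012": "Q+C",
    "assist2015": "C",
    "assist2017": "Q+C",
    "algebra2005": "Q+C",
    "bridge2algebra2006": "Q+C",
    "statics2011": "C",
    "nips_task34": "Q+C",
    "ednet": "Q+C",
    "junyi2015": "Q+C",
    "slepemapy": "Q+C",
    "poj": "C",
}


def has_question_ids(dataset: str) -> bool:
    """True if the dataset provides question IDs in addition to concepts."""
    return DATASET_TYPES[dataset] == "Q+C"


def concept_only_datasets() -> List[str]:
    """Names of datasets that only provide concept (skill) IDs."""
    return sorted(name for name, kind in DATASET_TYPES.items() if kind == "C")


if __name__ == "__main__":
    print(concept_only_datasets())
```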
To add a dataset: download the raw data (see --list for URLs), place it in pykt-toolkit/data/{dataset_name}/, run preprocess.py, then check the result with validate_dataset.py.