Use py-AutoClean for fast, parameterized pandas data cleaning, then return cleaned outputs and a concise operation log.
Use this skill when cleaning CSV/tabular data with py-AutoClean in Python.
Prefer AutoClean as the first cleaning pass, then apply targeted follow-up logic only if required.
This skill is intentionally limited to behavior documented in:
elisemercury/AutoClean
Do not invent unsupported parameters, modes, or return fields.
py-AutoClean is installed in the runtime.
from AutoClean import AutoClean
# Default run (mode defaults to 'auto')
pipeline = AutoClean(dataset)
df_clean = pipeline.output

# Explicit auto mode
pipeline = AutoClean(dataset, mode='auto')
df_clean = pipeline.output

# Manual mode: enable only the outlier step
pipeline = AutoClean(
    dataset,
    mode='manual',
    outliers='auto'
)
df_clean = pipeline.output
Use only these parameters and allowed values:
- mode: 'auto' | 'manual'
- duplicates: 'auto' | True | False
- missing_num: 'auto' | 'linreg' | 'knn' | 'mean' | 'median' | 'most_frequent' | 'delete' | False
- missing_categ: 'auto' | 'logreg' | 'knn' | 'most_frequent' | 'delete' | False
- encode_categ: 'auto' | ['onehot'] | ['label'] | ['auto', [<col_name_or_index>, ...]] | ['onehot', [...]] | ['label', [...]] | False
- extract_datetime: 'auto' | 'D' | 'M' | 'Y' | 'h' | 'm' | 's' | False
- outliers: 'auto' | 'winz' | 'delete' | False
- outlier_param: int | float (default documented as 1.5)
- logfile: True | False
- verbose: True | False

- mode='auto' runs the full automated pipeline.
- mode='manual' runs only the steps enabled through explicit parameter settings.
- Outlier bounds are [Q1 - 1.5*IQR, Q3 + 1.5*IQR] by default.
- encode_categ='auto' uses the cardinality-based encoding logic documented by AutoClean.
- The cleaned DataFrame is available as pipeline.output.

When only specific steps are needed, use mode='manual' and only the needed parameters.
Keep this policy simple; do not add undocumented heuristics.
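To make the default outlier interval [Q1 - 1.5*IQR, Q3 + 1.5*IQR] concrete, here is a minimal standard-library sketch. It illustrates the documented formula only and is not AutoClean's internal code. The helper names `iqr_bounds` and `winsorize` are hypothetical, and the quartile interpolation method is an assumption.

```python
import statistics


def iqr_bounds(values, outlier_param=1.5):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) with k = outlier_param."""
    # Assumption: quartiles use linear interpolation over the inclusive range.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - outlier_param * iqr, q3 + outlier_param * iqr


def winsorize(values, lower, upper):
    """Clip each value into [lower, upper]; the number of rows is unchanged."""
    return [min(max(v, lower), upper) for v in values]


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
lower, upper = iqr_bounds(values)

# Values outside the interval are the ones an outlier step would act on.
outliers = [v for v in values if v < lower or v > upper]

# Clipping keeps every row, whereas deleting would drop the flagged rows.
clipped = winsorize(values, lower, upper)
```

Clipping is roughly what outliers='winz' is described as doing, while outliers='delete' removes the flagged rows instead. A larger outlier_param widens the interval, so fewer values are flagged.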
For multiple files, run the same AutoClean workflow per DataFrame independently and emit one cleaned output per source file.
Example skeleton:
cleaned = {}
for file_name, df in dataframes.items():
    pipeline = AutoClean(df, mode='auto')
    cleaned[file_name] = pipeline.output
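If one failing file should not block the others, the per-file loop can be wrapped as in the following sketch. The helpers `autoclean_auto` and `clean_all` are hypothetical names, and collecting failures instead of raising is an assumption about the desired behavior, not something AutoClean prescribes. The cleaner is passed in as a callable so the loop can be tried with a stub before running the real pipeline.

```python
from typing import Any, Callable, Dict, Tuple


def autoclean_auto(df: Any) -> Any:
    """Default cleaner: one AutoClean pass in mode='auto'."""
    # Imported lazily so the batch helper can be exercised with a stub cleaner.
    from AutoClean import AutoClean

    return AutoClean(df, mode="auto").output


def clean_all(
    dataframes: Dict[str, Any],
    clean_one: Callable[[Any], Any] = autoclean_auto,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Clean each DataFrame independently and record failures per file."""
    cleaned: Dict[str, Any] = {}
    failed: Dict[str, str] = {}
    for file_name, df in dataframes.items():
        try:
            cleaned[file_name] = clean_one(df)
        except Exception as exc:
            # Keep going so one bad input does not abort the whole batch.
            failed[file_name] = f"{type(exc).__name__}: {exc}"
    return cleaned, failed
```

Calling `clean_all(dataframes)` returns one cleaned output per source file plus a map of file names to error messages, which can go straight into the operation log.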
When reproducibility is needed:
- Set logfile=True to generate autoclean.log (per AutoClean docs).
- Set verbose=True to stream process logs to the console.