Clean up messy spreadsheet data — trim whitespace, fix inconsistent casing, convert numbers-stored-as-text, standardize dates, remove duplicates, and flag mixed-type columns. Use when data is messy, inconsistent, or needs prep before analysis. Triggers on "clean this data", "clean up this sheet", "normalize this data", "fix formatting", "dedupe", "standardize this column", and "this data is messy".
Clean messy data in the active sheet or a specified range.
Before starting, verify required libraries are installed and install any that are missing.
python3 -c "import openpyxl" 2>/dev/null || python3 -m pip install openpyxl
Important: Do not skip this step — the workflow below will fail without these libraries.
range.values, write helper-column formulas via range.formulas = [["=TRIM(A2)"]]. The in-place vs helper-column decision still applies.
.xlsx file: Use Python and openpyxl.
A1:F200, use it.

| Issue | What to look for |
|---|---|
| Whitespace | Leading/trailing spaces, double spaces |
| Casing | Inconsistent casing in categorical columns like usa, USA, Usa |
| Number-as-text | Numeric values stored as text; stray `$`, `,`, or `%` characters in number cells |
| Dates | Mixed formats in the same column like 3/8/26, 2026-03-08, March 8 2026 |
| Duplicates | Exact-duplicate rows and near-duplicates caused by case or whitespace differences |
| Blanks | Empty cells in otherwise-populated columns |
| Mixed types | A column that is mostly numbers but has a few text entries |
| Encoding | Mojibake, non-printing characters |
| Errors | #REF!, #N/A, #VALUE!, #DIV/0! |
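
As a rough illustration of how a few of the checks in the table could be automated, the sketch below scans one column of cell values held in a plain Python list. The names `scan_column`, `casing_variants`, `ERROR_VALUES`, and `NUMERIC_TEXT` are hypothetical, the regular expression is only one possible definition of "number stored as text", and dates, near-duplicates, and encoding problems are not covered. It assumes the values have already been read from the sheet, for example with openpyxl.

```python
import re
from collections import Counter

# Error strings listed in the table above (assumed to arrive as plain text).
ERROR_VALUES = {"#REF!", "#N/A", "#VALUE!", "#DIV/0!"}
# One possible pattern for numeric text, allowing a stray $, commas, or %.
NUMERIC_TEXT = re.compile(r"^\s*[-+]?\$?\s*\d[\d,]*(\.\d+)?\s*%?\s*$")


def scan_column(values):
    """Count blanks, errors, whitespace, number-as-text, and mixed types."""
    issues = Counter()
    kinds = Counter()
    for value in values:
        if value is None or value == "":
            issues["Blanks"] += 1
            continue
        if isinstance(value, str):
            if value in ERROR_VALUES:
                issues["Errors"] += 1
                continue
            # Leading/trailing spaces or a double space inside the text.
            if value != value.strip() or "  " in value:
                issues["Whitespace"] += 1
            if NUMERIC_TEXT.match(value):
                issues["Number-as-text"] += 1
                kinds["number"] += 1
            else:
                kinds["text"] += 1
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            kinds["number"] += 1
    # Mixed types: the column is mostly numbers but has a few text entries.
    if kinds["number"] > kinds["text"] > 0:
        issues["Mixed types"] = kinds["text"]
    return dict(issues)


def casing_variants(values):
    """Group text values that differ only by case or surrounding spaces."""
    groups = {}
    for value in values:
        if isinstance(value, str) and value.strip():
            groups.setdefault(value.strip().lower(), set()).add(value.strip())
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


print(scan_column([" 12", "$1,200", 7, None, "#N/A", "n/a  x", 3.5]))
print(casing_variants(["usa", "USA", "Usa", "Canada"]))
```

The counts returned per column can feed directly into the summary shown to the user before any change is made.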
Show a summary table before changing anything:
| Column | Issue | Count | Proposed Fix |
|---|---|---|---|
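
One way to produce that summary is to collect findings as tuples and render them with the same four headers. This is a minimal sketch: `render_summary` is a hypothetical helper, and the sample findings are made-up placeholders, not results from a real sheet.

```python
def render_summary(findings):
    """Render (column, issue, count, proposed_fix) tuples as a markdown table."""
    lines = [
        "| Column | Issue | Count | Proposed Fix |",
        "|---|---|---|---|",
    ]
    for column, issue, count, fix in findings:
        lines.append(f"| {column} | {issue} | {count} | {fix} |")
    return "\n".join(lines)


# Placeholder findings, for illustration only.
sample = [
    ("A", "Whitespace", 14, "Helper column with =TRIM(A2)"),
    ("C", "Casing", 9, "Helper column with =UPPER(C2)"),
]
print(render_summary(sample))
```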
When a fix can be expressed as a formula such as =TRIM(A2), =VALUE(SUBSTITUTE(B2,"$","")), =UPPER(C2), or =DATEVALUE(D2), write the formula in an adjacent helper column rather than computing the result in Python and overwriting the original.
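
The helper-column approach can be sketched as a small function that builds one formula per row and leaves the source column untouched. `helper_formulas` and its parameters are hypothetical names, and the helper column letter `G` is an assumption about where free space exists in the sheet.

```python
def helper_formulas(template, source_col, helper_col, first_row, last_row):
    """Return {helper_cell: formula} for each row in the given range."""
    return {
        f"{helper_col}{row}": template.format(cell=f"{source_col}{row}")
        for row in range(first_row, last_row + 1)
    }


# Build =TRIM(...) formulas for A2:A4, targeting helper column G.
formulas = helper_formulas("=TRIM({cell})", "A", "G", 2, 4)
for address, formula in formulas.items():
    print(address, formula)
```

With openpyxl, each entry could then be written with `ws[address] = formula`, since a string value beginning with `=` is stored as a formula. Because the original values stay in place, the user can compare both columns before deciding to replace anything.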