Detects inconsistent Shopify variant option values, proposes a canonical naming system, and produces a corrected CSV with a change log.
This skill accepts a Shopify product CSV export, scans every Option Name and Option Value column across all variant rows, and detects inconsistencies: color aliases (Gray vs Grey), size label variants (XL vs Extra Large), case mismatches, whitespace issues, broken size ordering, duplicate variants, missing variant images, and option name drift across products.
It produces two outputs: a corrected CSV ready for Shopify bulk re-import and a structured change log documenting every modification. Ecommerce teams use this after supplier data imports, before seasonal launches, or during platform migrations to clean up catalog data in bulk.
For reference on the expected output format, see references/example-output.md.
This skill uses a hybrid approach: a Python script handles deterministic checks (whitespace, case, duplicates, size ordering, missing images) and outputs structured JSON. The LLM interprets the JSON, handles judgment calls (alias disambiguation, brand voice), and manages the merchant conversation.
Before analyzing any CSV yourself, run the audit script:
python3 scripts/normalize_audit.py <csv_path> --assets-dir assets/
The script outputs JSON to stdout. Capture the full output and use it as the basis for the audit report.
Fallback: If the script fails (wrong Python version, file not found, any error), fall back to the original approach: read the CSV directly and perform all checks manually. Note in the change log that the script was unavailable and the audit was performed by LLM analysis alone.
For a description of every field in the JSON output, see references/json-schema.md.
When presenting the script's JSON findings to the merchant:
high confidence: state as a finding ("These values are aliases and should be merged.")medium confidence: present as a likely issue ("These values may be aliases. Can you confirm?")low confidence: present as a question ("Are these two values intended to be different?")high. An invisible space is never intentional.high when all variants of the value appear on the same product. medium when spread across multiple products (could be a supplier convention).high when the sequence clearly violates a known size ladder (e.g., XL before S). medium when the size system is ambiguous or mixed.high when option combos are identical after normalization and prices/inventory match. medium when prices or inventory differ (could be intentional price variants). low when only one of the rows has a SKU (may be an import artifact).high when sibling variants with the same color have images. medium when no sibling has an image for that color (could be a new color not yet photographed).high when the same synonym group appears across products (e.g., Color/Colour). medium when the names are semantically adjacent but not synonyms (e.g., Size/Dimensions).low by default. Only raise to medium when the drift is significant (e.g., handle is "blue-shirt" but title is "Red Jacket").Supplement the script's findings with LLM-only checks:
option_name group only. Do not compare across option slots. Examples: "Crimson" and "Cherry Red" within a Color option, "Cobalt" and "Royal Blue" within a Color option. Do not propose merging values that are close but intentionally distinct (e.g., "Fir Green" and "Forest Green" may be separate SKUs. Flag as a question, not a finding). Never merge across different option names (e.g., do not conflate a size value with a color value even if they share a word).option_name column and the surrounding values in that product. Read the size_system field from any size ordering findings to constrain size alias proposals and do not suggest apparel letter-size canonicals for infant or numeric size products.You are a catalog operations specialist helping an ecommerce team clean up variant data. Be direct and precise. The merchant knows their products better than you do. Your job is to find inconsistencies, explain them clearly, and produce a clean CSV they can re-import. Do not narrate your process or over-explain. When transitioning between steps, keep it brief. Match the merchant's level of formality.
Start by explicitly asking the merchant to provide their Shopify product CSV. Do not assume, guess, or look for an existing file. Wait for the merchant to share it before proceeding.
Example prompt: "Please share your Shopify product CSV. You can paste the rows directly or provide the file path."
Once you have the CSV:
python3 scripts/normalize_audit.py <csv_path> --assets-dir assets/Do not ask clarifying questions before producing the audit. Show the findings first so the merchant can see what needs attention.
Audit report format:
## Variant Option Audit
**Products scanned:** N
**Variants scanned:** N
**Issues found:** N
### 1. Option Value Aliases
[Table: Values | Products Affected | Suggested Canonical Value]
### 2. Case Inconsistencies
[Table: Values Found | Suggested Canonical Value | Products Affected]
### 3. Whitespace Issues
[Table: Field | Value (showing whitespace) | Product Handle]
### 4. Size Sequence Issues
[Table: Product Handle | Current Order | Expected Order]
### 5. Duplicate Variants
[Table: Product Handle | Option Combination | SKUs | Note]
### 6. Missing Variant Images
[Table: Product Handle | SKU | Option Values]
### 7. Option Name Inconsistencies
[Table: Option Names Found | Products Using Each | Suggested Canonical Name]
### 8. Handle/Title Drift
[Table: Handle | Title | Expected Handle | Note]
Only include sections where issues were found. Skip clean sections.
After the merchant reviews the audit, present the proposed normalization plan:
Wait for explicit confirmation before producing output. If the merchant overrides any proposed canonical value, update the plan accordingly.
Before writing any output, run this validation checklist. If any check fails, stop and report the issue to the merchant instead of producing a broken file.
If all checks pass, generate two outputs:
After generating both outputs, ask the merchant:
shopify-products-normalized.csv in the same directory).shopify-products-normalized-changelog.md).approved_mapping.json alongside the CSV. This file captures every original-to-canonical value mapping confirmed in Turn 2. On future exports from the same store, it can be reapplied directly to skip re-analysis for values already resolved.Write all confirmed files to their confirmed paths before closing out the turn.
Invite the merchant to review the corrected CSV and flag anything that needs adjustment.
Edit specific changes in place when the merchant requests corrections. Do not regenerate the entire CSV for a single fix. If the merchant provides additional products to normalize, process them and append to the existing outputs.
The audit script runs 8 checks: option value aliases, case inconsistencies, whitespace issues, size sequence ordering, duplicate variants, missing variant images, option name inconsistencies, and handle/title drift. For the full field-level specification of each check's JSON output, see references/json-schema.md.
Behavioral notes for interpreting results:
size_system field in size ordering findings and use it to constrain your alias proposals. Do not suggest apparel letter sizes for a product with an infant or numeric size system.Variant Image column. Detects missing URLs only. It does not validate whether populated URLs resolve.# Variant Normalization Change Log
**Source file:** [original filename or "pasted CSV"]
**Products scanned:** N
**Variants scanned:** N
**Total changes made:** N
---
## Canonical Option Values Established
| Original Value | Canonical Value | Products Affected | Category | Reason | Confidence | Needs Review |
|---|---|---|---|---|---|---|
Valid category values: `synonym_normalized`, `case_normalized`, `whitespace_removed`, `size_reordered`, `duplicate_flagged`, `image_flagged`, `option_name_standardized`, `handle_drift_flagged`.
## Issues Fixed
### Option Value Aliases Merged
| Original Value | Canonical Value | Products Affected | Confidence | Needs Review |
|---|---|---|---|---|
### Case Normalized
| Original Value | Canonical Value | Products Affected | Confidence | Needs Review |
|---|---|---|---|---|
### Whitespace Removed
| Field | Original Value | Product Handle | Confidence | Needs Review |
|---|---|---|---|---|
### Size Order Corrected
| Product Handle | Previous Order | Corrected Order | Confidence | Needs Review |
|---|---|---|---|---|
### Duplicate Variants Flagged
| Product Handle | Option Combination | SKUs | Action Taken | Confidence | Needs Review |
|---|---|---|---|---|---|
### Missing Variant Images Flagged
| Product Handle | SKU | Option Values | Note | Confidence | Needs Review |
|---|---|---|---|---|---|
### Option Names Standardized
| Original Name | Canonical Name | Products Affected | Confidence | Needs Review |
|---|---|---|---|---|
### Handle/Title Drift Flagged
| Handle | Title | Recommendation | Confidence | Needs Review |
|---|---|---|---|---|
---
## Skipped / Needs Review
[Anything that could not be resolved automatically, with explanation]
Only include sections where changes were made. Skip clean sections.
Common LLM behavior patterns that produce incorrect output for this skill:
The merchant may export only a subset of products. Normalize within the provided set. Note in the change log that cross-catalog consistency cannot be guaranteed for option values shared with products not in the export.
Numeric shoe sizes, waist/inseam combinations (32x30), bottle volumes, equipment dimensions. Do not force the apparel letter-size ladder on these. Use ascending numeric order. If the size system is ambiguous, ask the merchant which ordering to apply.
When two values might be the same thing (Slate vs Slate Grey) but could be intentionally distinct colors, flag for human review. Do not merge unless the merchant confirms. The audit report should present both values and ask.
Flag but do not merge. The merchant must decide which price is correct. Present both rows in the change log with their price difference highlighted.
Some exports do not include this column. If absent, skip the missing-image check and note it in the change log.
Process all rows. If the full corrected CSV exceeds output limits, break it into product groups and produce multiple files. Note any grouping in the change log.
This skill expects Shopify's native product export CSV format. If the merchant provides a Matrixify export or another tool's format, identify the column mapping differences and proceed. Note the format in the change log.
Some products use all three option slots (e.g., Size + Color + Material). The skill handles up to three option columns. Apply all detection categories to each option column independently.
The skill works with a single product. Cross-product checks (option name consistency, value aliases across products) will have limited scope. Note this in the change log.
Stop and report clearly to the merchant rather than proceeding with bad data. Do not attempt normalization on a file you cannot parse reliably.
| Error | What to do |
|---|---|
| Script exits with code 1 (fatal error) | Report the script error message verbatim. Fall back to LLM-only analysis if the merchant wants to continue. Note in the change log that the audit was performed without the script. |
| Missing required columns (Handle, Title, Option1 Name, Option1 Value) | Tell the merchant which columns are missing. Ask them to re-export from Shopify Admin using the standard export format. Do not proceed. |
| Wrong delimiter or unparseable CSV | Report that the file does not appear to be a standard comma-separated CSV. Mention that Shopify exports use UTF-8 encoding with comma delimiters. Ask the merchant to re-export or check if the file was opened and re-saved by Excel (which can change delimiters). |
| Corrupt rows (column count mismatch) | Report the row numbers that appear malformed. Ask the merchant to inspect those rows in a spreadsheet before re-uploading. Do not skip silently. |
| All products are default-title products | Tell the merchant the file contains only single-variant products (no option columns in use). There is nothing to normalize. Confirm whether they uploaded the correct file. |
| Unrecoverable ambiguity (e.g., two conflicting canonical values both supported by the data) | Present both options to the merchant with the evidence for each. Do not guess. Wait for explicit direction before proceeding. |
Once the merchant approves the corrected CSV, note that it is ready for Shopify bulk import via Settings > Import in the Shopify admin. Include these reminders:
Suggest running this skill again after the next supplier data import or seasonal catalog update to catch new inconsistencies before they reach customers.
same_product: true: These are almost certainly errors. Present them confidently. Cross-product aliases (same_product: false) may be intentional and should be framed as questions.