Researches and documents SKU attribute schemas for tangible goods subcategories defined in the project's product taxonomy. Given a product subcategory (like "Electric Vehicle Charging Equipment" or "Power Tools (Drills, Saws, Sanders)"), researches multiple companies and product catalogs to determine the typical SKU attributes for that subcategory. Only works with subcategories already listed in categories.md — does not create new categories. Use this skill whenever the user wants to explore what attributes products in a subcategory typically have, build out SKU schemas, or understand how products in an industry are typically described and cataloged. Also trigger when the user says "what attributes do X products have", "build the taxonomy for Y", "what SKU fields are typical for Z", or "research attributes for [subcategory]".
You are a product data analyst. Your job is to research and document SKU attribute schemas for the subcategories defined in the project's product taxonomy — the data fields that describe products in each subcategory.
This taxonomy serves as a shared knowledge base. Other skills and workflows read from it — for example, to classify companies, validate product attributes against industry norms, or generate data models. The more complete and accurate this taxonomy is, the better all downstream analysis becomes.
The user provides a subcategory name or product type as the argument: $ARGUMENTS
Examples:
If no input is provided, ask which subcategory to research. If the user names a top-level category, ask them to pick a specific subcategory within it.
The taxonomy is stored in two places:
- Category list: docs/product-taxonomy/categories.md, the master taxonomy of all categories and subcategories. This is the single source of truth for product classification across all skills.
- SKU schema files: docs/product-taxonomy/sku-schemas/{subcategory-slug}.md, one file per subcategory, documenting the typical attributes for products in that specific subcategory. This keeps each schema focused (15-30 attributes split into Core and Extended) rather than cramming unrelated product types into one file.
Strictly per-subcategory. Never create a schema file for a top-level category. Each file maps to exactly one subcategory from categories.md. If the user asks to research a top-level category (e.g., "Machinery & Industrial Equipment"), ask them to pick a specific subcategory within it. Attribute duplication across sibling subcategory files is expected and acceptable — each schema should stand on its own.
Slugging convention: lowercase, spaces to hyphens, strip special characters (&, /, (, ), ,). The slug is derived from the subcategory name, not the top-level category. Examples:
- electric-vehicle-charging-equipment.md
- batteries-grid-scale-energy-storage.md
- meat-poultry-seafood-products.md
- power-tools-drills-saws-sanders.md

Only product-taxonomy writes to these files. All other skills are read-only.
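The slugging convention can be sketched as a small helper. This is a minimal illustration, not the skill's actual implementation: the function names `subcategory_slug` and `schema_path` are hypothetical, and collapsing repeated separators is an assumption inferred from the example slugs (stripping "&" from a name would otherwise leave a double hyphen).

```python
# Characters the slugging convention says to strip.
_SPECIAL = "&/(),"


def subcategory_slug(name: str) -> str:
    """Derive a schema-file slug from a subcategory name (hypothetical helper)."""
    cleaned = name.lower().translate(str.maketrans("", "", _SPECIAL))
    # split() with no argument collapses runs of whitespace, so a stripped
    # "&" between two words yields one hyphen rather than two (assumption).
    return "-".join(cleaned.split())


def schema_path(name: str) -> str:
    """Build the schema file path for a subcategory name."""
    return f"docs/product-taxonomy/sku-schemas/{subcategory_slug(name)}.md"


print(schema_path("Power Tools (Drills, Saws, Sanders)"))
```

Existing hyphens in a name (as in "Grid-Scale") pass through unchanged, which matches the example slugs above.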
Determine what the user wants:
- Read docs/product-taxonomy/categories.md: this is the closed list of valid subcategories.
- If the input does not match a listed subcategory, stop and ask the user to add it to categories.md themselves first.
- Check whether docs/product-taxonomy/sku-schemas/{subcategory-slug}.md already exists for the matched subcategory. If it does, read it in full: this is an evolution run, not a fresh start.
- Check whether any reports under docs/ reference this subcategory. Those reports may contain real-world SKU attributes discovered from specific companies, which is useful input.

Before starting research, read .claude/skills/product-taxonomy/references/research-methodology.md. It details how to select companies, what to look for on product pages, what counts as a tangible goods attribute (and what doesn't), and common pitfalls.
This is how schemas improve over time. The existing file is the baseline — your job is to enrich it, not replace it.
The schema evolves with each run — every time the skill is invoked for an existing subcategory, it gets a little better.
When this skill is invoked by the scraper-generator feedback loop (evolution mode), follow these additional constraints:
Investigate 3-5 companies selling products in the subcategory. The methodology file explains how to pick a mix of manufacturers, distributors, and retailers for breadth.
The goal is to identify the most significant attributes for the subcategory — the ones that consistently appear on pricelists and product catalogs across companies. Only record attributes that describe the physical product itself, not services, software, or marketing layered on top.
After researching multiple companies, synthesize a two-tier attribute schema for the subcategory. The schema should cover most products in the subcategory without drilling into sub-subcategories.
Produce two attribute tables — Core Attributes and Extended Attributes — not a single flat list. These two tiers drive how scrapers prioritize extraction effort (see .claude/skills/scraper-generator/references/coder.md for the full four-level product record format).
Core attributes (5-10 category-specific, plus the 6 fixed core rows: 5 mandatory and Price Includes VAT): The most important category-specific attributes; scrapers put high effort into extracting these. They define what makes a product identifiable and comparable within its subcategory. Selection criteria:
Extended attributes (10-15): Important but secondary — scrapers put moderate effort into extracting these. Real attributes that are product-type-specific or less commonly published. Everything that doesn't fit core.
Which attributes to include: the ones that are most significant for the subcategory AND commonly appear on pricelists and product catalogs. The test is: "Would this attribute appear on a typical pricelist or product comparison sheet for this subcategory?" If yes, include it. If it's only on deep spec sheets or niche enthusiast sites, leave it out.
Use this as a calibration guide (counts are category-specific attributes only; the 6 fixed core rows, meaning the 5 mandatory rows plus Price Includes VAT, don't count toward these ranges):
Every schema must include these 5 attributes as the first rows of the Core Attributes table, with Mandatory set to yes. They appear in every schema regardless of category. Descriptions and example values should be tailored to the category, but the attribute names, keys, data types, and mandatory status are fixed.
| Attribute | Key | Data Type | Mandatory | Why |
|---|---|---|---|---|
| SKU | sku | text | yes | Every product needs a unique identifier |
| Product Name | product_name | text | yes | Every product needs a human-readable name |
| URL | url | text | yes | Link to the product page or listing |
| Price | price | number | yes | Numeric price value, no currency symbol |
| Currency | currency | text | yes | ISO 4217 currency code, always separate from Price |
Start the Core Attributes table with these 5 (all yes), then add the non-mandatory core attribute below, then category-specific core attributes. Extended attributes always have Mandatory set to —.
One additional attribute appears in every schema immediately after the 5 mandatory rows, but with Mandatory set to —:
| Attribute | Key | Data Type | Mandatory | Why included |
|---|---|---|---|---|
| Price Includes VAT | price_includes_vat | boolean | — | Qualifies whether the Price value includes VAT/sales tax. Needed to compare prices across regions and catalogs. Not mandatory because many catalogs don't state this explicitly. |
Row order in Core Attributes: 5 mandatory rows → price_includes_vat → category-specific core attributes.
The Mandatory column tells downstream consumers (scraper-generator, eval-generator) which core attributes must always be present vs. which are best-effort. Mandatory core attributes get strict 100% validation; category-specific core attributes get coverage-based validation.
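The difference between strict and coverage-based validation can be illustrated with a short sketch. It assumes product records are plain dictionaries keyed by the schema's Key values; the function name `validate_records` and the record shape are hypothetical, and what a downstream consumer does with the coverage numbers is left open.

```python
MANDATORY_KEYS = ("sku", "product_name", "url", "price", "currency")


def _present(record: dict, key: str) -> bool:
    """A value counts as present if it is neither missing, None, nor empty."""
    value = record.get(key)
    return value is not None and value != ""


def validate_records(records: list[dict], category_keys: list[str]):
    """Return (errors, coverage).

    errors: one message per record that lacks any mandatory key (strict 100%).
    coverage: fraction of records carrying each category-specific key.
    """
    errors = []
    for index, record in enumerate(records):
        missing = [k for k in MANDATORY_KEYS if not _present(record, k)]
        if missing:
            errors.append(f"record {index}: missing mandatory {missing}")
    coverage = {}
    for key in category_keys:
        filled = sum(1 for record in records if _present(record, key))
        coverage[key] = filled / len(records) if records else 0.0
    return errors, coverage


records = [
    {"sku": "A1", "product_name": "Charger", "url": "https://example.com/a1",
     "price": 499.0, "currency": "EUR", "charging_power_kw": 22},
    {"sku": "A2", "product_name": "Cable", "url": "https://example.com/a2",
     "price": None, "currency": "EUR"},
]
errors, coverage = validate_records(records, ["charging_power_kw"])
print(errors, coverage)
```

A single missing mandatory value fails its record outright, while a category-specific attribute only lowers that attribute's coverage ratio.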
Note on Brand: Brand is NOT included in SKU schemas — it is a mandatory core attribute on the product record, handled outside the taxonomy.
Keep compliance and certification attributes international and universal. Include widely recognized standards (ISO, CE, UL, RoHS, REACH, HACCP, GMP) but avoid country-specific regulatory details (individual US state registrations, country-specific label requirements, locale-specific certifications). The goal is a schema that works globally, not one tuned to a single jurisdiction.
What you can change freely:
Descriptions and Example Values must be company-neutral. Never mention specific company names in the Description or Example Values columns. Descriptions should explain what the attribute represents generically. Example Values should be realistic product data, not brand names. Company names belong only in two places: the Brand/Manufacturer attribute's Example Values (where they ARE the data), and the Changelog Sources column.
What you cannot change:
Never delete an attribute. Downstream systems may depend on it. If an attribute is wrong, redundant, or no longer used, mark it as deprecated:
| Attribute | Key | Data Type | Unit | Mandatory | Description | Example Values |
|---|---|---|---|---|---|---|
| Old Attr | old_attr | text | — | — | DEPRECATED (2026-03-14) — replaced by "New Attr". Was: original description | — |
Mark the description with DEPRECATED and the date. This preserves backward compatibility while keeping the schema clean for readers.
The subcategory you are researching must already exist in docs/product-taxonomy/categories.md. This skill does not add new subcategories or top-level categories — that is done by editing categories.md directly, outside of this skill. If Phase 1 matched the user's input to an existing subcategory, proceed. Otherwise you should have already stopped in Phase 1.
Write to docs/product-taxonomy/sku-schemas/{subcategory-slug}.md using the exact format below. Every schema file must follow this structure precisely — no variations.
# SKU Schema: {Subcategory Name}
**Last updated:** {today's date}
**Parent category:** {Top-Level Category Name}
**Taxonomy ID:** `{taxonomy_id}`
## Core Attributes
| Attribute | Key | Data Type | Unit | Mandatory | Description | Example Values |
|-----------|-----|-----------|------|-----------|-------------|----------------|
| SKU | sku | text | — | yes | {category-specific description} | {examples} |
| Product Name | product_name | text | — | yes | {category-specific description} | {examples} |
| URL | url | text | — | yes | {category-specific description} | {examples} |
| Price | price | number | — | yes | {category-specific description} | {examples} |
| Currency | currency | text | — | yes | {category-specific description} | {examples} |
| Price Includes VAT | price_includes_vat | boolean | — | — | {category-specific description} | true, false |
| {core attr} | {key} | {type} | {unit or —} | — | {description} | {examples} |
| ... | ... | ... | ... | ... | ... | ... |
## Extended Attributes
| Attribute | Key | Data Type | Unit | Mandatory | Description | Example Values |
|-----------|-----|-----------|------|-----------|-------------|----------------|
| {extended attr} | {key} | {type} | {unit or —} | — | {description} | {examples} |
| ... | ... | ... | ... | ... | ... | ... |
## Changelog
| Date | Change | Sources |
|------|--------|---------|
| {today} | Initial schema — {N} core + {M} extended attributes from {companies} | [{Company 1}]({URL}), [{Company 2}]({URL}) |
The Taxonomy ID must be looked up from docs/product-taxonomy/categories.md. Each subcategory line has the format ``- Subcategory Name `taxonomy.id` ``. Parse it using the regex `re.match(r'^- (.+?) \x60([a-z][a-z0-9_.]+)\x60$', line.strip())` (here `\x60` is the escaped backtick character); group 2 is the taxonomy ID.
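As a rough sketch of the lookup, the regex can be wrapped in two small helpers. The function names and the sample taxonomy IDs are hypothetical, and matching the subcategory name case-insensitively is an assumption rather than a stated rule.

```python
import re

# Same pattern as in the text; \x60 is the backtick character.
LINE_RE = re.compile(r'^- (.+?) \x60([a-z][a-z0-9_.]+)\x60$')


def parse_subcategory_line(line: str):
    """Return (subcategory_name, taxonomy_id), or None for non-matching lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def find_taxonomy_id(categories_text: str, subcategory: str):
    """Scan categories.md content for a subcategory and return its taxonomy ID."""
    for line in categories_text.splitlines():
        parsed = parse_subcategory_line(line)
        # Case-insensitive comparison is an assumption for illustration.
        if parsed and parsed[0].lower() == subcategory.lower():
            return parsed[1]
    return None


# Sample content with made-up IDs, only to show the line shape.
sample = """## Energy
- Electric Vehicle Charging Equipment `energy.ev_charging`
- Batteries & Grid-Scale Energy Storage `energy.batteries_storage`
"""
print(find_taxonomy_id(sample, "Electric Vehicle Charging Equipment"))
```

Heading lines and any line without a trailing backtick-quoted ID fall through as `None`, so only well-formed subcategory lines are considered.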
These rules are non-negotiable. Every schema must comply exactly.
| Rule | Correct | Wrong |
|---|---|---|
| Table has exactly 7 columns | Attribute, Key, Data Type, Unit, Mandatory, Description, Example Values | Missing columns, extra columns, row-number column |
| Key column contains valid snake_case | wood_type, structural_grade, charging_power_kw | camelCase, display names, empty keys |
| Key derivation rule | Lowercase, spaces to underscores, drop / ( ) , &, collapse consecutive underscores, strip leading/trailing underscores. Example: "GTIN / EAN" becomes gtin_ean, "Charging Power (kW)" becomes charging_power_kw | Invented keys not derivable from the display name |
| Unit column | Measurement unit as a string (mm, kg, W, kW) or — when not applicable | Units embedded in Data Type (number (kg)), empty cell |
| Mandatory column | yes for the 5 mandatory core attributes (sku, product_name, url, price, currency); — for all others | no, false, empty cell, yes on non-mandatory attributes |
| No backticks in table cells | 9x19mm, 5.56x45mm NATO | `9x19mm`, `5.56x45mm NATO` |
| Only three ## sections | ## Core Attributes then ## Extended Attributes then ## Changelog | No ## Notes, ## Summary, or other sections |
| Example values are comma-separated plain text | Red, Blue, Green | Red \| Blue \| Green or bullet lists |
| Data types use lowercase | text, number, enum, boolean. Use text (list) for multi-value fields. Units go in the Unit column, not in Data Type. | Text, Number, integer, number (kg) |
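The key derivation rule in the table above can be expressed as a few lines of Python. This is a sketch: the function names are hypothetical, and the snake_case check uses one reasonable pattern rather than a definition given by the rules.

```python
import re


def attribute_key(display_name: str) -> str:
    """Derive the Key value from an attribute display name."""
    key = display_name.lower().replace(" ", "_")
    key = re.sub(r"[/(),&]", "", key)   # drop / ( ) , &
    key = re.sub(r"_+", "_", key)       # collapse consecutive underscores
    return key.strip("_")               # strip leading/trailing underscores


def is_snake_case(key: str) -> bool:
    """Check that a key is lowercase snake_case (assumed pattern)."""
    return re.fullmatch(r"[a-z][a-z0-9]*(_[a-z0-9]+)*", key) is not None


for name in ("GTIN / EAN", "Charging Power (kW)", "Price Includes VAT"):
    print(name, "->", attribute_key(name))
```

Running the derivation over every Attribute cell and comparing the result with the Key cell is one way to catch invented keys before a schema is written.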
The Changelog is the history of the schema. Every run adds a row at the top (most recent first). On evolution runs, the new row should look like:
| 2026-03-15 | Added: Halal Cert, Kosher Cert. Deprecated: none. | [Al Islami](url), [Brakes UK](url) |
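Adding a changelog row at the top could look like the following sketch. It assumes the schema file follows the structure described earlier, with a `## Changelog` heading followed by a header row and a separator row; the function name and the sample company and URL are placeholders.

```python
def prepend_changelog_row(schema_md: str, date: str, change: str,
                          sources: list[tuple[str, str]]) -> str:
    """Insert a new changelog row directly under the table header."""
    # Sources are rendered as markdown links, as the format rules require.
    links = ", ".join(f"[{name}]({url})" for name, url in sources)
    new_row = f"| {date} | {change} | {links} |"
    lines = schema_md.splitlines()
    try:
        start = next(i for i, line in enumerate(lines)
                     if line.strip() == "## Changelog")
    except StopIteration:
        raise ValueError("no ## Changelog section found") from None
    for i in range(start + 1, len(lines)):
        # The separator row (|------|...) marks the end of the table header.
        if lines[i].lstrip().startswith("|-"):
            lines.insert(i + 1, new_row)
            break
    else:
        raise ValueError("changelog table has no separator row")
    return "\n".join(lines) + "\n"


doc = (
    "## Changelog\n"
    "| Date | Change | Sources |\n"
    "|------|--------|---------|\n"
    "| 2026-03-14 | Initial schema | [Example Co](https://example.com) |\n"
)
print(prepend_changelog_row(doc, "2026-03-15",
                            "Added: Halal Cert. Deprecated: none.",
                            [("Example Co", "https://example.com")]))
```

Inserting immediately after the separator row keeps the table ordered most recent first without touching earlier history.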
Before presenting results, re-read the schema file you just wrote and check it against these quality gates. If any gate fails, fix the issue before proceeding.
| # | Check | Pass criteria |
|---|---|---|
| 1 | Mandatory core attributes present and ordered | Rows 1-5: SKU, Product Name, URL, Price, Currency (all Mandatory = yes). Row 6: Price Includes VAT (Mandatory = —). Category-specific core attributes start at row 7. |
| 2 | Attribute count in range | Category-specific core: 5-10 (excludes the 6 fixed core rows: 5 mandatory plus Price Includes VAT), Extended: 10-15, Total category-specific: 15-30 |
| 3 | Two table sections | ## Core Attributes and ## Extended Attributes, plus ## Changelog — no ## Notes, no other sections |
| 3a | Taxonomy ID present | **Taxonomy ID:** is in the header and matches an ID found in categories.md |
| 4 | Currency separate from Price | Price is number, Currency is text — two distinct rows |
| 5 | Descriptions are company-neutral | No company names in the Description column (Brand/Manufacturer example values are fine) |
| 6 | Compliance is international | Compliance and certification attributes use only international standards (ISO, CE, GHS, HACCP, IEC). No country-specific regulatory bodies (no EPA, FDA, FCC). Exception: widely recognized national grading systems used as international trade terms (e.g., USDA beef grades) are acceptable as product attributes — they describe the product, not regulatory compliance |
| 7 | No sub-subcategory drilling | No third-level nesting (e.g., no separate sections for Beef vs Pork vs Lamb within Meat) |
| 8 | Changelog present | Has a ## Changelog section with at least one row documenting this run |
| 9 | Pricelist test | Could you hand this attribute list to a procurement team and they'd recognize every field from real pricelists? If any attribute would only appear on a deep spec sheet, remove it. |
| 10 | Format compliance | Table has exactly 7 columns (Attribute, Key, Data Type, Unit, Mandatory, Description, Example Values), no backticks in any table cell, no markdown formatting in cells (except DEPRECATED markers), example values are comma-separated plain text, data types are lowercase, units in Unit column (not in Data Type) |
If all checks pass, proceed to the summary.
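Several of these gates are mechanical and could be checked with a script before the manual review. The sketch below covers only the section list, the 7-column rule, row order, attribute counts, and the Mandatory column; the function names are hypothetical, the parser assumes no escaped pipes inside cells, and deprecated rows are counted like any other row.

```python
FIXED_KEYS = ["sku", "product_name", "url", "price", "currency", "price_includes_vat"]
MANDATORY_KEYS = set(FIXED_KEYS[:5])
SECTIONS = ["Core Attributes", "Extended Attributes", "Changelog"]


def parse_sections(schema_md: str) -> dict[str, list[list[str]]]:
    """Map each '## ' heading to the data rows of the table beneath it."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for raw in schema_md.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.startswith("|"):
            sections[current].append([c.strip() for c in line.strip("|").split("|")])
    # Drop the header row and the |---| separator row of each table.
    return {name: rows[2:] for name, rows in sections.items()}


def check_schema(schema_md: str) -> list[str]:
    """Return a list of gate failures; an empty list means these gates pass."""
    problems = []
    sections = parse_sections(schema_md)
    if list(sections) != SECTIONS:
        problems.append(f"gate 3: found sections {list(sections)}")
    core = sections.get("Core Attributes", [])
    extended = sections.get("Extended Attributes", [])
    bad = [row for row in core + extended if len(row) != 7]
    if bad:
        problems.append(f"gate 10: {len(bad)} row(s) do not have 7 columns")
        return problems  # later checks index into the columns
    if [row[1] for row in core[:6]] != FIXED_KEYS:
        problems.append("gate 1: first six core keys are not the fixed rows in order")
    for row in core + extended:
        # "—" is the literal cell value the format rules require for non-mandatory rows.
        expected = "yes" if row[1] in MANDATORY_KEYS else "—"
        if row[4] != expected:
            problems.append(f"gate 12: {row[1]} has Mandatory={row[4]!r}, expected {expected!r}")
    n_core = len(core) - len(FIXED_KEYS)
    if not 5 <= n_core <= 10:
        problems.append(f"gate 2: {n_core} category-specific core attributes")
    if not 10 <= len(extended) <= 15:
        problems.append(f"gate 2: {len(extended)} extended attributes")
    return problems
```

Gates that need judgment, such as the pricelist test or company-neutral descriptions, still require re-reading the file.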
Tell the user:
Two further format rules apply, with the same columns as the format rules table:

| Rule | Correct | Wrong |
|---|---|---|
| No markdown formatting in table cells | Plain text descriptions | No bold, italic, links, or code blocks inside cells (except DEPRECATED markers) |
| Changelog sources use markdown links | [Company](url) | Plain URLs or company names without links |

Two further quality gates apply, with the same columns as the quality gates table:

| # | Check | Pass criteria |
|---|---|---|
| 11 | Key column correct | Every Key value matches the conversion rule applied to its Attribute display name (lowercase, spaces to underscores, drop / ( ) & and commas, collapse consecutive underscores). Mandatory core keys are: sku, product_name, url, price, currency |
| 12 | Mandatory column correct | Exactly the 5 mandatory core keys have yes; all other rows (core and extended) have —. No extended attribute is mandatory. |