Extract and normalize Li-ion battery cell specifications from supplier PDFs to a standardized format. Handles heterogeneous supplier data formats including cylindrical (18650, 21700, 4680), pouch, and prismatic cells.
This skill extracts and standardizes battery cell specifications from supplier datasheets (PDFs). It handles variations in terminology, units, and table formats across different manufacturers.
Primary locations: "Electrical Characteristics", "Physical Specifications", "General Specifications", "Nominal Specification" tables.
Secondary sources: Header/footer areas, performance graphs, notes sections.
| Pattern | Identifier | Handling |
|---|---|---|
| A: Key-Value | 2-column (param \| value) | Direct mapping |
| B: Multi-Condition | 3+ columns with merged cells | Inherit parent from previous row for empty cells |
| C: Visual Grouping | Indentation-based, no borders | Use text position for hierarchy |
| D: Multi-Product | Multiple model columns | Extract each column as separate cell object |
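
The Pattern B handling (inherit the parent from the previous row when a merged cell comes through empty) can be sketched as a forward fill over the leading columns. This is a minimal illustration, not the contents of scripts/extractor.py; the function name `inherit_parent_cells` and the `parent_cols` parameter are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]


def inherit_parent_cells(rows: List[Row], parent_cols: int = 1) -> List[Row]:
    """Fill empty cells in the first `parent_cols` columns from the row above.

    Assumption: merged cells arrive from the PDF table parser as None or
    empty strings, and only the leading columns carry parent labels.
    """
    filled: List[Row] = []
    previous: Row = []
    for row in rows:
        current = list(row)  # copy so the caller's rows are not mutated
        for i in range(min(parent_cols, len(current))):
            cell = current[i]
            is_empty = cell is None or not cell.strip()
            if is_empty and i < len(previous):
                current[i] = previous[i]
        filled.append(current)
        previous = current
    return filled


rows = [
    ["Capacity", "Typical", "5000mAh"],
    ["", "Minimum", "4850mAh"],
    ["Voltage", "Nominal", "3.6V"],
]
filled_rows = inherit_parent_cells(rows)
```

Here the second row inherits "Capacity" from the row above, while the third row keeps its own "Voltage" label.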
| Standard Field | Variations |
|---|---|
| Nominal Capacity | Rated Capacity, Typical Capacity, Cell Capacity |
| Nominal Voltage | Cell Voltage, Operating Voltage |
| DCIR | DC Internal Resistance, Internal Resistance |
| ACIR | AC Impedance |
| Max Discharge | Maximum Continuous Discharge, Discharge Current |
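
One way to apply a terminology table like this is a case-insensitive alias lookup. The sketch below uses a subset of the rows above; `FIELD_ALIASES` and `standardize_field` are hypothetical names, and supplier-specific patterns would presumably live in normalization_rules.yaml rather than in code.

```python
import re
from typing import Dict, List, Optional

# Subset of the variations table, keyed by standard field name.
FIELD_ALIASES: Dict[str, List[str]] = {
    "Nominal Capacity": ["Rated Capacity", "Typical Capacity", "Cell Capacity"],
    "Nominal Voltage": ["Cell Voltage", "Operating Voltage"],
    "Max Discharge": ["Maximum Continuous Discharge", "Discharge Current"],
}


def _key(label: str) -> str:
    """Lowercase and collapse punctuation/whitespace so lookups are tolerant."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


# Build a flat lookup: every standard name and every variation maps to the standard name.
_LOOKUP: Dict[str, str] = {}
for standard, variations in FIELD_ALIASES.items():
    _LOOKUP[_key(standard)] = standard
    for variation in variations:
        _LOOKUP[_key(variation)] = standard


def standardize_field(label: str) -> Optional[str]:
    """Return the standard field name, or None if the label is not recognized."""
    return _LOOKUP.get(_key(label))
```

Returning None for unknown labels, rather than guessing, leaves unmatched rows visible for review.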
Extract: Manufacturer, Model Number, Chemistry (NMC/LFP/NCA/NCMA), Cell Format (cylindrical/pouch/prismatic), Format Code (18650/21700/4680).
| Property | Notes | Conversion |
|---|---|---|
| Capacity | Look for Nominal/Rated/Typical; may have Typ/Min values | mAh → Ah (÷1000) |
| Voltage | Nominal, Max (charge cutoff), Min (discharge cutoff) | — |
| Impedance | ACIR at 1kHz most common; record test conditions | Ω → mΩ (×1000) |
| Current | Continuous discharge, charge, pulse (with duration) | C-rate → A: C × Capacity(Ah) |
Parse formats: "0~45°C", "0°C to 45°C", "-20 to 60 deg C" → {min_c, max_c}
Extract separately for: charge, discharge, storage ranges.
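
The three range formats listed above can be covered by a single regular expression. This is a sketch under the assumption that ranges use "~" or "to" as the separator; a bare hyphen separator (e.g. "-20 - 60") is ambiguous with negative signs and is not handled. The function name `parse_temp_range` is hypothetical.

```python
import re
from typing import Dict, Optional

_NUM = r"[-+]?\d+(?:\.\d+)?"          # signed integer or decimal
_UNIT = r"(?:°\s*C|deg\.?\s*C)?"      # optional "°C" or "deg C"
_TEMP_RANGE = re.compile(
    rf"(?P<min>{_NUM})\s*{_UNIT}\s*(?:~|to)\s*(?P<max>{_NUM})\s*{_UNIT}",
    re.IGNORECASE,
)


def parse_temp_range(text: str) -> Optional[Dict[str, float]]:
    """Parse a Celsius range into {min_c, max_c}; return None if no range is found."""
    match = _TEMP_RANGE.search(text)
    if match is None:
        return None
    return {"min_c": float(match.group("min")), "max_c": float(match.group("max"))}


charge_range = parse_temp_range("0~45°C")
discharge_range = parse_temp_range("-20 to 60 deg C")
```

Calling the parser once per labelled row (charge, discharge, storage) keeps the three ranges separate.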
| Format | Dimensions |
|---|---|
| Cylindrical | Diameter (mm), Height (mm) |
| Pouch | Length × Width × Thickness (mm) |
| Prismatic | Length × Width × Height (mm) |
Weight: grams (convert kg → g if needed).
Parse: "500 cycles at 80% DOD" → {cycles: 500, dod: 80}
Record: temperature, C-rates, EOL SOH threshold.
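
A cycle-life string such as the one above could be parsed with a regex along these lines. This is an illustrative sketch: `parse_cycle_life` is a hypothetical name, and test conditions (temperature, C-rates, EOL threshold) would need separate patterns.

```python
import re
from typing import Dict, Optional, Union

_CYCLE_LIFE = re.compile(
    r"(?P<cycles>\d[\d,]*)\s*cycles?\b.*?(?P<dod>\d+(?:\.\d+)?)\s*%\s*DOD",
    re.IGNORECASE,
)


def parse_cycle_life(text: str) -> Optional[Dict[str, Union[int, float]]]:
    """Extract cycle count and depth of discharge; return None if not matched."""
    match = _CYCLE_LIFE.search(text)
    if match is None:
        return None
    cycles = int(match.group("cycles").replace(",", ""))  # allow "1,000"
    dod = float(match.group("dod"))
    # Keep whole-number percentages as ints to match the {cycles: 500, dod: 80} shape.
    return {"cycles": cycles, "dod": int(dod) if dod.is_integer() else dod}


cycle_life = parse_cycle_life("500 cycles at 80% DOD")
```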
Use canonical names: capacity.nominal_ah, voltage.nominal_v, impedance.acir.value_mohm, current.max_continuous_discharge_a, temperature.operating.charge.min_c.
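
Dotted canonical names map naturally onto nested objects. A minimal sketch of writing a value at such a path follows; `set_path` is a hypothetical helper, not a function confirmed to exist in scripts/extractor.py.

```python
from typing import Any, Dict


def set_path(record: Dict[str, Any], dotted_name: str, value: Any) -> Dict[str, Any]:
    """Store `value` at a dotted canonical name, creating intermediate dicts."""
    *parents, leaf = dotted_name.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})  # create the nested level if it is missing
    node[leaf] = value
    return record


record: Dict[str, Any] = {}
set_path(record, "capacity.nominal_ah", 5.0)
set_path(record, "impedance.acir.value_mohm", 18.0)
set_path(record, "temperature.operating.charge.min_c", 0)
```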
| From | To | Operation |
|---|---|---|
| mAh | Ah | ÷ 1000 |
| Ω | mΩ | × 1000 |
| kg | g | × 1000 |
| C-rate | A | C × Capacity(Ah) |
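
The conversions in this table are simple enough to express as small functions. The sketch below is one possible shape; the function names are hypothetical, and unknown units raise an error rather than passing through silently.

```python
def _norm_unit(unit: str) -> str:
    """Normalize a unit string; map Greek omega (U+03A9) and the ohm sign (U+2126) to 'ohm'."""
    return unit.strip().replace("\u03a9", "ohm").replace("\u2126", "ohm").lower()


def to_ah(value: float, unit: str) -> float:
    """mAh -> Ah (divide by 1000); Ah passes through."""
    u = _norm_unit(unit)
    if u == "mah":
        return value / 1000
    if u == "ah":
        return value
    raise ValueError(f"unsupported capacity unit: {unit!r}")


def to_mohm(value: float, unit: str) -> float:
    """Ohm -> milliohm (multiply by 1000); milliohm passes through."""
    u = _norm_unit(unit)
    if u == "ohm":
        return value * 1000
    if u == "mohm":
        return value
    raise ValueError(f"unsupported resistance unit: {unit!r}")


def to_grams(value: float, unit: str) -> float:
    """kg -> g (multiply by 1000); g passes through."""
    u = _norm_unit(unit)
    if u == "kg":
        return value * 1000
    if u == "g":
        return value
    raise ValueError(f"unsupported weight unit: {unit!r}")


def c_rate_to_amps(c_rate: float, capacity_ah: float) -> float:
    """C-rate -> A: C x Capacity(Ah)."""
    return c_rate * capacity_ah
```

For example, a 2C continuous discharge rating on a 5.0 Ah cell corresponds to `c_rate_to_amps(2, 5.0)`, i.e. 10 A.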
Missing values: set to null, add to fields_missing.
Original units: preserve in the unit_original field.
normalization_rules.yaml: Supplier-specific patterns
validation_schema.json: JSON schema for output validation
scripts/extractor.py: Deterministic parsing functions
Required fields: Manufacturer, Model Number, Nominal Capacity, Nominal Voltage, Cell Format.
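
The null and fields_missing handling for Manufacturer, Model Number, Nominal Capacity, Nominal Voltage, and Cell Format might look like the following sketch. The dotted paths are inferred from the example output in this document, and `find_missing` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

# Assumed locations of the five fields, based on the example output structure.
REQUIRED_PATHS: List[str] = [
    "cell_info.manufacturer",
    "cell_info.model_number",
    "cell_info.cell_format",
    "electrical.capacity.nominal_ah",
    "electrical.voltage.nominal_v",
]


def get_path(record: Dict[str, Any], dotted_name: str) -> Optional[Any]:
    """Return the value at a dotted path, or None if any level is absent or null."""
    node: Any = record
    for key in dotted_name.split("."):
        if not isinstance(node, dict) or node.get(key) is None:
            return None
        node = node[key]
    return node


def find_missing(record: Dict[str, Any], paths: List[str] = REQUIRED_PATHS) -> List[str]:
    """List every path whose value is absent or null, for fields_missing."""
    return [path for path in paths if get_path(record, path) is None]


partial = {
    "cell_info": {"manufacturer": "Samsung", "model_number": None},
    "electrical": {"capacity": {"nominal_ah": 5.0}},
}
missing = find_missing(partial)
```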
| Property | Valid Range |
|---|---|
| Capacity | 0.5 - 200 Ah |
| Nominal Voltage | 2.5 - 4.5 V |
| Max Voltage | ≤ 5.0 V |
| DCIR | 1 - 200 mΩ |
| Weight | 10 - 5000 g |
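
These ranges can be checked mechanically after normalization. The sketch below assumes flat, hypothetical key names (`capacity_ah`, `dcir_mohm`, and so on) purely for illustration; the real output nests these values, and validation_schema.json may already encode the same limits.

```python
from typing import Dict, List, Optional, Tuple

# (min, max) per property, taken from the table above; None means unbounded.
VALID_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "capacity_ah": (0.5, 200.0),
    "nominal_voltage_v": (2.5, 4.5),
    "max_voltage_v": (None, 5.0),
    "dcir_mohm": (1.0, 200.0),
    "weight_g": (10.0, 5000.0),
}


def check_ranges(values: Dict[str, Optional[float]]) -> List[str]:
    """Return one warning string per value outside its valid range."""
    warnings: List[str] = []
    for name, value in values.items():
        if value is None or name not in VALID_RANGES:
            continue  # missing values are tracked elsewhere, not range-checked
        low, high = VALID_RANGES[name]
        if (low is not None and value < low) or (high is not None and value > high):
            warnings.append(f"{name}={value} outside valid range ({low}, {high})")
    return warnings


ok = check_ranges({"capacity_ah": 5.0, "nominal_voltage_v": 3.6, "max_voltage_v": 4.2})
suspicious = check_ranges({"capacity_ah": 5000.0})  # likely an unconverted mAh value
```

An out-of-range capacity such as 5000 is a useful signal that a unit conversion was skipped.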
Output follows validation_schema.json. Top-level objects:
cell_info: manufacturer, model, chemistry, format, confidence
mechanical: dimensions, weight, volume
electrical: capacity, voltage, impedance, current
temperature: operating ranges (charge/discharge/storage)
lifetime: cycle life specifications
derived: calculated properties
extraction_metadata: fields_extracted, fields_missing, warnings

| Property | Calculation |
|---|---|
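| Energy (Wh) | Nominal Voltage (V) × Nominal Capacity (Ah) |
| Max Power (W) | Voltage (V) × Max Discharge Current (A) |
| Gravimetric Energy (Wh/kg) | Energy (Wh) ÷ Weight (kg); weight is stored in g, so divide it by 1000 first |
| Volumetric Energy (Wh/L) | Energy (Wh) ÷ Volume (L) |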
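
The derived properties can be computed as below. This is a sketch with stated assumptions: nominal voltage is used for both energy and max power, weight arrives in grams and is converted to kg, and volume is already in litres. The function name `derive_properties` and its parameter names are hypothetical.

```python
from typing import Dict, Optional


def derive_properties(
    nominal_v: float,
    capacity_ah: float,
    max_discharge_a: Optional[float] = None,
    weight_g: Optional[float] = None,
    volume_l: Optional[float] = None,
) -> Dict[str, Optional[float]]:
    """Compute derived values; anything lacking an input stays None."""
    energy_wh = nominal_v * capacity_ah
    return {
        "energy_wh": energy_wh,
        # Assumption: nominal voltage is the voltage used for max power.
        "max_power_w": nominal_v * max_discharge_a if max_discharge_a is not None else None,
        # Weight is stored in grams, so convert to kg before dividing.
        "gravimetric_wh_per_kg": energy_wh / (weight_g / 1000) if weight_g else None,
        "volumetric_wh_per_l": energy_wh / volume_l if volume_l else None,
    }


# Illustrative inputs only; the weight and current are not taken from a datasheet.
derived = derive_properties(3.6, 5.0, max_discharge_a=10.0, weight_g=72.0)
```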
python sheets_exporter.py output.json --sheet "Battery Cell Database"
Input (Pattern B - multi-condition):
2.1 Capacity: Typical 5000mAh, Minimum 4850mAh
2.2 Voltage: Nominal 3.6V, Charge 4.2V, Cutoff 2.5V
2.3 Impedance: AC 1kHz ≤18mΩ (50% SOC, 25°C)
Extraction:
Output (abbreviated):
{
"cell_info": {"manufacturer": "Samsung", "model_number": "INR21700-50E", "chemistry": "NMC", "cell_format": "cylindrical"},
"electrical": {
"capacity": {"nominal_ah": 5.0, "minimum_ah": 4.85},
"voltage": {"nominal_v": 3.6, "max_v": 4.2, "min_v": 2.5},
"impedance": {"acir": {"value_mohm": 18.0, "frequency_hz": 1000, "temperature_c": 25, "soc_percent": 50}}
},
"derived": {"energy_wh": {"value": 18.0, "calculation": "3.6V × 5.0Ah"}}
}