Exploratory Data Analysis for tabular data. Use when analyzing column distributions, checking data quality, examining class balance, detecting missing patterns, or generating summary statistics for datasets.
Analyze tabular datasets to understand distributions, data quality, and patterns.
| Analysis | Description |
|---|---|
| Column Distribution | Value counts, percentages, cardinality assessment |
| Missing Data | Null counts, patterns (MCAR/MAR/MNAR) |
| Class Balance | Imbalance detection for classification targets |
| Summary Stats | Count, unique, nulls per column |
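
As a rough illustration of the Summary Stats row, the sketch below computes count, unique, and nulls per column. It assumes rows arrive as a list of dicts and that `None`, the empty string, and NaN all count as missing. The helper names `is_missing` and `summarize_columns` are hypothetical, not part of any existing tool.

```python
def is_missing(value):
    # Assumption: None, "" and float NaN are the only "missing" markers.
    return value is None or value == "" or (isinstance(value, float) and value != value)


def summarize_columns(rows):
    """Return {column: {"count", "unique", "nulls"}} for a list of dict rows."""
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    summary = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if not is_missing(v)]
        summary[name] = {
            "count": len(present),            # non-missing values
            "unique": len(set(present)),      # distinct non-missing values
            "nulls": len(values) - len(present),
        }
    return summary
```

A key absent from a row is treated as missing here, which is one possible convention among several.
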
For detailed analysis methodology and output format:
Cardinality Levels:
| Level | Criteria | Action |
|---|---|---|
| Low | ≤10 unique | Good for categorical encoding |
| Medium | 11-100 or <1% of rows | May need encoding strategy |
| High | >100 and <50% of rows | Consider grouping/binning |
| Very High | >50% of rows | Likely identifier, exclude |
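
The criteria in the table overlap in places (a 15-row column with 15 unique values is both "11-100" and ">50% of rows"), so any implementation has to pick a precedence. The sketch below is one possible reading, not a definitive rule: it checks Low first, then Very High, then Medium, and falls back to High. Exactly 50% is treated as High, which is an assumption because the table leaves that boundary undefined. The function name `classify_cardinality` is hypothetical.

```python
def classify_cardinality(n_unique, n_rows):
    """Map a unique-value count to a cardinality level (assumed precedence)."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    ratio = n_unique / n_rows

    if n_unique <= 10:
        return "Low"
    if ratio > 0.50:
        # Assumption: the identifier check wins over the 11-100 range.
        return "Very High"
    if n_unique <= 100 or ratio < 0.01:
        return "Medium"
    # Remaining case: more than 100 unique values, 1% to 50% of rows.
    return "High"
```

For example, `classify_cardinality(500, 2000)` returns `"High"`, while `classify_cardinality(500, 100000)` returns `"Medium"` because 500 values are under 1% of the rows.
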
Missing Data Thresholds:
| Percentage | Assessment |
|---|---|
| 0% | No missing data |
| <1% | Minimal - safe to drop or impute |
| 1-5% | Some - consider imputation strategy |
| >5% | Significant - investigate pattern |
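
A minimal sketch of how these thresholds could be applied to one column. The boundary handling is an assumption: exactly 1% and exactly 5% both fall in the "Some" band, following the "1-5%" row. The function name `assess_missing` is hypothetical.

```python
def assess_missing(n_missing, n_rows):
    """Return (percentage, assessment) for a column's missing-value count."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    pct = 100.0 * n_missing / n_rows

    if n_missing == 0:
        return pct, "No missing data"
    if pct < 1:
        return pct, "Minimal - safe to drop or impute"
    if pct <= 5:
        return pct, "Some - consider imputation strategy"
    return pct, "Significant - investigate pattern"
```

The percentage alone does not say whether values are MCAR, MAR, or MNAR. The "investigate pattern" outcome is a prompt to compare missingness against other columns, which this sketch does not attempt.
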
Class Imbalance:
- 80% in top class: Imbalance detected
- 95% in top class: Extreme imbalance
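
The check above could be sketched as follows. Two things here are assumptions rather than part of the stated rules: the thresholds are treated as inclusive (a top-class share of exactly 80% is flagged), and the label returned when neither threshold is reached is made up for illustration. The function name `check_class_balance` is hypothetical.

```python
from collections import Counter


def check_class_balance(labels, imbalance_at=0.80, extreme_at=0.95):
    """Return (top_class, share, status) for a classification target."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")

    # Share of rows held by the most frequent class.
    top_class, top_count = Counter(labels).most_common(1)[0]
    share = top_count / len(labels)

    if share >= extreme_at:
        status = "Extreme imbalance"
    elif share >= imbalance_at:
        status = "Imbalance detected"
    else:
        status = "No imbalance flagged"  # hypothetical label
    return top_class, share, status
```
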
# Column Distribution: {column_name}
- **source**: path/to/data
- **column**: column_name
## Summary
- Total rows: N
- Null/missing: N (X%)
- Unique values: N
- Cardinality: Low|Medium|High|Very High
## Distribution
| Value | Count | Percentage | Cumulative |
|-------|-------|------------|------------|
## Observations
- Auto-generated insights
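
One way the template above might be filled in for a single column is sketched below. It assumes `None` marks a missing value and that percentages in the distribution table are computed over non-null rows, since the template does not say which denominator to use. The Cardinality and Observations parts are left out to keep the sketch short, and the function name `render_distribution` is hypothetical.

```python
from collections import Counter


def render_distribution(values, column_name, source="path/to/data"):
    """Render a column distribution report as Markdown text."""
    total = len(values)
    present = [v for v in values if v is not None]  # assumption: None == missing
    nulls = total - len(present)
    null_pct = 100 * nulls / total if total else 0.0
    counts = Counter(present)

    lines = [
        f"# Column Distribution: {column_name}",
        f"- **source**: {source}",
        f"- **column**: {column_name}",
        "## Summary",
        f"- Total rows: {total}",
        f"- Null/missing: {nulls} ({null_pct:.1f}%)",
        f"- Unique values: {len(counts)}",
        "## Distribution",
        "| Value | Count | Percentage | Cumulative |",
        "|-------|-------|------------|------------|",
    ]

    # Most frequent values first; cumulative is derived from running counts
    # so rounding errors do not accumulate.
    running = 0
    for value, count in counts.most_common():
        running += count
        pct = 100 * count / len(present)
        cum = 100 * running / len(present)
        lines.append(f"| {value} | {count} | {pct:.1f}% | {cum:.1f}% |")
    return "\n".join(lines)
```
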