Enforces structured, highly documented storage for code and data projects. Use when working on machine learning scripts, data processing, code creation, or script modification that should preserve clear structure and documentation.
Ensures all created or processed content follows strict organizational and documentation standards with structured storage, comprehensive comments, and complete project documentation.
Use this skill for tasks like:
- Creating machine learning or model training scripts
- Writing data processing or cleaning pipelines
- Creating new code projects
- Modifying existing scripts while preserving their structure and documentation

Required inputs: when modifying an existing project, first read and understand its original structure.
## 1. Structured Directory Layout

```
project-name/
├── README.md              # Project overview and directory guide
├── src/                   # Source code with detailed comments
│   ├── main.py            # Main entry point
│   └── utils.py           # Utility functions
├── data/                  # Data files
│   ├── raw/               # Original data
│   ├── processed/         # Cleaned/transformed data
│   └── DATA_DICTIONARY.md # Data field descriptions
├── docs/                  # Documentation
│   ├── PROCESS.md         # Step-by-step process description
│   └── CHANGELOG.md       # Modification history
├── outputs/               # Results, models, reports
└── requirements.txt       # Dependencies
```
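The layout above can be scaffolded programmatically. A minimal Python sketch (the `scaffold` helper and the `STRUCTURE` table are illustrative, not part of the skill itself):

```python
from pathlib import Path

# Directories and stub files of the standard layout (mirrors the tree above).
STRUCTURE = {
    "dirs": ["src", "data/raw", "data/processed", "docs", "outputs"],
    "files": [
        "README.md",
        "src/main.py",
        "src/utils.py",
        "data/DATA_DICTIONARY.md",
        "docs/PROCESS.md",
        "docs/CHANGELOG.md",
        "requirements.txt",
    ],
}


def scaffold(root):
    """Create the standard directory layout (empty stubs) under `root`."""
    base = Path(root)
    for d in STRUCTURE["dirs"]:
        (base / d).mkdir(parents=True, exist_ok=True)
    for f in STRUCTURE["files"]:
        path = base / f
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    return base
```

The stub files still need real content; the scaffold only guarantees nothing required is forgotten.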
## 2. Code Documentation Standards

Every module needs a top-level docstring, every function needs a docstring covering its arguments, return value, and process, and non-obvious logic needs inline comments (see Pattern 3 below for a template).
## 3. Required Documentation Files

`README.md` must include:
- Project overview and purpose
- Directory structure guide
- Usage instructions

`PROCESS.md` must include:
- A step-by-step description of the process and the rationale for each step

`DATA_DICTIONARY.md` (for data projects) must include:
- Descriptions of every data field, before and after processing

`CHANGELOG.md` (for modifications) must include:
- What changed, where (files and line ranges), and why
- The list of files affected
## 4. Modification Protocol

When modifying existing structured projects:
1. Read the existing code and documentation to understand the current structure
2. Preserve the directory layout, naming, and commenting style
3. Record every change in `docs/CHANGELOG.md` (what, where, and why)
4. Update any documentation files affected by the change
## Pattern 1: ML Training Project Structure

```
ml-training-project/
├── README.md                 # Project overview
├── src/
│   ├── train.py              # Training script with detailed comments
│   ├── model.py              # Model architecture
│   ├── data_loader.py        # Data loading utilities
│   └── evaluate.py           # Evaluation metrics
├── data/
│   ├── raw/                  # Original datasets
│   ├── processed/            # Preprocessed data
│   └── DATA_DICTIONARY.md    # Feature descriptions
├── models/                   # Saved model checkpoints
├── logs/                     # Training logs
├── docs/
│   ├── TRAINING_PROCESS.md   # Training methodology
│   └── MODEL_ARCHITECTURE.md # Model design decisions
└── requirements.txt
```
## Pattern 2: Data Cleaning Project Structure

```
data-cleaning-project/
├── README.md
├── src/
│   ├── clean.py            # Main cleaning script
│   ├── validators.py       # Data validation functions
│   └── transformers.py     # Transformation utilities
├── data/
│   ├── raw/                # Original data
│   ├── processed/          # Cleaned data
│   ├── DATA_DICTIONARY.md  # Field descriptions
│   └── QUALITY_REPORT.md   # Data quality metrics
├── docs/
│   └── CLEANING_PROCESS.md # Cleaning steps and rationale
└── requirements.txt
```
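The metrics in `QUALITY_REPORT.md` can be generated directly from the data. A minimal sketch (the `quality_report` helper is hypothetical; real validity checks such as type, range, and duplicate tests would be project-specific):

```python
import pandas as pd


def quality_report(df):
    """Render per-column completeness as a Markdown table for QUALITY_REPORT.md."""
    lines = [
        "# Data Quality Report",
        "",
        "| Column | Completeness | Dtype |",
        "|--------|--------------|-------|",
    ]
    for col in df.columns:
        # Completeness = share of non-null values in the column
        completeness = df[col].notna().mean()
        lines.append(f"| {col} | {completeness:.1%} | {df[col].dtype} |")
    return "\n".join(lines)
```

Writing the report from `clean.py` keeps the quality metrics in sync with the cleaned data instead of relying on manual updates.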
## Pattern 3: Code Comment Template

```python
"""
Module: data_processor.py
Purpose: Process and transform raw sensor data into analysis-ready format.

Main components:
- DataLoader: Reads raw CSV files
- DataCleaner: Handles missing values and outliers
- DataTransformer: Applies normalization and feature engineering
"""
import pandas as pd


def clean_sensor_data(df, threshold=0.95):
    """
    Clean sensor data by removing outliers and handling missing values.

    Args:
        df (pd.DataFrame): Raw sensor data with columns [timestamp, sensor_id, value]
        threshold (float): Completeness threshold (0-1) for keeping sensors

    Returns:
        pd.DataFrame: Cleaned data with outliers removed and missing values imputed

    Process:
        1. Remove sensors below the completeness threshold (default 0.95, i.e. >5% missing)
        2. Detect outliers using the IQR method (1.5 * IQR)
        3. Impute remaining missing values with forward fill
    """
    # Remove sensors with insufficient data.
    # Threshold of 0.95 means a sensor must have 95% valid readings;
    # completeness is measured per sensor (valid readings / that sensor's rows).
    completeness = df.groupby('sensor_id')['value'].apply(lambda s: s.notna().mean())
    valid_sensors = completeness[completeness >= threshold].index
    df = df[df['sensor_id'].isin(valid_sensors)]

    # Detect and remove outliers using the IQR method.
    # Missing values are kept here so they can be imputed in the next step.
    Q1 = df['value'].quantile(0.25)
    Q3 = df['value'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR  # Standard outlier detection threshold
    upper_bound = Q3 + 1.5 * IQR
    df = df[df['value'].between(lower_bound, upper_bound) | df['value'].isna()]

    # Forward fill remaining missing values within each sensor's series.
    # Assumes temporal continuity in sensor readings.
    df = df.sort_values(['sensor_id', 'timestamp'])
    df['value'] = df.groupby('sensor_id')['value'].ffill()
    return df
```
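To see the IQR rule from step 2 on concrete numbers (a toy series, not real sensor data):

```python
import pandas as pd

# Toy readings: 100 is an obvious outlier among values near 10-13
values = pd.Series([10, 12, 11, 13, 12, 11, 100])
q1, q3 = values.quantile(0.25), values.quantile(0.75)  # 11.0 and 12.5
iqr = q3 - q1                                          # 1.5
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # 8.75 and 14.75
kept = values[values.between(lower, upper)]            # drops only 100
```

Because the bounds are derived from quartiles, a single extreme reading does not widen them the way it would inflate a mean-and-standard-deviation rule.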
## Pattern 4: CHANGELOG.md Entry Template

```markdown
## [Version 1.2.0] - 2026-01-19

### Changed
- Modified `train.py:45-67` to add early stopping mechanism
  - Reason: Prevent overfitting on small validation sets
  - Added `patience` parameter (default=10 epochs)
  - Monitors validation loss instead of training loss

### Added
- New function `evaluate.py:calculate_confusion_matrix()`
  - Provides detailed classification metrics
  - Outputs confusion matrix visualization

### Fixed
- Fixed data loader bug in `data_loader.py:123`
  - Issue: Incorrect handling of missing timestamps
  - Solution: Added explicit timestamp validation and interpolation

### Files Affected
- `src/train.py` (lines 45-67, 89-92)
- `src/evaluate.py` (new function added)
- `src/data_loader.py` (line 123)
- `docs/TRAINING_PROCESS.md` (updated early stopping section)
```
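Entries in this format can also be prepended mechanically, keeping the newest version on top. A sketch assuming a hypothetical `add_changelog_entry` helper:

```python
from datetime import date
from pathlib import Path


def add_changelog_entry(path, version, changes):
    """Prepend a new entry to CHANGELOG.md.

    `changes` maps section names ("Added", "Changed", "Fixed") to bullet lists.
    """
    body = [f"## [Version {version}] - {date.today().isoformat()}"]
    for section, items in changes.items():
        body.append(f"### {section}")
        body.extend(f"- {item}" for item in items)
    entry = "\n".join(body) + "\n\n"
    p = Path(path)
    existing = p.read_text() if p.exists() else ""
    p.write_text(entry + existing)  # newest entry first
```

The per-bullet details (reason, line ranges, affected files) still need to be written by hand; the helper only enforces the entry skeleton and ordering.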
Input: "Create a script to train a neural network for image classification"

Steps:
1. `src/train.py` with comprehensive docstrings and inline comments
2. `README.md` with project overview and directory structure
3. `docs/TRAINING_PROCESS.md` describing training methodology
4. `docs/MODEL_ARCHITECTURE.md` explaining model design
5. `requirements.txt` with all dependencies

Expected output: Complete project structure with all documentation files, heavily commented code, and clear organization.
Input: "Write a script to clean customer transaction data"

Steps:
1. `src/clean.py` with detailed comments explaining each cleaning step
2. `data/DATA_DICTIONARY.md` describing all fields before and after cleaning
3. `docs/CLEANING_PROCESS.md` with step-by-step cleaning methodology
4. `data/QUALITY_REPORT.md` with data quality metrics (completeness, validity)
5. `README.md` with usage instructions and directory guide
6. `requirements.txt`

Expected output: Structured project with comprehensive documentation of data transformations and quality metrics.
Input: "Update the training script to add learning rate scheduling"

Steps:
1. Read `src/train.py` to understand the current implementation
2. Update `docs/TRAINING_PROCESS.md` with a new scheduling section

Expected output: Modified code with preserved structure, updated documentation, and a comprehensive change log.
- `references/documentation-standards.md`: Detailed documentation requirements
- `references/directory-templates.md`: Standard directory structures for different project types
- `references/comment-guidelines.md`: Code commenting best practices
- `assets/templates/`: Ready-to-use project templates