Name: Data Processing
Author: PedroHBO

Data Processing | Skills Pool

Clean this CSV file: data/raw/sales.csv — remove duplicates, fill missing values in the 'revenue' column with the median, convert 'date' to datetime, and save to data/clean/sales.csv. Use Pandas.
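A response to this prompt might look like the minimal sketch below. The function names `clean_sales` and `clean_sales_file` are hypothetical, and two choices are assumptions not stated in the prompt: the median is computed after duplicates are dropped, and unparseable dates are coerced to `NaT` rather than raising.

```python
from pathlib import Path

import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: duplicates removed, revenue filled, dates parsed."""
    out = df.drop_duplicates().copy()
    # Fill missing revenue with the median of the remaining observed values.
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())
    # Assumption: unparseable dates become NaT instead of raising an error.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    return out.reset_index(drop=True)


def clean_sales_file(src: str = "data/raw/sales.csv",
                     dst: str = "data/clean/sales.csv") -> None:
    """Read the raw CSV, clean it, and write the result (paths from the prompt)."""
    cleaned = clean_sales(pd.read_csv(src))
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    cleaned.to_csv(dst, index=False)
```

Keeping the cleaning logic in a function that takes and returns a DataFrame makes it possible to check the behavior on a small in-memory frame before touching files on disk.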

Convert this Pandas data processing script to Polars for better performance on large datasets: scripts/transform.py

Analyze and optimize memory usage of the dataframe in scripts/load_data.py — downcast integers, use category dtype for low-cardinality strings.
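The two techniques named in this prompt can be sketched as follows. The helper `optimize_memory` and its `max_unique_ratio` threshold are hypothetical. What counts as "low-cardinality" is an assumption here (at most half the values unique), and the contents of scripts/load_data.py are not known.

```python
import pandas as pd


def optimize_memory(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy with downcast integers and categorical low-cardinality strings."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Pick the smallest signed integer type that can hold the values.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Assumed threshold: convert only when few distinct values repeat often.
            if len(s) > 0 and s.nunique() / len(s) <= max_unique_ratio:
                out[col] = s.astype("category")
    return out
```

Comparing `df.memory_usage(deep=True)` before and after the call shows the per-column effect; the savings depend on the data and are usually larger on long columns with few distinct values.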

Create a Python data pipeline that reads all CSV files from data/input/, applies these transformations: remove nulls, normalize numeric columns, add a 'processed_at' timestamp column, and writes to data/output/ as Parquet.
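One possible shape for such a pipeline is sketched below. The names `transform` and `run_pipeline` are hypothetical. "Normalize" is interpreted as min-max scaling to [0, 1], which is an assumption (z-score scaling would be an equally valid reading), and writing Parquet with pandas requires an engine such as pyarrow to be installed.

```python
from datetime import datetime, timezone
from pathlib import Path

import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with nulls, min-max scale numeric columns, add a timestamp."""
    out = df.dropna().copy()
    for col in out.select_dtypes(include="number").columns:
        lo, hi = out[col].min(), out[col].max()
        # Constant (or empty) columns have no range; map them to 0.0.
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    # Assumption: the timestamp is recorded in UTC.
    out["processed_at"] = datetime.now(timezone.utc)
    return out.reset_index(drop=True)


def run_pipeline(input_dir: str = "data/input", output_dir: str = "data/output") -> None:
    """Apply transform() to every CSV in input_dir and write one Parquet file each."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        result = transform(pd.read_csv(csv_path))
        result.to_parquet(out_dir / f"{csv_path.stem}.parquet", index=False)
```

Each file is scaled independently in this sketch; if the columns need a shared scale across all inputs, the minimum and maximum would have to be computed over the combined data first.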

Write a NumPy function that computes the cosine similarity matrix between all pairs of vectors in a 2D array of shape (n, d). Use vectorized operations only.
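A vectorized answer to this prompt could look like the sketch below: normalize each row to unit length, then take one matrix product. The function name `cosine_similarity_matrix` and the `eps` guard for zero-length vectors are illustrative assumptions, not part of the prompt.

```python
import numpy as np


def cosine_similarity_matrix(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return the (n, n) matrix of cosine similarities between the rows of x."""
    x = np.asarray(x, dtype=float)
    # Row norms with shape (n, 1) so they broadcast across the d columns.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Assumption: all-zero rows are divided by eps, giving similarity 0 with everything.
    unit = x / np.maximum(norms, eps)
    # Dot products of unit vectors are the cosine similarities.
    return unit @ unit.T


vectors = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
similarity = cosine_similarity_matrix(vectors)
```

The result is symmetric with ones on the diagonal for non-zero rows. Memory grows as n squared, so very large n may call for computing the product in blocks.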

Data Processing

Skill: data-processing

Data Processing with Pandas, Polars & NumPy

Table of Contents

When to Trigger

Review Workflow

Scope

Best Practices

Checklists

Data Loading Checklist

Data Cleaning Checklist

Data Transformation Checklist

Performance Checklist

OpenCode Usage Examples
