Practical data processing using Pandas, Polars, and NumPy. Use for data cleaning, transformation, and analysis tasks in Python data pipelines.
Comprehensive data processing skill for OpenCode — covers data cleaning, transformation, analysis, memory optimization, and pipeline patterns using Pandas, Polars, and NumPy. Triggered when the user mentions data processing, data cleaning, CSV, dataframe, pandas, polars, numpy, data transformation, or data analysis.
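Activate this skill when user input contains any of these terms: data processing, data cleaning, CSV, dataframe, pandas, polars, numpy, data transformation, or data analysis.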
Memory optimization: dtype optimization (int32 vs int64, category for low-cardinality strings); check usage with df.info(memory_usage='deep').
Clean a messy CSV:
Clean this CSV file: data/raw/sales.csv — remove duplicates, fill missing values in the 'revenue' column with the median, convert 'date' to datetime, and save to data/clean/sales.csv. Use Pandas.
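A prompt like the one above might produce something along these lines. This is a minimal sketch, not a definitive implementation: the function names `clean_sales` and `clean_sales_file` are hypothetical, and coercing `revenue` to numeric before taking the median is an added assumption for messy inputs.

```python
from pathlib import Path

import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate rows, fill missing revenue with the median, parse dates."""
    out = df.drop_duplicates().copy()
    # Assumption: stray non-numeric revenue values should be treated as missing.
    out["revenue"] = pd.to_numeric(out["revenue"], errors="coerce")
    # The median is computed after deduplication so repeated rows do not skew it.
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())
    # Unparseable dates become NaT instead of raising.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    return out.reset_index(drop=True)


def clean_sales_file(src: str, dst: str) -> None:
    """Read the raw CSV, clean it, and write the result, creating the folder if needed."""
    cleaned = clean_sales(pd.read_csv(src))
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    cleaned.to_csv(dst, index=False)
```

Calling `clean_sales_file("data/raw/sales.csv", "data/clean/sales.csv")` would apply it to the paths named in the prompt.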
Convert Pandas to Polars for performance:
Convert this Pandas data processing script to Polars for better performance on large datasets: scripts/transform.py
Optimize memory usage:
Analyze and optimize memory usage of the dataframe in scripts/load_data.py — downcast integers, use category dtype for low-cardinality strings.
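As a rough illustration of the two techniques named in the prompt, the sketch below downcasts integer columns and converts low-cardinality string columns to `category`. The helper names and the `max_unique_ratio` threshold of 0.5 are assumptions, not values taken from scripts/load_data.py, and object columns are assumed to hold hashable values such as strings.

```python
import pandas as pd


def optimize_memory(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy with smaller integer dtypes and categorical strings."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Picks the smallest signed integer dtype that can hold the values.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            n = len(series)
            # Hypothetical threshold: convert only when values repeat often.
            if n > 0 and series.nunique(dropna=True) / n <= max_unique_ratio:
                out[col] = series.astype("category")
    return out


def memory_bytes(df: pd.DataFrame) -> int:
    """Total footprint in bytes, counting the contents of object columns."""
    return int(df.memory_usage(deep=True).sum())
```

Comparing `memory_bytes(df)` before and after gives a quick check that the conversion actually helped; for a high-cardinality string column, `category` can use more memory, which is why the ratio guard is there.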
Create a data pipeline:
Create a Python data pipeline that reads all CSV files from data/input/, applies these transformations: remove nulls, normalize numeric columns, add a 'processed_at' timestamp column, and writes to data/output/ as Parquet.
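One possible shape for such a pipeline is sketched below. The prompt does not say what "normalize" means, so min-max scaling to [0, 1] is assumed here; the function names `transform` and `run_pipeline` are also hypothetical. Writing Parquet from Pandas requires an engine such as pyarrow or fastparquet to be installed.

```python
from pathlib import Path

import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with nulls, min-max scale numeric columns, stamp the batch."""
    out = df.dropna().copy()
    for col in out.select_dtypes(include="number").columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # Constant columns would divide by zero, so map them to 0.0.
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    # One UTC timestamp shared by every row of this batch.
    out["processed_at"] = pd.Timestamp.now(tz="UTC")
    return out


def run_pipeline(input_dir: str = "data/input", output_dir: str = "data/output") -> list[Path]:
    """Process every CSV in input_dir and write one Parquet file per input."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        df = transform(pd.read_csv(csv_path))
        target = out_dir / f"{csv_path.stem}.parquet"
        df.to_parquet(target, index=False)  # needs pyarrow or fastparquet
        written.append(target)
    return written
```

Keeping `transform` free of file I/O makes it easy to test on small in-memory frames before pointing `run_pipeline` at real directories.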
NumPy numerical analysis:
Write a NumPy function that computes the cosine similarity matrix between all pairs of vectors in a 2D array of shape (n, d). Use vectorized operations only.
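A vectorized answer to this prompt could look like the sketch below: normalize each row to unit length, then take a single matrix product. The function name and the `eps` guard for all-zero rows are assumptions added for illustration.

```python
import numpy as np


def cosine_similarity_matrix(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return the (n, n) matrix of cosine similarities between rows of x."""
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 2:
        raise ValueError("expected a 2D array of shape (n, d)")
    # Row norms with shape (n, 1) so they broadcast across the d columns.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Clamp norms so all-zero rows yield similarity 0 rather than NaN.
    unit = x / np.maximum(norms, eps)
    # Dot products of unit vectors are exactly the cosine similarities.
    return unit @ unit.T
```

The result is symmetric with ones on the diagonal for every nonzero row, which makes for a quick sanity check on real data.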
Follows the Agent Skills specification for progressive disclosure and best practices.