Name: Polars Scientific
Author: havhje

Polars for Scientific and ML Workflows

Use this skill when writing scientific or data analysis code that combines Polars with NumPy: any task that involves both DataFrames and numerical arrays, such as grouping, joining, reshaping, or aggregating tabular data around NumPy computations.

Rules

Think like a database. Research is often about comparing quantities across dimensions (layers, datasets, samples, hyperparameters). Build DataFrames, use joins and pivoting -- not nested loops or dicts.
Stay in Polars expressions as long as possible. Leave the engine only when NumPy or SciPy is genuinely needed (linear algebra, signal processing, custom distance functions).
Groups are for aggregation, not iteration. Prefer group_by().agg() over any form of row-wise looping. When NumPy is required per group, use a list comprehension over the groups that builds one record per group, then construct a DataFrame from those records.
Reshape tensors before creating DataFrames. einops (rearrange, repeat) is the cleanest way to flatten high-dimensional arrays; plain NumPy reshaping also works.

Selectors: match columns by type or name pattern

Examples

Collect results into a tidy DataFrame

Join two measurements on shared keys

Handling nulls and join_nulls

Store vectors as Array columns

Unpivot wide measurements into tidy format

Explode two array columns and tag their origin

Unnest struct columns from aggregations

Apply a NumPy function per group

over and rank: first reading to exceed each sensor's personal best

Flatten a batch of features with einops

group_by_dynamic: time-based and index-based windows

Reference

Missing Polars features

enumerate
