Transform sequential Python code into parallel/concurrent implementations. Use when asked to parallelize Python code, improve code performance through concurrency, convert loops to parallel execution, or identify parallelization opportunities. Handles CPU-bound (multiprocessing), I/O-bound (asyncio, threading), and data-parallel (vectorization) scenarios.
Transform sequential Python code to leverage parallel and concurrent execution patterns.
Is the bottleneck CPU-bound or I/O-bound?
CPU-bound (computation-heavy):
├── Independent iterations? → multiprocessing.Pool / ProcessPoolExecutor
├── Shared state needed? → multiprocessing with Manager or shared memory
├── NumPy/Pandas operations? → Vectorization first, then consider numba/dask
└── Large data chunks? → chunked processing with Pool.map
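The shared-state branch above can be sketched with a `Manager` proxy; `record` and `square_all` are hypothetical helpers for illustration, and since per-item proxy traffic is slow, returning values from `Pool.map` is usually preferable when state sharing isn't strictly required:

```python
from multiprocessing import Manager, Pool

def record(args):
    # Each worker appends its result to a Manager-backed list;
    # the proxy object handles cross-process synchronization.
    item, shared = args
    shared.append(item * item)

def square_all(items):
    with Manager() as manager:
        shared = manager.list()
        with Pool() as pool:
            pool.map(record, [(i, shared) for i in items])
        return sorted(shared)

if __name__ == "__main__":
    print(square_all(range(5)))  # [0, 1, 4, 9, 16]
```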
I/O-bound (network, disk, database):
├── Many independent requests? → asyncio with aiohttp/aiofiles
├── Legacy sync code? → ThreadPoolExecutor
├── Mixed sync/async? → asyncio.to_thread()
└── Database queries? → Connection pooling + async drivers
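The mixed sync/async branch can be sketched with `asyncio.to_thread()` (Python 3.9+); `legacy_fetch` here is a made-up stand-in for any blocking call:

```python
import asyncio
import time

def legacy_fetch(n):
    # Stand-in for a blocking sync call (e.g. a sync HTTP client).
    time.sleep(0.1)
    return n * 2

async def main():
    # to_thread runs blocking calls in worker threads so they
    # overlap instead of executing sequentially; gather preserves order.
    return await asyncio.gather(
        *(asyncio.to_thread(legacy_fetch, n) for n in range(3))
    )

if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 2, 4]
```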
Data-parallel (array/matrix ops):
├── NumPy arrays? → Vectorize, avoid Python loops
├── Pandas DataFrames? → Use built-in vectorized methods
├── Large datasets? → Dask for out-of-core parallelism
└── GPU available? → Consider CuPy or JAX
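For the DataFrame branch, built-in vectorized methods replace row-wise `apply`; the column names below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Slow: df.apply(lambda row: row["price"] * row["qty"], axis=1)
# Fast: the column-wise multiply runs in C with no Python-level loop.
df["total"] = df["price"] * df["qty"]
print(df["total"].tolist())  # [10.0, 40.0, 90.0]
```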
Before:

```python
results = []
for item in items:
    results.append(expensive_computation(item))
```

After:

```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    results = list(executor.map(expensive_computation, items))
```
Before:

```python
import requests

def fetch_all(urls):
    return [requests.get(url).json() for url in urls]
```

After:

```python
import asyncio
import aiohttp

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

async def fetch_one(session, url):
    async with session.get(url) as response:
        return await response.json()
```
Before:

```python
result = []
for i in range(len(a)):
    row = []
    for j in range(len(b)):
        row.append(a[i] * b[j])
    result.append(row)
```

After:

```python
import numpy as np

result = np.outer(a, b)
```
```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def hybrid_pipeline(data, urls):
    loop = asyncio.get_running_loop()
    # CPU-bound work in a process pool
    with ProcessPoolExecutor() as pool:
        processed = await loop.run_in_executor(pool, cpu_heavy_fn, data)
    # I/O-bound work with async
    results = await asyncio.gather(*[fetch(url) for url in urls])
    return processed, results
```
Look for these patterns in code:
| Pattern | Indicator | Strategy |
|---|---|---|
| `for item in collection` with independent iterations | No shared mutation | `Pool.map` / `executor.map` |
| Multiple `requests.get()` calls or file reads | Sequential I/O | `asyncio.gather()` |
| Nested loops over arrays | Numerical computation | NumPy vectorization |
| `time.sleep()` or blocking waits | Waiting on external resources | Threading or async |
| Large list comprehensions | Independent transforms | `Pool.map` with chunking |
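Iterating `executor.map()` results raises at the first failed task, while `submit()` with `as_completed()` lets each task's exception be caught separately. A minimal sketch, with `risky` as a hypothetical worker:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky(n):
    if n == 2:
        raise ValueError(f"bad input: {n}")
    return n * n

results, errors = {}, {}
with ThreadPoolExecutor() as executor:
    # Map each future back to its input so failures are attributable.
    futures = {executor.submit(risky, n): n for n in range(4)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            results[n] = future.result()
        except ValueError as exc:
            errors[n] = str(exc)

print(sorted(results.items()))  # [(0, 0), (1, 1), (3, 9)]
print(errors)  # {2: 'bad input: 2'}
```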
Always preserve correctness when parallelizing:
- Use `executor.submit()` for granular per-task error handling
- Prefer `map()` over `submit()` when result order matters
- Adding async to existing code requires restructuring the call chain

Before finalizing transformed code: