Transform sequential Python code into parallel/concurrent implementations. Use when asked to parallelize Python code, improve code performance through concurrency, convert loops to parallel execution, or identify parallelization opportunities. Handles CPU-bound (multiprocessing), I/O-bound (asyncio, threading), and data-parallel (vectorization) scenarios.
Transform sequential Python code to leverage parallel and concurrent execution patterns.
Is the bottleneck CPU-bound or I/O-bound?
CPU-bound (computation-heavy):
├── Independent iterations? → multiprocessing.Pool / ProcessPoolExecutor
├── Shared state needed? → multiprocessing with Manager or shared memory
├── NumPy/Pandas operations? → Vectorization first, then consider numba/dask
└── Large data chunks? → chunked processing with Pool.map
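The shared-state branch above can be sketched with a `Manager` proxy; `record` and `square_all` are hypothetical helpers for illustration, and since per-item proxy traffic is slow, returning values from `Pool.map` is usually preferable when state sharing isn't strictly required:

```python
from multiprocessing import Manager, Pool

def record(args):
    # Each worker appends its result to a Manager-backed list;
    # the proxy object handles cross-process synchronization.
    item, shared = args
    shared.append(item * item)

def square_all(items):
    with Manager() as manager:
        shared = manager.list()
        with Pool() as pool:
            pool.map(record, [(i, shared) for i in items])
        return sorted(shared)

if __name__ == "__main__":
    print(square_all(range(5)))  # [0, 1, 4, 9, 16]
```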
I/O-bound (network, disk, database):
├── Many independent requests? → asyncio with aiohttp/aiofiles
├── Legacy sync code? → ThreadPoolExecutor
├── Mixed sync/async? → asyncio.to_thread()
└── Database queries? → Connection pooling + async drivers
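The mixed sync/async branch can be sketched with `asyncio.to_thread()` (Python 3.9+); `legacy_fetch` here is a made-up stand-in for any blocking call:

```python
import asyncio
import time

def legacy_fetch(n):
    # Stand-in for a blocking sync call (e.g. a sync HTTP client).
    time.sleep(0.1)
    return n * 2

async def main():
    # to_thread runs blocking calls in worker threads so they
    # overlap instead of executing sequentially; gather preserves order.
    return await asyncio.gather(
        *(asyncio.to_thread(legacy_fetch, n) for n in range(3))
    )

if __name__ == "__main__":
    print(asyncio.run(main()))  # [0, 2, 4]
```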
Data-parallel (array/matrix ops):
├── NumPy arrays? → Vectorize, avoid Python loops
├── Pandas DataFrames? → Use built-in vectorized methods
├── Large datasets? → Dask for out-of-core parallelism
└── GPU available? → Consider CuPy or JAX
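For the DataFrame branch, built-in vectorized methods replace row-wise `apply`; the column names below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Slow: df.apply(lambda row: row["price"] * row["qty"], axis=1)
# Fast: the column-wise multiply runs in C with no Python-level loop.
df["total"] = df["price"] * df["qty"]
print(df["total"].tolist())  # [10.0, 40.0, 90.0]
```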
Before:

```python
results = []
for item in items:
    results.append(expensive_computation(item))
```

After:

```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    results = list(executor.map(expensive_computation, items))
```
Before:

```python
import requests

def fetch_all(urls):
    return [requests.get(url).json() for url in urls]
```

After:

```python
import asyncio
import aiohttp

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

async def fetch_one(session, url):
    async with session.get(url) as response:
        return await response.json()
```
Before:

```python
result = []
for i in range(len(a)):
    row = []
    for j in range(len(b)):
        row.append(a[i] * b[j])
    result.append(row)
```

After:

```python
import numpy as np

result = np.outer(a, b)
```
```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

async def hybrid_pipeline(data, urls):
    loop = asyncio.get_running_loop()
    # CPU-bound work in a process pool
    with ProcessPoolExecutor() as pool:
        processed = await loop.run_in_executor(pool, cpu_heavy_fn, data)
    # I/O-bound work with async
    results = await asyncio.gather(*[fetch(url) for url in urls])
    return processed, results
```
Look for these patterns in code:
| Pattern | Indicator | Strategy |
|---|---|---|
| `for item in collection` with independent iterations | No shared mutation | `Pool.map` / `executor.map` |
| Multiple `requests.get()` calls or file reads | Sequential I/O | `asyncio.gather()` |
| Nested loops over arrays | Numerical computation | NumPy vectorization |
| `time.sleep()` or blocking waits | Waiting on external resources | Threading or async |
| Large list comprehensions | Independent transforms | `Pool.map` with chunking |
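Iterating `executor.map()` results raises at the first failed task, while `submit()` with `as_completed()` lets each task's exception be caught separately. A minimal sketch, with `risky` as a hypothetical worker:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky(n):
    if n == 2:
        raise ValueError(f"bad input: {n}")
    return n * n

results, errors = {}, {}
with ThreadPoolExecutor() as executor:
    # Map each future back to its input so failures are attributable.
    futures = {executor.submit(risky, n): n for n in range(4)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            results[n] = future.result()
        except ValueError as exc:
            errors[n] = str(exc)

print(sorted(results.items()))  # [(0, 0), (1, 1), (3, 9)]
print(errors)  # {2: 'bad input: 2'}
```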
Always preserve correctness when parallelizing:
- Use `executor.submit()` for granular per-task error handling
- Prefer `map()` over `submit()` when result order matters
- Adding async to existing code requires restructuring the call chain

Before finalizing transformed code: