Archivo del skill

Driver Performance Guide

Name: Driver Performance Guide
Author: datasurface

Database driver selection and bulk insert optimization for DataSurface. Covers psycopg2 vs psycopg3, pyodbc fast_executemany, SQLAlchemy bypasses, and the execute_fast_insert pattern. Includes measured benchmarks.

datasurface0 estrellas22 mar 2026

Ocupación
Categorías: Bases de Datos SQL

Contenido de la habilidad

Database driver choice and insert method have a dramatic impact on DataSurface throughput. The wrong combination can be 10-25x slower. This guide covers what we measured, what we chose, and why.

Summary of Benchmarks

All benchmarks measured on the same hardware, inserting 10,000 rows into a staging table:

Database	Driver	Method	Rows/sec	Notes
PostgreSQL	psycopg2	SA `executemany`	5,500	One round-trip per row
PostgreSQL	psycopg3	SA `executemany`	51,700	Proper batching via pipeline
PostgreSQL	psycopg3	Multi-row VALUES	169,000	Fastest but requires string formatting
SQL Server	pyodbc	SA `text()` executemany	3,263	SA bypasses fast_executemany
SQL Server

Skills relacionados

Driver Performance Guide | Skills Pool

# In PostgresDatabase adapter
def get_driver_name(self) -> str:
    return "postgresql+psycopg"  # NOT "postgresql" (psycopg2)

psycopg[binary]==3.2.9   # NOT psycopg2-binary

FROM apache/airflow:3.1.8
RUN pip install psycopg[binary]==3.2.9

from datasurface.platforms.yellow.database_operations import execute_fast_insert, is_pyodbc_connection

# Usage in merge/ingestion code:
execute_fast_insert(
    connection=connection,
    sql=insert_sql,          # INSERT INTO ... VALUES (:col1, :col2, ...)
    params=list_of_dicts,    # [{"col1": v1, "col2": v2}, ...]
    logger=logger
)

# MySQL adapter signals this:
def supports_batch_values_insert(self) -> bool:
    return True  # Use VALUES string building

Database	SQLAlchemy Driver	Bulk Insert Method	Adapter Flag
PostgreSQL	`postgresql+psycopg`	Native executemany (psycopg3 pipeline)	`supports_batch_values_insert = False`
SQL Server	`mssql+pyodbc`	Raw cursor `fast_executemany=True`	`supports_batch_values_insert = False`
MySQL	`mysql+pymysql`	Multi-row VALUES via `format_sql_value`	`supports_batch_values_insert = True`
Oracle	`oracle+oracledb`	Native executemany	`supports_batch_values_insert = False`
DB2	`db2+ibm_db`	Native executemany	`supports_batch_values_insert = False`
Snowflake	`snowflake`	Native executemany	`supports_batch_values_insert = False`

Driver Performance Guide

Summary of Benchmarks

Driver Performance Guide

Summary of Benchmarks

PostgreSQL: psycopg3

Why psycopg3

Configuration

Airflow Worker Image

SQL Server: pyodbc fast_executemany

The Problem

The Pattern: execute_fast_insert

Why Not Multi-Row VALUES for SQL Server?

MySQL: Multi-Row VALUES (Exception)

format_sql_value

DB-Specific Driver Summary

Key Takeaways

Postgres Patterns

Postgres Patterns

Database Migrations

Postgres Patterns

Postgres Patterns

Jpa Patterns