Search and download financial data from WRDS (Wharton Research Data Services). Use when asked to "download from WRDS", "search WRDS", "get CRSP data", "download Compustat", "find WRDS table", "get stock returns", "download IBES", or any WRDS/financial database task involving CRSP, Compustat, IBES, TAQ, OptionMetrics, Fama-French, BoardEx, DealScan, or other WRDS datasets.
Download and query data from WRDS (Wharton Research Data Services) using the wrds Python package.
Before any WRDS operation, verify the environment is set up:
uv run --with wrds python -c "import wrds; print('wrds package: OK')"
Also verify that WRDS credentials exist in the expected PostgreSQL password file:
- Linux/macOS: ~/.pgpass
- Windows: %APPDATA%/postgresql/pgpass.conf

If any check fails, guide the user through setup:

1. Install the package: uv pip install wrds
2. Add a credentials line to the password file:
   wrds-pgdata.wharton.upenn.edu:9737:wrds:USERNAME:PASSWORD
3. Restrict file permissions (Linux/macOS): chmod 600 ~/.pgpass

If the user is working from a local checkout of this plugin and wants the helper script, it lives at:
plugins/wrds-data/skills/wrds-data/scripts/wrds_setup.py
Do NOT proceed with queries until --check passes.
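The password-file path and credentials line above can be composed programmatically. A minimal sketch; the helper names here are illustrative and not part of the wrds package:

```python
import os.path

# WRDS PostgreSQL endpoint, as used by the wrds package
WRDS_HOST = "wrds-pgdata.wharton.upenn.edu"
WRDS_PORT = 9737
WRDS_DB = "wrds"

def pgpass_path(platform: str) -> str:
    """Password-file location: Windows uses pgpass.conf, everything else ~/.pgpass."""
    if platform == "win32":
        return "%APPDATA%/postgresql/pgpass.conf"
    return os.path.expanduser("~/.pgpass")

def pgpass_line(username: str, password: str) -> str:
    """Compose the host:port:database:username:password entry WRDS expects."""
    return f"{WRDS_HOST}:{WRDS_PORT}:{WRDS_DB}:{username}:{password}"

print(pgpass_line("USERNAME", "PASSWORD"))
# wrds-pgdata.wharton.upenn.edu:9737:wrds:USERNAME:PASSWORD
```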
If the user specifies a library/table, proceed directly. Otherwise, help them find it.
import wrds
db = wrds.Connection()
# List all accessible libraries
libs = db.list_libraries()
# List tables in a library
tables = db.list_tables(library='crsp')
# Describe a table (columns, types)
schema = db.describe_table(library='crsp', table='dsf')
# Approximate row count
count = db.get_row_count('crsp', 'dsf')
# Preview data
sample = db.get_table('crsp', 'dsf', rows=5)
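When the exact table name is unknown, the output of list_tables() can be filtered by keyword. A sketch with an illustrative name list; in real use you would pass db.list_tables(library='crsp'):

```python
def find_tables(table_names, keyword):
    """Case-insensitive substring match over a list of table names."""
    kw = keyword.lower()
    return [t for t in table_names if kw in t.lower()]

# Illustrative names; in practice: find_tables(db.list_tables(library='crsp'), 'name')
example = ['dsf', 'msf', 'stocknames', 'dsenames']
print(find_tables(example, 'name'))  # ['stocknames', 'dsenames']
```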
For common datasets (CRSP, Compustat, IBES, etc.), consult the common_datasets.md reference.
Use db.raw_sql() for all queries — it is the most flexible method.
data = db.raw_sql(
"SELECT permno, date, ret FROM crsp.dsf WHERE date >= '2020-01-01'",
date_cols=['date']
)
| Parameter | Usage |
|---|---|
| `date_cols` | List of columns to parse as dates; always specify |
| `params` | Dict for parameterized queries (`%(name)s` syntax) |
| `chunksize` | Process in chunks (default 500k rows); set `None` to disable |
| `return_iter` | `True` to get an iterator for very large downloads |
| `dtype_backend` | `"pyarrow"` for memory-efficient Arrow-backed DataFrames |
params = {'tickers': ('AAPL', 'MSFT'), 'start': '2023-01-01'}
data = db.raw_sql("""
SELECT a.permno, a.date, a.ret, b.ticker
FROM crsp.dsf a
JOIN crsp.stocknames b ON a.permno = b.permno
AND a.date >= b.namedt AND a.date <= b.nameendt
WHERE b.ticker IN %(tickers)s
AND a.date >= %(start)s
""", params=params, date_cols=['date'])
For datasets exceeding ~1M rows, use chunked iteration:
chunks = db.raw_sql(
"SELECT * FROM crsp.dsf WHERE date >= '2000-01-01'",
chunksize=500000, return_iter=True
)
for i, chunk in enumerate(chunks):
chunk.to_parquet(f'data/crsp_dsf_{i}.parquet')
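The per-chunk files can later be stitched back into one DataFrame. A sketch using tiny stand-in chunks and CSV for portability; pd.read_parquet works the same way on the parquet files written above:

```python
import tempfile
from pathlib import Path

import pandas as pd

def combine_chunks(directory, pattern):
    """Read all chunk files matching pattern and concatenate them in name order."""
    files = sorted(Path(directory).glob(pattern))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Demo with two small stand-in chunks instead of real CRSP downloads
with tempfile.TemporaryDirectory() as d:
    pd.DataFrame({'permno': [1, 2]}).to_csv(Path(d) / 'crsp_dsf_0.csv', index=False)
    pd.DataFrame({'permno': [3]}).to_csv(Path(d) / 'crsp_dsf_1.csv', index=False)
    combined = combine_chunks(d, 'crsp_dsf_*.csv')
    print(len(combined))  # 3
```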
Or download in one shot and save:
data = db.raw_sql("...", date_cols=['date'])
data.to_parquet('data/output.parquet')
# or
data.to_csv('data/output.csv', index=False)
Default conventions:
- Save outputs to a data/ directory in the project root (create if needed)
- Use descriptive filenames, e.g. crsp_daily_2020_2023.parquet, compustat_annual.csv

Always close the connection when done:
db.close()
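Because Connection exposes close(), contextlib.closing() can guarantee cleanup even when a query raises. A sketch using a stand-in class so it runs offline; a real session would wrap wrds.Connection() instead:

```python
from contextlib import closing

class FakeConnection:
    """Stand-in for wrds.Connection() so this sketch runs without credentials."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

conn = FakeConnection()
with closing(conn) as db:
    pass  # db.raw_sql(...) calls would go here

print(conn.closed)  # True
```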
Common errors:

- NotSubscribedError: the user lacks a subscription to this library. Inform them to request access via their institution's WRDS coordinator.
- SchemaNotFoundError: the library name is wrong. Use db.list_libraries() to find the correct name.

Notes:

- The wrds package connects to wrds-pgdata.wharton.upenn.edu:9737 over SSL.
- Credentials in ~/.pgpass are never exposed to the LLM; the wrds package reads them directly.

Bundled resources:

- wrds_setup.py: check the environment and interactively configure WRDS credentials
- common_datasets.md: common WRDS libraries, tables, query recipes, and variable glossary