use to query historical data on the IMSA Weathertech seasons
Analyze IMSA racing data from the DuckDB database providing insights into lap times, driver performance, team comparisons, weather impacts, and race strategies.
Query with the skill included ./query.sh "SELECT 1". Schema can be found in ./schema.md
Output Formats:
-markdown flag)./query.sh --csv "SELECT ..." for token-efficient output with large result setsYou may have to use --remote parameter to access the database, and that may require the use of INSTALL httpfs if its not already there. You will figure it out.
WHERE event = 'X' AND session = 'race' → get session_idWHERE session_id = X AND class = 'Y' AND flags = 'GF' AND lap_time_driver_quartile IN (1, 2)session = 'race' and top 50% of laps unless asked otherwiseAlways compare within a single session_id. A session_id uniquely identifies one specific session (e.g., "2025 Sebring Race"). Laps from different sessions should NOT be compared directly.
✅ DEFAULT: Filter by session_id (captures year, event, session, start_date)
❌ AVOID: Comparing across multiple session_ids without explicit reason
It's a good idea to start with querying the seasons table at the beginning. Example:
./query "SELECT * FROM seasons WHERE session = 'race' AND season in (2024,2025) ORDER BY date"
Default to session = 'race' unless specifically asked otherwise. Practice and qualifying have different objectives, tire strategies, and fuel loads.
✅ ALWAYS: Start with race sessions
⚠️ ONLY IF ASKED: Look at practice or qualifying data
GTP ≠ LMP2 ≠ GTD even in the same session. Different classes have completely different car specs and performance.
❌ NEVER: Compare GTP times to GTD times
✅ ALWAYS: Analyze each class separately within the session
When calculating average lap times, ALWAYS filter to a single session_id AND class. Averaging across sessions or classes produces meaningless numbers.
The Golden Filtering Rule:
WHERE session_id = X -- Single session
AND class = 'Y' -- Single class
AND bpillar_quartile IN (1, 2) -- BPillar top 50% (race sessions only)
For performance analysis, use BPillar filtering - automatically excludes pit laps, first lap, slow laps, and traffic.
✅ ALWAYS: Filter to bpillar_quartile IN (1, 2) for pace analysis in races
✅ ALTERNATIVE: Use lap_time_driver_quartile IN (1, 2) for non-race sessions
❌ AVOID: Including quartiles 3 and 4 when analyzing true pace
The bpillar_quartile column (race sessions only) intelligently filters laps:
By filtering to bpillar_quartile IN (1, 2), you get only clean, representative racing pace.
While lap times aren't comparable across sessions, these analyses ARE valid:
⚠️ Still not valid: Averaging lap times across different sessions, even with quartile filtering. Each session must be analyzed independently first.
Example valid cross-session query:
-- Races participated by driver (not comparing lap times!)
SELECT
driver_name,
COUNT(DISTINCT session_id) as race_count,
COUNT(DISTINCT event) as unique_venues,
STRING_AGG(DISTINCT event, ', ' ORDER BY event) as events_raced
FROM laps
WHERE year = '2025'
AND session = 'race'
GROUP BY driver_name
ORDER BY race_count DESC;
driver_id (VARCHAR) for stable identification across name variants-- First, identify the session_id you want to analyze
SELECT
session_id,
year,
event,
session,
start_date,
COUNT(*) as total_laps,
COUNT(DISTINCT class) as classes
FROM laps
WHERE year = '2025'
AND event = 'Sebring'
AND session = 'race' -- almost always race
GROUP BY session_id, year, event, session, start_date
ORDER BY start_date;
-- driver_id is a string like 'firstname lastname' - look up by name first
SELECT DISTINCT driver_id, driver_name
FROM laps
WHERE driver_name LIKE '%Beche%' -- partial match
ORDER BY driver_name;
Always use this macro for human-readable times:
CREATE OR REPLACE MACRO format_time(t) AS (
CASE
WHEN t > 3600 THEN STRFTIME('%H:%M:%S', MAKE_TIMESTAMP(CAST(t * 1000000 AS BIGINT))) || '.' || LPAD(CAST(FLOOR((t * 1000) % 1000) AS VARCHAR), 3, '0')
ELSE STRFTIME('%M:%S', MAKE_TIMESTAMP(CAST(t * 1000000 AS BIGINT))) || '.' || LPAD(CAST(FLOOR((t * 1000) % 1000) AS VARCHAR), 3, '0')
END
);
-- Get fastest laps per class in a specific race session
SELECT
driver_name,
team_name,
car,
class,
format_time(lap_time) AS lap_time,
lap AS lap_number
FROM laps
WHERE session_id = 12345 -- ← Use the session_id from query above
AND class = 'GTP' -- ← Analyze each class separately
AND bpillar_quartile IN (1, 2) -- BPillar top 50% (auto-excludes pit/slow laps)
ORDER BY lap_time ASC
LIMIT 10;
-- Compare drivers within a single race session using their best laps
SELECT
driver_name,
COUNT(*) AS total_laps,
format_time(MIN(lap_time)) AS fastest,
format_time(AVG(lap_time)) AS average,
format_time(STDDEV(lap_time)) AS std_dev,
ROUND(STDDEV(lap_time) / AVG(lap_time) * 100, 2) AS cv_percent
FROM laps
WHERE session_id = 12345 -- ← Single session only
AND class = 'GTP' -- ← Single class only
AND bpillar_quartile IN (1, 2) -- BPillar top 50% representative pace
GROUP BY driver_name
HAVING COUNT(*) >= 5 -- Minimum lap sample for bpillar top 50%
ORDER BY cv_percent ASC;
-- Analyze pit stops in a specific race
SELECT
driver_name,
team_name,
car,
lap,
format_time(pit_time) AS pit_duration,
session_time_lap_number
FROM laps
WHERE session_id = 12345 -- ← Single race session
AND pit_time IS NOT NULL
ORDER BY pit_time ASC
LIMIT 20;
-- Weather effects within a single race session and class
SELECT
CAST(track_temp_f / 10 AS INT) * 10 AS temp_bucket,
COUNT(*) AS laps,
format_time(AVG(lap_time)) AS avg_lap_time,
format_time(MIN(lap_time)) AS fastest_lap
FROM laps
WHERE session_id = 12345 -- ← Single race session
AND class = 'GTP' -- ← Single class
AND bpillar_quartile IN (1, 2) -- BPillar representative performance
AND track_temp_f IS NOT NULL
GROUP BY temp_bucket
ORDER BY temp_bucket;
-- Track tire degradation for a specific driver in a race
-- NOTE: Using ALL laps here to see full degradation curve
SELECT
driver_name,
stint_number,
stint_lap,
format_time(lap_time) AS lap_time,
lap_time_driver_quartile,
session_time_lap_number
FROM laps
WHERE session_id = 12345 -- ← Single race session
AND driver_id = 'tobi lutke' -- ← Use driver_id string (e.g., 'firstname lastname')
AND lap_time IS NOT NULL
AND flags = 'GF' -- Green flag only to exclude cautions
ORDER BY stint_number, stint_lap;
-- Alternative: Focus only on clean, representative laps
-- WHERE ... AND lap_time_driver_quartile IN (1, 2)
-- Compare teammates in a single race session using representative pace
WITH teammate_stats AS (
SELECT
driver_name,
team_name,
COUNT(*) AS laps,
MIN(lap_time) AS fastest,
AVG(lap_time) AS average
FROM laps
WHERE session_id = 12345 -- ← Single race session
AND team_name = 'Porsche Penske Motorsport'
AND bpillar_quartile IN (1, 2) -- BPillar top 50% pace
GROUP BY driver_name, team_name
)
SELECT
driver_name,
laps,
format_time(fastest) AS fastest_lap,
format_time(average) AS avg_lap,
format_time(average - (SELECT MIN(average) FROM teammate_stats)) AS gap_to_fastest,
ROUND((average - (SELECT MIN(average) FROM teammate_stats)), 3) AS gap_seconds
FROM teammate_stats
ORDER BY average;
session = 'race' unless specifically asked for practice/qualifyinglap_time_driver_quartile IN (1, 2) for representative paceWHERE session_id = X AND class = 'Y' AND lap_time_driver_quartile IN (1, 2)flags = 'GF' for clean lap comparisonsIS NOT NULL filtersflags = 'GF' to exclude caution lapslaps_2025 instead of WHERE year = '2025' if availablesession = 'race' unless explicitly asked otherwisebpillar_quartile IN (1, 2) for race pace analysisdriver_id = 'tobi lutke', not numeric IDs'01' ≠ '1' - use exact matchesdriver_id (VARCHAR) for joins/filters, driver_name for displaylap_time_driver_quartile for practice/qualifying✅ Have I identified the specific session_id?
✅ Have I specified a single class?
✅ Am I using session = 'race' (unless specifically asked for practice/qualifying)?
✅ Have I filtered to bpillar_quartile IN (1, 2) for race analysis?
✅ Am I only comparing lap times within these boundaries?
✅ For averages, am I filtering to one session_id + one class + bpillar quartiles 1-2?
SELECT session_id, start_date, COUNT(*) as laps
FROM laps
WHERE event = 'Sebring' AND year = '2025' AND session = 'race'
GROUP BY session_id, start_date;
SELECT DISTINCT class FROM laps WHERE session_id = X;
SELECT DISTINCT session_id, event, year, class
FROM laps
WHERE driver_id = 'tobi lutke' AND session = 'race' -- driver_id is VARCHAR
ORDER BY year, event;
flags = 'FCY' or similar)raining = TRUE) during the sessionformat_time(lap_time)ROUND(x, 2)AS fastest_lap, not AS fl"Who's fastest at [track]?" → Find the race session_id, then query fastest laps per class (use top 50% filter)
"Who won the [event] race?" → Valid question - find the race session_id, analyze by class
"Who's fastest overall in 2025?" → ❌ INVALID - lap times aren't comparable across different tracks
"What's [driver]'s average pace?"
→ Must specify session_id and class, filter to bpillar_quartile IN (1, 2)
"How did [driver] do in the race?" → Get all laps for driver_id at specific session_id, show pace relative to class using bpillar quartiles
"Compare [team A] vs [team B]"
→ Only valid within same session_id and same class, filter to bpillar_quartile IN (1, 2)
"What was the pit strategy at Sebring?" → Filter to the Sebring race session_id, show pit_time entries per class
"How does weather affect pace at Road America?" → Choose the Road America race session_id, group by temperature within each class (top laps only)
"Who's most consistent in GTP?" → Calculate CV for a specific race session_id, filtered to GTP class and top 50% laps
"Show me tire degradation for [driver]" → Use ALL laps for specific session_id to see full wear curve (exception to quartile rule)
"Best lap times for [driver]?"
→ Specify session_id + class, show their fastest laps using bpillar_quartile IN (1, 2)
session = 'race' unless specifically asked for practice/qualifyingbpillar_quartile IN (1, 2) for race pace analysisflags = 'GF' or pit lap exclusions'tobi lutke' for filtering/joins, driver_name for display--csv flag for token efficiency with large resultsRemember: The goal is actionable insights within the context of a specific race session. Always filter to session_id + class + top laps first, then analyze. Present findings clearly and suggest follow-up questions when appropriate.