Guidelines for handling databases
- Before running SELECT *, always run SELECT COUNT(*) FROM table_name to understand the scale.
- Run SELECT * FROM table_name LIMIT 5 to see actual data formats.
- Bound exploratory queries with LIMIT.
- Avoid SELECT *: In production-scale tables, explicitly name columns to reduce I/O and memory usage.
- Quote case-sensitive identifiers according to the dialect (e.g., "UserTable" in Postgres, `UserTable` in MySQL).
- For JOIN operations, ensure joining columns are indexed to prevent full table scans.
- When using GROUP BY, ensure the result set size is manageable.
- Ensure database types (e.g., DECIMAL, BIGINT, TIMESTAMP) are correctly mapped to Python/JSON types without precision loss.
- Map NULL values to a consistent "missing" representation (e.g., None or NaN).
- For large result sets, use fetchmany(size) or OFFSET/LIMIT pagination instead of fetching everything into memory at once.
- Avoid applying functions to columns (e.g., DATE(), UPPER()) within the WHERE clause, since this usually prevents the engine from using an index on that column.
- Avoid DISTINCT and UNION (which performs de-duplication) on multi-million row sets unless necessary; use UNION ALL if duplicates are acceptable.
- Avoid ORDER BY on large non-indexed text fields.
- Avoid leading-wildcard LIKE patterns (e.g., %term) on large text columns.
- Keep the column bare in comparisons: WHERE col = FUNC(val) is good; WHERE FUNC(col) = val is bad.
- Apply WHERE conditions as close to the base tables as possible.
- Use WITH (common table expressions) for complex multi-step logic to improve maintainability; in some engines it also affects how the optimizer plans the query.
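
Several of the guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it uses the standard-library sqlite3 module only so that it is self-contained, and the orders table, its columns, the index name, and the helper functions (profile_table, stream_orders, plan) are all hypothetical. Storing amounts as text and parsing them with Decimal is one assumed way to avoid precision loss; other drivers may map DECIMAL columns differently.

```python
import sqlite3
from decimal import Decimal

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT,"
    " amount TEXT,"          # decimal kept as text to avoid float rounding
    " created_at TEXT)"      # ISO-8601 timestamp string
)
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")
rows = [
    (i, None if i % 4 == 0 else f"cust{i}", f"{i}.10",
     f"2024-01-{(i % 9) + 1:02d} 12:00:00")
    for i in range(1, 21)
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)


def profile_table(conn, table):
    """Return (row_count, sample_rows) before any wider query is attempted."""
    # Table names cannot be bound as parameters, so `table` must come
    # from trusted code, never from user input.
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    sample = conn.execute(f'SELECT * FROM "{table}" LIMIT 5').fetchall()
    return count, sample


def stream_orders(conn, batch_size=8):
    """Yield orders in batches, naming columns explicitly instead of SELECT *."""
    cur = conn.execute("SELECT id, customer, amount FROM orders ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        # SQL NULL arrives as None; amounts are parsed as Decimal, not float.
        yield [(oid, customer, Decimal(amount)) for oid, customer, amount in batch]


def plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


count, sample = profile_table(conn, "orders")
batches = list(stream_orders(conn))

# Function applied to the column: the index cannot be used for a lookup.
bad_plan = plan(conn, "SELECT id FROM orders WHERE DATE(created_at) = '2024-01-02'")
# Bare column compared against a range: the index can be used.
good_plan = plan(
    conn,
    "SELECT id FROM orders"
    " WHERE created_at >= '2024-01-02' AND created_at < '2024-01-03'",
)

print(count, len(sample), [len(b) for b in batches])
print(bad_plan)
print(good_plan)
```

Comparing the two printed plans shows the effect of the WHERE FUNC(col) = val pattern: in SQLite the first query is reported as a scan, while the range form is reported as an index search. Other engines expose the same information through their own EXPLAIN output, which will look different.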