Reviews Django migration files for PostgreSQL best practices specific to Prowler. Trigger: When creating migrations, running makemigrations/pgmakemigrations, reviewing migration PRs, adding indexes or constraints to database tables, modifying existing migration files, or writing data backfill migrations. Always use this skill when you see AddIndex, CreateModel, AddConstraint, RunPython, bulk_create, bulk_update, or backfill operations in migration files.
A bad migration, whether written by hand or generated by `makemigrations` / `pgmakemigrations`, can lock a production table for minutes, block all reads and writes, or silently skip index creation on partitioned tables.
makemigrations and pgmakemigrations bundle everything into one file: CreateModel, AddIndex, AddConstraint, sometimes across multiple tables. This is the default Django behavior and it violates every rule below.
After generating a migration, ALWAYS review it and split it:
- `CreateModel` + `AddConstraint` → one migration
- `AddIndex` per table → one migration per table
- `AddIndex` on partitioned tables → two migrations (partition + parent)
- `AlterField`, `AddField`, `RemoveField` for each table → one migration per table

When adding fields or indexes to an existing model, makemigrations may also bundle AddIndex for unrelated tables that had pending model changes. Always check for stowaways from other tables.
CreateModel + AddConstraint = same migration (structural).
AddIndex = separate migration file (performance).
Django runs each migration inside a transaction (unless atomic = False). If an index operation fails, it rolls back everything, including the model creation. Splitting means a failed index doesn't prevent the table from existing. It also lets you --fake index migrations independently (see Rule 4).
```python
# 0081_finding_group_daily_summary.py — DON'T DO THIS
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # separate this
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # separate this
        migrations.AddConstraint(model_name="findinggroupdailysummary", ...),  # this is fine here
    ]
```
```python
# 0081_create_finding_group_daily_summary.py
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        # Constraints belong with the model — they define its integrity rules
        migrations.AddConstraint(model_name="findinggroupdailysummary", ...),  # unique
        migrations.AddConstraint(model_name="findinggroupdailysummary", ...),  # RLS
    ]
```

```python
# 0082_finding_group_daily_summary_indexes.py
class Migration(migrations.Migration):
    dependencies = [("api", "0081_create_finding_group_daily_summary")]
    operations = [
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),
    ]
```
Flag any migration with both CreateModel and AddIndex in operations.
Each table's indexes must live in their own migration file. Never mix AddIndex for different model_name values in one migration.
If the index on table B fails, the rollback also drops the index on table A. The migration name gives no hint that it touches unrelated tables. You lose the ability to --fake one table's indexes without affecting the other.
```python
# 0081_finding_group_daily_summary.py — DON'T DO THIS
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # table A
        migrations.AddIndex(model_name="resource", ...),  # table B!
        migrations.AddIndex(model_name="resource", ...),  # table B!
        migrations.AddIndex(model_name="finding", ...),  # table C!
    ]
```
```
# 0081_create_finding_group_daily_summary.py — model + constraints
# 0082_finding_group_daily_summary_indexes.py — only FindingGroupDailySummary indexes
# 0083_resource_trigram_indexes.py — only Resource indexes
# 0084_finding_check_index_partitions.py — only Finding partition indexes (step 1)
# 0085_finding_check_index_parent.py — only Finding parent index (step 2)
```
Name each migration file after the table it affects. A reviewer should know which table a migration touches without opening the file.
Flag any migration where AddIndex operations reference more than one model_name.
Tables findings and resource_finding_mappings are range-partitioned. Plain AddIndex only creates the index definition on the parent table. Postgres does NOT propagate it to existing partitions. New partitions inherit it, but all current data stays unindexed.
Use the helpers in api.db_utils.
```python
# 0084_finding_check_index_partitions.py
from functools import partial

from django.db import migrations

from api.db_utils import create_index_on_partitions, drop_index_on_partitions


class Migration(migrations.Migration):
    atomic = False  # REQUIRED — CREATE INDEX CONCURRENTLY can't run inside a transaction
    dependencies = [("api", "0083_resource_trigram_indexes")]
    operations = [
        migrations.RunPython(
            partial(
                create_index_on_partitions,
                parent_table="findings",
                index_name="find_tenant_check_ins_idx",
                columns="tenant_id, check_id, inserted_at",
            ),
            reverse_code=partial(
                drop_index_on_partitions,
                parent_table="findings",
                index_name="find_tenant_check_ins_idx",
            ),
        )
    ]
```
Key details:
- `atomic = False` is mandatory. CREATE INDEX CONCURRENTLY cannot run inside a transaction.
- Always provide `reverse_code` using `drop_index_on_partitions` so rollbacks work.
- The helper defaults to `all_partitions=True`, which creates indexes on every partition CONCURRENTLY (no locks). This is the safe default.
- Don't set `all_partitions=False` unless you understand the consequence: step 2's AddIndex on the parent will create indexes on the skipped partitions with locks (not CONCURRENTLY), because PostgreSQL fills in missing partition indexes inline during parent index creation.

```python
# 0085_finding_check_index_parent.py
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [("api", "0084_finding_check_index_partitions")]
    operations = [
        migrations.AddIndex(
            model_name="finding",
            index=models.Index(
                fields=["tenant_id", "check_id", "inserted_at"],
                name="find_tenant_check_ins_idx",
            ),
        ),
    ]
```
This second migration tells Django "this index exists" so it doesn't try to recreate it. New partitions created after this point inherit the index definition from the parent.
| Partition migration | Parent migration |
|---|---|
| 0020_findings_new_performance_indexes_partitions.py | 0021_findings_new_performance_indexes_parent.py |
| 0024_findings_uid_index_partitions.py | 0025_findings_uid_index_parent.py |
| 0028_findings_check_index_partitions.py | 0029_findings_check_index_parent.py |
| 0036_rfm_tenant_finding_index_partitions.py | 0037_rfm_tenant_finding_index_parent.py |
Flag any plain `AddIndex` on `finding` or `resourcefindingmapping` without a preceding partition migration.
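Conceptually, the partition helper does two things: discover the partitions, then build each index CONCURRENTLY one partition at a time. The following is a hedged sketch with an injected `execute` callable, not the real implementation (which lives in `api/db_utils.py` and also handles naming, the `all_partitions` flag, and error handling):

```python
def create_index_on_partitions_sketch(execute, parent_table, index_name, columns):
    """Hypothetical sketch: index every partition of parent_table CONCURRENTLY.

    `execute` stands in for a raw DB cursor; the real helper in api.db_utils
    has a different signature and more options.
    """
    # Discover partitions via pg_inherits, how Postgres tracks partition children.
    partitions = execute(
        "SELECT inhrelid::regclass::text FROM pg_inherits "
        f"WHERE inhparent = '{parent_table}'::regclass"
    )
    for partition in partitions:
        # CONCURRENTLY avoids locking the partition; this is why the
        # migration must declare atomic = False.
        execute(
            f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {partition}_{index_name} "
            f"ON {partition} USING BTREE ({columns})"
        )
```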
For huge tables (findings has millions of rows), even CREATE INDEX CONCURRENTLY can take minutes and consume significant I/O. In production, you may want to decouple the migration from the actual index creation.
Write the migration normally following the two-step pattern above.
Fake the migration so Django marks it as applied without executing it:
```shell
python manage.py migrate api 0084_finding_check_index_partitions --fake
python manage.py migrate api 0085_finding_check_index_parent --fake
```
Create the indexes manually via `psql` or `python manage.py dbshell --database admin`:

```sql
-- For each partition you care about:
CREATE INDEX CONCURRENTLY IF NOT EXISTS findings_2026_jan_find_tenant_check_ins_idx
    ON findings_2026_jan USING BTREE (tenant_id, check_id, inserted_at);

CREATE INDEX CONCURRENTLY IF NOT EXISTS findings_2026_feb_find_tenant_check_ins_idx
    ON findings_2026_feb USING BTREE (tenant_id, check_id, inserted_at);

-- Then register on the parent (this is fast, no data scan):
CREATE INDEX IF NOT EXISTS find_tenant_check_ins_idx
    ON findings USING BTREE (tenant_id, check_id, inserted_at);

-- Verify the index exists on the parent and the partitions:
SELECT indexrelid::regclass, indrelid::regclass
FROM pg_index
WHERE indexrelid::regclass::text LIKE '%find_tenant_check_ins%';
```
This is optional. For smaller tables or non-production environments, letting the migration run normally is fine.
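If you script the manual apply, a small generator keeps the per-partition statements consistent. This is a hypothetical helper, not part of Prowler; the partition names are whatever `pgpartition` created:

```python
def manual_index_sql(parent_table, index_name, columns, partitions):
    """Build the manual-apply statements: one CONCURRENT index per partition,
    then a plain (fast, metadata-only) index on the parent."""
    statements = [
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {p}_{index_name} "
        f"ON {p} USING BTREE ({columns});"
        for p in partitions
    ]
    # The parent index is NOT concurrent: once every partition already has the
    # index, registering it on the parent is a metadata-only operation.
    statements.append(
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {parent_table} USING BTREE ({columns});"
    )
    return statements
```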
Data backfills (updating existing rows, populating new columns, generating summary data) are the most dangerous migrations. A naive Model.objects.all().update(...) on a multi-million row table will hold a transaction lock for minutes, blow out WAL, and potentially OOM the worker.
The migration should only dispatch the work. The actual backfill runs asynchronously via Celery tasks, outside the migration transaction.
```python
# 0090_backfill_finding_group_summaries.py
from django.db import migrations


def trigger_backfill(apps, schema_editor):
    from api.db_router import MainRouter
    from tasks.jobs.backfill import backfill_finding_group_summaries_task

    Tenant = apps.get_model("api", "Tenant")
    tenant_ids = Tenant.objects.using(MainRouter.admin_db).values_list("id", flat=True)
    for tenant_id in tenant_ids:
        backfill_finding_group_summaries_task.delay(tenant_id=str(tenant_id))


class Migration(migrations.Migration):
    dependencies = [("api", "0089_previous_migration")]
    operations = [
        migrations.RunPython(trigger_backfill, migrations.RunPython.noop),
    ]
```
The migration finishes in seconds. The backfill runs in the background per-tenant.
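The dispatched task typically has this shape: stream source rows, buffer a fixed-size batch, write each batch in its own transaction. A hypothetical sketch with injected callables (the real tasks live in `tasks/jobs/backfill.py` and use the `api.db_utils` helpers):

```python
def backfill_tenant(tenant_id, load_rows, write_batch, batch_size=500):
    """Process one tenant's backfill in fixed-size batches.

    load_rows(tenant_id) yields source rows (e.g. a queryset .iterator());
    write_batch(tenant_id, batch) persists one batch (e.g. bulk_create inside
    rls_transaction(tenant_id)). Returns the number of rows written.
    """
    batch, written = [], 0
    for row in load_rows(tenant_id):
        batch.append(row)
        if len(batch) >= batch_size:
            write_batch(tenant_id, batch)
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        write_batch(tenant_id, batch)
        written += len(batch)
    return written
```

Because each batch is committed independently, an interrupted run leaves prior batches persisted, which is why creates should use `ignore_conflicts=True` (see below) so a re-run is safe.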
Single-statement bulk updates on small result sets are OK inline:
```python
# Fine — single UPDATE, small result set, no iteration
from api.db_router import MainRouter


def backfill_graph_data_ready(apps, schema_editor):
    AttackPathsScan = apps.get_model("api", "AttackPathsScan")
    AttackPathsScan.objects.using(MainRouter.admin_db).filter(
        state="completed", graph_data_ready=False,
    ).update(graph_data_ready=True)
```
Use inline only when you're confident the affected row count is small (< ~10K rows).
The actual backfill task must process data in batches. Use the helpers in api.db_utils:
```python
from api.db_utils import create_objects_in_batches, update_objects_in_batches, batch_delete

# Creating objects in batches (500 per transaction)
create_objects_in_batches(tenant_id, ScanCategorySummary, summaries, batch_size=500)

# Updating objects in batches
update_objects_in_batches(tenant_id, Finding, findings, fields=["status"], batch_size=500)

# Deleting in batches
batch_delete(tenant_id, queryset, batch_size=settings.DJANGO_DELETION_BATCH_SIZE)
```
Each batch runs in its own `rls_transaction()`, so transactions stay short and a failure only rolls back that batch. The rules:

- **One RLS transaction per batch.** Never wrap the entire backfill in a single transaction. Each batch gets its own `rls_transaction(tenant_id)`.
- **Use `bulk_create` / `bulk_update` with an explicit `batch_size`.** Never call `.save()` in a loop. The default batch size is 500.
- **Use `.iterator()` for reads.** When reading source data, use `queryset.iterator()` to avoid loading the entire result set into memory.
- **Use `.only()` / `.values_list()` for reads.** Fetch only the columns you need, not full model instances.
- **Catch and skip per-item failures.** Don't let one bad row kill the entire backfill. Log the error, count it, continue.
```python
scans_processed = 0
scans_skipped = 0
for scan_id in scan_ids:
    try:
        process_scan(tenant_id, scan_id)
        scans_processed += 1
    except Exception:
        logger.warning("Failed to process scan %s", scan_id, exc_info=True)
        scans_skipped += 1

logger.info("Backfill done: %d processed, %d skipped", scans_processed, scans_skipped)
```
- **Log totals at start and end, not per-batch.** Per-batch logging floods the logs. Log the total count at the start, and the processed/skipped counts at the end.
- **Use `ignore_conflicts=True` for idempotent creates.** Makes the backfill safe to re-run if interrupted.
```python
Model.objects.bulk_create(objects, batch_size=500, ignore_conflicts=True)
```
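The read side of these rules (`.iterator()` plus fixed-size writes) comes down to a small chunking generator. This is a hypothetical sketch, not Prowler's actual API; feed it `queryset.iterator()` and pass each yielded batch to `bulk_create`:

```python
from itertools import islice


def iter_batches(iterable, batch_size=500):
    """Yield lists of at most batch_size items without materializing the
    whole source in memory."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```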
| Migration | Task |
|---|---|
| 0062_backfill_daily_severity_summaries.py | backfill_daily_severity_summaries_task |
| 0080_backfill_attack_paths_graph_data_ready.py | Inline (trivial update) |
| 0082_backfill_finding_group_summaries.py | backfill_finding_group_summaries_task |
Task implementations: tasks/jobs/backfill.py
Batch utilities: api/db_utils.py (batch_delete, create_objects_in_batches, update_objects_in_batches)
```
Auto-generated migration?
├── Yes → Split it following the rules below
└── No → Review it against the rules below

New model?
├── Yes → CreateModel + AddConstraint in one migration
│         AddIndex in separate migration(s), one per table
├── No, just indexes?
│   ├── Regular table → AddIndex in its own migration
│   └── Partitioned table (findings, resource_finding_mappings)?
│       ├── Step 1: RunPython + create_index_on_partitions (atomic=False)
│       ├── Step 2: AddIndex on parent (separate migration)
│       └── Large table? → Consider --fake + manual apply
└── Data backfill?
    ├── Trivial update (< ~10K rows)? → Inline RunPython is OK
    └── Large backfill? → Migration dispatches Celery task(s)
        ├── One task per tenant
        ├── Batch processing (bulk_create/bulk_update, batch_size=500)
        ├── One rls_transaction per batch
        └── Catch + skip per-item failures, log totals
```
| Scenario | Approach |
|---|---|
| Auto-generated migration | Split by concern and table before committing |
| New model + constraints/RLS | Same migration (constraints are structural) |
| Indexes on a regular table | Separate migration, one table per file |
| Indexes on a partitioned table | Two migrations: partitions first (RunPython + atomic=False), then parent (AddIndex) |
| Index on a huge partitioned table | Same two migrations, but fake + apply manually in production |
| Trivial data backfill (< ~10K rows) | Inline RunPython with single .update() call |
| Large data backfill | Migration dispatches Celery task per tenant, task batches with rls_transaction |
If migration passes all checks, say so.
Prerequisite: Install Context7 MCP server for up-to-date documentation lookup.
When implementing or debugging migration patterns, query these libraries via mcp_context7_query-docs:
| Library | Context7 ID | Use for |
|---|---|---|
| Django 5.1 | /websites/djangoproject_en_5_1 | Migration operations, indexes, constraints, SchemaEditor |
| PostgreSQL | /websites/postgresql_org_docs_current | CREATE INDEX CONCURRENTLY, partitioned tables, pg_inherits |
| django-postgres-extra | /SectorLabs/django-postgres-extra | Partitioned models, PostgresPartitionedModel, partition management |
Example queries:
```
mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_1", query="migration operations AddIndex RunPython atomic")
mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_1", query="database indexes Meta class concurrently")
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="CREATE INDEX CONCURRENTLY partitioned table")
mcp_context7_query-docs(libraryId="/SectorLabs/django-postgres-extra", query="partitioned model range partition index")
```
Note: Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID.
```shell
# Generate migrations (ALWAYS review output before committing)
python manage.py makemigrations
python manage.py pgmakemigrations

# Apply migrations
python manage.py migrate

# Fake a migration (mark as applied without running)
python manage.py migrate api <migration_name> --fake

# Manage partitions
python manage.py pgpartition --using admin
```
- api/src/backend/api/db_utils.py (create_index_on_partitions, drop_index_on_partitions)
- api/src/backend/api/partitions.py
- api/src/backend/api/rls.py
- Example partition/parent migration pairs: 0028 + 0029, 0024 + 0025, 0036 + 0037