Bulk re-trigger of CONTRACT_EXECUTED public webhooks stopped mid-run (SoftTimeLimitExceeded / worker time limit), leaving a subset of contract_ids never notified. Covers django-rest-api management command re_trigger_contract_executed_webhook, ID-list on-call mitigation script, batching vs Celery soft limits, dry-run preview. Use when: retrigger executed webhook failed halfway, 467 of 529 done, pending contract ids list, Managerie timeout, contract executed webhook support bulk. Trigger phrases: SoftTimeLimitExceeded retrigger webhook, re_trigger_contract_executed_webhook timeout, remaining contract IDs webhook, CONTRACT_EXECUTED bulk retrigger.
Symptom class: A bulk operation to resend CONTRACT_EXECUTED events to public API webhooks runs for a large set of contracts but stops before finishing. Ops or support report partial success (e.g. “{completed} of {total} succeeded”) and a list of pending contract_ids.
Typical failure mode: The job runs inside a Celery task or worker-bound process with a soft time limit. A long loop (many contracts, per-contract work, plus time.sleep between batches) runs past that limit, SoftTimeLimitExceeded is raised, and the process aborts. Contracts processed before the limit are fine; the tail never runs.
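A rough way to see why a run dies partway is to estimate wall-clock time against the soft limit. The sketch below is illustrative arithmetic only: the function names, the per-contract cost, and the assumption that the loop sleeps once between consecutive batches (and not after the last one) are hypothetical and not taken from the management command.

```python
import math


def estimate_runtime_seconds(total_contracts, batch_size, delay_between_batches, seconds_per_contract):
    """Rough wall-clock estimate for a batched loop that sleeps between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = math.ceil(total_contracts / batch_size)
    # Assumption: one sleep between consecutive batches, none after the last.
    sleep_time = max(batches - 1, 0) * delay_between_batches
    work_time = total_contracts * seconds_per_contract
    return sleep_time + work_time


def fits_in_soft_limit(total_contracts, batch_size, delay_between_batches,
                       seconds_per_contract, soft_time_limit):
    """True if the estimated runtime stays within the soft time limit."""
    estimate = estimate_runtime_seconds(
        total_contracts, batch_size, delay_between_batches, seconds_per_contract
    )
    return estimate <= soft_time_limit


# Illustrative numbers: 100 contracts, batches of 10, 5 s delay, 1 s per contract.
# 9 sleeps * 5 s + 100 * 1 s = 145 s
print(estimate_runtime_seconds(100, 10, 5, 1.0))
```

If the estimate exceeds the worker's soft limit, the tail of the ID list will not be reached in a single invocation, which matches the prefix-completed, suffix-pending symptom.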
Not the same as: integration disabled (Tray/native), wrong workspace, or webhook URL misconfiguration. Those usually fail for every contract in a slice or for none; they do not leave a clean completed prefix and a pending suffix after one run.
Use when:
- SoftTimeLimitExceeded (or similar worker kill) during the run
- You have a pending contract_id list and need safe catch-up
- re_trigger_contract_executed_webhook

Do not use for:
- download_link 404 right after create → contract-lookup

| Piece | Location | Role |
|---|---|---|
| Management command (date-range bulk) | public/management/commands/re_trigger_contract_executed_webhook.py | Selects contracts by {workspace_id} + executed_before_date (and optional executed-after-date), batches, time.sleep between batches, calls execute_send_contract_executed_event_use_case per contract |
| Webhook send use case | public/webhooks/domain/use_cases/contract_executed_webhook_use_case.py | execute_send_contract_executed_event_use_case — subscribed handler for DomainEventType.CONTRACT_EXECUTED |
| ID-list mitigation (on-call) | scripts/oncall_mitigations/re_trigger_contract_executed_webhook_for_pending_ids.py | Filters ContractV3 by id__in + optional contract_roles__workspace_id={workspace_id}; same payload shape as command; supports dry-run that simulates batch flow without calling the use case |
Payload parity with management command: Both build DomainEventPayload.from_current_context(ContractExecutedEventData(contract=ContractResponseV2.from_orm(contract)), user_id=contract.created_by_id, workspace_id=contract.creator_party.id) then call execute_send_contract_executed_event_use_case.
- python manage.py re_trigger_contract_executed_webhook {workspace_id} {executed_before_date} [batch_size] [webhook_trigger_delay] ...
- scripts/oncall_mitigations/re_trigger_contract_executed_webhook_for_pending_ids.py: intended for only pending IDs after a failed bulk run.

In manage.py shell (or SQL), for each {contract_id}:
- Exists in ContractV3
- contract_roles include workspace_id = {workspace_id} if you enforce tenant guard
- Lifecycle state is appropriate for a CONTRACT_EXECUTED resend (product judgment: do not resend for the wrong lifecycle)

Run mitigation execute(dry_run=True) first. Expected behavior:
- WOULD_TRIGGER / WOULD_FAIL reported per id
- No call to execute_send_contract_executed_event_use_case when dry run

Then execute(dry_run=False) for real sends.
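The dry-run-then-real flow above can be sketched as follows. This is a minimal stand-in, not the on-call script: execute_sketch, the send and can_send callables, and the TRIGGERED / FAILED labels are hypothetical; only the WOULD_TRIGGER / WOULD_FAIL labels and the rule that a dry run never calls the use case come from this document. In the real script, send would correspond to building the payload and calling execute_send_contract_executed_event_use_case.

```python
import time
from typing import Callable, Dict, Iterable, List


def chunked(ids: List[int], size: int) -> List[List[int]]:
    """Split ids into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def execute_sketch(
    contract_ids: Iterable[int],
    send: Callable[[int], None],        # hypothetical: performs the real webhook send
    can_send: Callable[[int], bool],    # hypothetical: pre-check used only in dry run
    dry_run: bool = True,
    batch_size: int = 10,
    delay_seconds: float = 0.0,
) -> Dict[int, str]:
    """Walk the ID list in batches; in dry run, report outcomes without sending."""
    results: Dict[int, str] = {}
    for batch_index, batch in enumerate(chunked(list(contract_ids), batch_size)):
        if batch_index > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pacing between batches
        for contract_id in batch:
            if dry_run:
                # Never call send() here; only report what would happen.
                results[contract_id] = "WOULD_TRIGGER" if can_send(contract_id) else "WOULD_FAIL"
                continue
            try:
                send(contract_id)
                results[contract_id] = "TRIGGERED"
            except Exception:
                # Record the failure and keep going so one bad ID does not block the rest.
                results[contract_id] = "FAILED"
    return results


sent: List[int] = []
preview = execute_sketch([1, 2, 3], send=sent.append, can_send=lambda cid: cid != 2, batch_size=2)
print(preview)  # per-id WOULD_TRIGGER / WOULD_FAIL; `sent` is still empty
```

Injecting send as a parameter keeps the preview and the real run on the same code path, so the dry run exercises the same batching the real run will use.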
Run context: Use python manage.py shell so Django is configured; do not run the script as a bare python file without django.setup().
- Adjust batch_size or increase delay only if rate limits matter; the primary fix is smaller total work per worker invocation (chunk the ID list across multiple runs) or a raised task soft time limit for a dedicated maintenance task (a platform change, not ad hoc).
- Extend re_trigger_contract_executed_webhook with optional --contract-ids / file input so support does not rely on a one-off script (product engineering follow-up).
- Do not re-run the full {total} if {completed} already succeeded: this risks duplicate webhook noise downstream.
- Re-run only {pending_contract_ids} using the on-call script pattern or an equivalent small batch.
- dry_run=True first; then dry_run=False.
- Keep the workspace_id guard aligned with WORKSPACE_ID in script or pass workspace_id= into execute().
- If receivers dedupe on contract_id + event id, confirm; if not, coordinate before mass retrigger.
- Long term: extend re_trigger_contract_executed_webhook and/or run bulk retrigger as chunked Celery tasks with per-task time limits.

Slack: Incident thread may live at https://spotdraft.slack.com/archives/C0AQ293P40L (channel id C0AQ293P40L). This skill is generalized from the bulk CONTRACT_EXECUTED retrigger + SoftTimeLimitExceeded + pending ID list pattern; replace placeholders with values from the thread, logs, and Metabase.
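The advice to re-run only the pending IDs, split across several small runs, could be sketched like this. The helper names and the chunk size of 25 are hypothetical; the 529 / 467 figures mirror the "467 of 529 done" example and the IDs themselves are placeholders.

```python
from typing import Iterable, List


def pending_ids(all_ids: Iterable[int], completed_ids: Iterable[int]) -> List[int]:
    """Return IDs not yet completed, preserving order and dropping duplicates."""
    done = set(completed_ids)
    seen = set()
    pending: List[int] = []
    for contract_id in all_ids:
        if contract_id in done or contract_id in seen:
            continue
        seen.add(contract_id)
        pending.append(contract_id)
    return pending


def split_into_runs(ids: List[int], max_per_run: int) -> List[List[int]]:
    """Split the pending list so each worker invocation handles a bounded amount of work."""
    if max_per_run < 1:
        raise ValueError("max_per_run must be at least 1")
    return [ids[i:i + max_per_run] for i in range(0, len(ids), max_per_run)]


# Placeholder IDs: 529 contracts in total, the first 467 already notified.
all_contract_ids = list(range(1, 530))
completed = all_contract_ids[:467]

remaining = pending_ids(all_contract_ids, completed)
runs = split_into_runs(remaining, max_per_run=25)
print(len(remaining), [len(run) for run in runs])
```

Each inner list would then be passed to one dry run followed by one real run, so no single invocation approaches the worker's soft time limit and already-notified contracts are never resent.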