Query the ClinicalTrials.gov REST API v2 for clinical study data. Covers searching trials by condition, intervention, sponsor, location, and status; retrieving full study details by NCT ID; filtering by phase, study type, and date ranges. Trigger for any mention of clinical trials, ClinicalTrials.gov, NCT numbers, trial status, recruiting studies, clinical study data, trial phases, trial sponsors, interventional studies, observational studies, trial enrollment, eligibility criteria, trial results, or drug pipeline research. Also trigger when the user needs to find active trials for a condition, identify a sponsor's trial portfolio, look up a specific NCT study, compare trial designs, assess competitive pipelines, or support regulatory review with trial data. This skill serves scientists, reviewers, program staff, and anyone who works with clinical research data at FDA or in the broader biomedical community.
The ClinicalTrials.gov API v2 (https://clinicaltrials.gov/api/v2/) provides free, no-auth access to data on 575,000+ clinical studies registered worldwide. No API key, no registration, no auth headers. Just HTTP GET with JSON responses.
Base URL: https://clinicaltrials.gov/api/v2/
API Docs: https://clinicaltrials.gov/data-api/api
Web UI: https://clinicaltrials.gov
What this data is: Registration and results data for clinical studies conducted around the world. Includes study design, eligibility criteria, interventions, outcomes, sponsor information, recruitment status, enrollment, locations, and (for completed studies) results summaries. Data is submitted by study sponsors and investigators as required by FDAAA 801 and the Final Rule (42 CFR Part 11).
What this data is NOT: This is not the raw study data or patient-level results. It is the structured registration record. Full study results (publications, datasets) are linked but not hosted by the API. Not all studies have results posted; results reporting is required for applicable clinical trials of FDA-regulated products.
No API key required. No officially published rate limit, but the site asks that automated queries not exceed the frequency of a human user. In practice, add 0.5s delays between requests in batch operations. Excessive automated traffic may be throttled or blocked.
import urllib.request, urllib.parse, json

BASE = "https://clinicaltrials.gov/api/v2"

def ctgov_query(endpoint, params=None):
    """Query ClinicalTrials.gov API v2. Returns parsed JSON."""
    url = f"{BASE}/{endpoint}"
    if params:
        # Remove None values
        params = {k: v for k, v in params.items() if v is not None}
        # Keep ',' and '|' literal: the API uses them as list delimiters
        url += "?" + urllib.parse.urlencode(params, safe=',|')
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=20) as resp:
        return json.loads(resp.read().decode())
Endpoint: GET /studies
Search across all registered studies using query parameters and filters.
Query parameters (what to search for):
| Parameter | Description | Example |
|---|---|---|
| `query.cond` | Condition or disease | `query.cond=lung+cancer` |
| `query.intr` | Intervention or treatment | `query.intr=pembrolizumab` |
| `query.term` | General search term (any field) | `query.term=remdesivir+COVID` |
| `query.spons` | Sponsor or collaborator | `query.spons=Pfizer` |
| `query.locn` | Location (country, state, city, or facility) | `query.locn=Maryland` |
| `query.id` | NCT number(s), comma-separated | `query.id=NCT04280705,NCT05684276` |
Multiple query params can be combined (AND logic between different param types).
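As a rough illustration of how combined query parameters end up in one request URL, the sketch below builds a search URL without sending it. `build_search_url` is a hypothetical helper name introduced only for this example; the parameter names and base URL are the ones listed above.

```python
import urllib.parse

BASE = "https://clinicaltrials.gov/api/v2"

def build_search_url(**query):
    """Build a /studies URL from query.* parameters without sending it.

    Keyword names map onto the API's query.* parameters (cond -> query.cond).
    build_search_url is a hypothetical helper, not part of the API.
    """
    # Skip empty values so unused parameters do not appear in the URL.
    params = {f"query.{name}": value for name, value in query.items() if value}
    return f"{BASE}/studies?" + urllib.parse.urlencode(params)

# Condition AND intervention; the unused sponsor parameter is dropped.
url = build_search_url(cond="lung cancer", intr="pembrolizumab", spons=None)
print(url)
```

Each distinct `query.*` parameter narrows the result set further, so the URL above asks for studies that match both the condition and the intervention.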
Filter parameters:
| Parameter | Description | Values |
|---|---|---|
| `filter.overallStatus` | Recruitment status | RECRUITING, NOT_YET_RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, ENROLLING_BY_INVITATION, SUSPENDED, TERMINATED, WITHDRAWN, AVAILABLE, NO_LONGER_AVAILABLE, TEMPORARILY_NOT_AVAILABLE, APPROVED_FOR_MARKETING, WITHHELD, UNKNOWN |
| `filter.ids` | Specific NCT IDs | Pipe-delimited: `NCT04280705\|NCT05684276` |
| `filter.geo` | Geographic proximity | `distance(lat,lng,radius)`, e.g., `distance(38.9,-77.0,50mi)` |
| `filter.advanced` | Advanced field-level filter using AREA[] syntax | See Advanced Filtering below |
Multiple status values can be pipe-delimited: filter.overallStatus=RECRUITING|NOT_YET_RECRUITING
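Because the API rejects status values that are not exact enum names, it can help to validate and join them before sending a request. The following is a minimal sketch under that assumption; `join_statuses` and `VALID_STATUSES` are hypothetical names, and the set simply mirrors the values in the table above.

```python
# Enum values copied from the filter.overallStatus row above.
VALID_STATUSES = {
    "RECRUITING", "NOT_YET_RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED",
    "ENROLLING_BY_INVITATION", "SUSPENDED", "TERMINATED", "WITHDRAWN",
    "AVAILABLE", "NO_LONGER_AVAILABLE", "TEMPORARILY_NOT_AVAILABLE",
    "APPROVED_FOR_MARKETING", "WITHHELD", "UNKNOWN",
}

def join_statuses(statuses):
    """Normalize status names and join them with '|' for filter.overallStatus.

    Hypothetical helper: accepts loose spellings such as "Not yet recruiting"
    and raises ValueError for anything that is not a known enum value.
    """
    normalized = [s.strip().upper().replace(" ", "_") for s in statuses]
    unknown = [s for s in normalized if s not in VALID_STATUSES]
    if unknown:
        raise ValueError(f"Unknown status value(s): {', '.join(unknown)}")
    return "|".join(normalized)

print(join_statuses(["recruiting", "Not yet recruiting"]))
```

Failing locally with a `ValueError` is cheaper than a round trip that ends in an "Invalid value in parameter overallStatus" response.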
Pagination and output:
| Parameter | Description | Default |
|---|---|---|
| `pageSize` | Results per page | 10 (max 1000) |
| `pageToken` | Cursor for next page (from previous response) | None |
| `countTotal` | Include total count in response | false (set to true to get totalCount) |
| `format` | Response format | json (also supports csv) |
| `sort` | Sort order | API default ordering; accepts e.g. `LastUpdatePostDate:desc`, `StudyFirstPostDate:asc`, `@relevance` |
| `fields` | Specific fields to return (comma-separated) | All fields (see Field Selection) |
def search_studies(condition=None, intervention=None, sponsor=None,
                   term=None, location=None, status=None,
                   advanced_filter=None, geo=None,
                   page_size=10, count_total=True, sort=None, fields=None,
                   page_token=None):
    """Search ClinicalTrials.gov studies."""
    params = {
        "format": "json",
        "pageSize": page_size,
    }
    if count_total:
        params["countTotal"] = "true"
    if condition:
        params["query.cond"] = condition
    if intervention:
        params["query.intr"] = intervention
    if sponsor:
        params["query.spons"] = sponsor
    if term:
        params["query.term"] = term
    if location:
        params["query.locn"] = location
    if status:
        params["filter.overallStatus"] = status
    if advanced_filter:
        params["filter.advanced"] = advanced_filter
    if geo:
        params["filter.geo"] = geo
    if sort:
        params["sort"] = sort
    if fields:
        params["fields"] = fields
    if page_token:
        # Cursor taken from the previous response's nextPageToken
        params["pageToken"] = page_token
    return ctgov_query("studies", params)
Endpoint: GET /studies/{nctId}
Retrieve complete data for a specific study by NCT number.
def get_study(nct_id, fields=None):
    """Get full study details by NCT ID."""
    params = {"format": "json"}
    if fields:
        params["fields"] = fields
    return ctgov_query(f"studies/{nct_id}", params)
Endpoint: GET /version
Returns the API version and data timestamp (when the data was last updated).
def get_version():
    return ctgov_query("version")
# Returns: {"apiVersion": "2.0.5", "dataTimestamp": "2026-03-12T10:00:04"}
Endpoint: GET /stats/size
Returns total study count, average size, and size distribution.
def get_stats():
    return ctgov_query("stats/size")
# Returns: {"totalStudies": 575781, ...}
The filter.advanced parameter uses AREA[FieldName]Value syntax to filter on any study field. URL-encode the brackets.
# Filter by study type
filter_advanced = "AREA[StudyType]INTERVENTIONAL"
# Filter by phase
filter_advanced = "AREA[Phase]PHASE3"
# Filter by date range
filter_advanced = "AREA[StartDate]RANGE[2024-01-01,2024-12-31]"
# Combine multiple advanced filters with AND
filter_advanced = "AREA[StudyType]INTERVENTIONAL AND AREA[Phase]PHASE3"
# Filter by sponsor type
filter_advanced = "AREA[LeadSponsorClass]INDUSTRY"
Common AREA field names:
| Field | Values | Description |
|---|---|---|
| `StudyType` | INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS | Type of study |
| `Phase` | EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4, NA | Trial phase |
| `LeadSponsorClass` | NIH, FED (other federal), INDUSTRY, NETWORK, OTHER | Sponsor category |
| `StartDate` | RANGE[YYYY-MM-DD,YYYY-MM-DD] | Study start date range |
| `CompletionDate` | RANGE[YYYY-MM-DD,YYYY-MM-DD] | Study completion date range |
| `ResultsFirstPostDate` | RANGE[YYYY-MM-DD,YYYY-MM-DD] | Results posting date range |
| `Sex` | MALE, FEMALE, ALL | Eligible sex |
| `AgeRange` | CHILD, ADULT, OLDER_ADULT | Eligible age group |
| `StudyFirstPostDate` | RANGE[YYYY-MM-DD,YYYY-MM-DD] | Registration date range |
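Hand-assembling AREA expressions is easy to get wrong once several clauses are combined. The sketch below composes them from small string helpers; `area`, `date_range`, and `all_of` are hypothetical names, and the only syntax assumed is the AREA[Field]Value, RANGE[start,end], and AND forms shown above.

```python
def area(field, value):
    """Return a single AREA[Field]Value clause."""
    return f"AREA[{field}]{value}"

def date_range(field, start, end):
    """Return an AREA clause limiting a date field to start..end (YYYY-MM-DD)."""
    return area(field, f"RANGE[{start},{end}]")

def all_of(*clauses):
    """Join clauses with AND so that every clause must match."""
    return " AND ".join(clauses)

# Interventional Phase 3 studies that started during 2024.
expr = all_of(
    area("StudyType", "INTERVENTIONAL"),
    area("Phase", "PHASE3"),
    date_range("StartDate", "2024-01-01", "2024-12-31"),
)
print(expr)
```

The resulting string is what would be passed as the `filter.advanced` value; URL encoding of the brackets is left to the request layer.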
{
  "totalCount": 2279,
  "nextPageToken": "ZVNj7o2Elu8o3lpoUti54...",
  "studies": [
    {
      "protocolSection": {
        "identificationModule": { "nctId": "...", "briefTitle": "...", "officialTitle": "..." },
        "statusModule": { "overallStatus": "RECRUITING", "startDateStruct": {...}, ... },
        "sponsorCollaboratorsModule": { "leadSponsor": { "name": "...", "class": "INDUSTRY" }, ... },
        "descriptionModule": { "briefSummary": "...", "detailedDescription": "..." },
        "conditionsModule": { "conditions": ["..."], "keywords": ["..."] },
        "designModule": { "studyType": "INTERVENTIONAL", "phases": ["PHASE3"], "enrollmentInfo": {...}, ... },
        "armsInterventionsModule": { "armGroups": [...], "interventions": [...] },
        "outcomesModule": { "primaryOutcomes": [...], "secondaryOutcomes": [...] },
        "eligibilityModule": { "eligibilityCriteria": "...", "sex": "ALL", "minimumAge": "18 Years", ... },
        "contactsLocationsModule": { "locations": [...] },
        "referencesModule": { "references": [...] }
      },
      "hasResults": false
    }
  ]
}
Note: totalCount only appears when countTotal=true is set. Without it, you only get studies and nextPageToken.
| Module | Key Fields |
|---|---|
| `identificationModule` | nctId, briefTitle, officialTitle, organization |
| `statusModule` | overallStatus, startDateStruct, completionDateStruct, lastUpdateSubmitDate |
| `sponsorCollaboratorsModule` | leadSponsor (name, class), collaborators |
| `descriptionModule` | briefSummary, detailedDescription |
| `conditionsModule` | conditions (array), keywords |
| `designModule` | studyType, phases (array), enrollmentInfo (count, type), designInfo |
| `armsInterventionsModule` | armGroups (label, type, description), interventions (name, type, description) |
| `outcomesModule` | primaryOutcomes, secondaryOutcomes (measure, description, timeFrame) |
| `eligibilityModule` | eligibilityCriteria (full text), sex, minimumAge, maximumAge, healthyVolunteers |
| `contactsLocationsModule` | locations (facility, city, state, country, status) |
| `referencesModule` | references (pmid, citation, type) |
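Since modules can be absent from a record (for example when only some fields were requested), reading a deeply nested value with chained indexing can raise `KeyError`. One possible approach is a small dotted-path lookup; `dig` is a hypothetical helper, and the sample record below is made up for illustration rather than real API output.

```python
def dig(record, path, default=None):
    """Follow a dotted path through nested dicts; return default if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Made-up record shaped like the response structure above (not real data).
sample = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example"},
        "designModule": {"phases": ["PHASE3"]},
    },
    "hasResults": False,
}

print(dig(sample, "protocolSection.identificationModule.nctId"))
print(dig(sample, "protocolSection.designModule.enrollmentInfo.count", default=0))
```

The second lookup falls back to the default because the sample record has no `enrollmentInfo` entry.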
Use the fields parameter to request only specific fields, reducing payload size:
# Minimal fields for a summary listing
fields = "NCTId,BriefTitle,OverallStatus,LeadSponsorName,Phase"
# Fields for sponsor portfolio analysis
fields = "NCTId,BriefTitle,OverallStatus,Phase,Condition,InterventionName,EnrollmentCount,StartDate"
Note: Field names in the fields parameter use CamelCase (e.g., NCTId, BriefTitle), NOT the JSON path names (e.g., not protocolSection.identificationModule.nctId). The response still uses the nested JSON structure.
The API uses cursor-based pagination via nextPageToken:
import time
def paginate_studies(max_results=100, **search_kwargs):
    """Paginate through search results."""
    all_studies = []
    page_token = None
    while len(all_studies) < max_results:
        params = {**search_kwargs}
        if page_token:
            params["page_token"] = page_token
        # Only the first request needs the total count
        r = search_studies(**params, count_total=(page_token is None))
        studies = r.get("studies", [])
        if not studies:
            break
        all_studies.extend(studies)
        page_token = r.get("nextPageToken")
        if not page_token:
            break
        time.sleep(0.5)  # Stay polite between page requests
    return all_studies[:max_results]
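When results should be processed as they arrive rather than collected into one list, the same cursor logic can be written as a generator. This is a sketch, not the API's own client: `iter_studies` and `fetch_page` are hypothetical names, and the page fetcher is passed in so the loop can be exercised here with canned responses instead of live requests.

```python
def iter_studies(fetch_page, max_results=None):
    """Yield studies one at a time, following nextPageToken until it runs out.

    fetch_page(page_token) is a caller-supplied function returning a parsed
    response dict; it receives None for the first page.
    """
    yielded = 0
    page_token = None
    while max_results is None or yielded < max_results:
        response = fetch_page(page_token)
        studies = response.get("studies", [])
        if not studies:
            return
        for study in studies:
            yield study
            yielded += 1
            if max_results is not None and yielded >= max_results:
                return
        page_token = response.get("nextPageToken")
        if not page_token:
            return

# Canned two-page response keyed by page token (made-up data).
PAGES = {
    None: {"studies": [{"id": 1}, {"id": 2}], "nextPageToken": "abc"},
    "abc": {"studies": [{"id": 3}]},
}

def fake_fetch(page_token):
    return PAGES[page_token]

print(list(iter_studies(fake_fetch)))
print(list(iter_studies(fake_fetch, max_results=2)))
```

In real use, the fetch function would wrap a request that passes the token as `pageToken` and would also be the natural place for the delay between requests.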
def find_recruiting_trials(condition, page_size=10):
    """Find currently recruiting trials for a condition."""
    return search_studies(
        condition=condition,
        status="RECRUITING",
        page_size=page_size,
        sort="LastUpdatePostDate:desc")

def sponsor_pipeline(sponsor_name, page_size=20):
    """Get a sponsor's trial portfolio by phase and status."""
    return search_studies(
        sponsor=sponsor_name,
        status="RECRUITING|NOT_YET_RECRUITING|ACTIVE_NOT_RECRUITING",
        page_size=page_size,
        sort="LastUpdatePostDate:desc")

def trials_near_location(condition, lat, lng, radius_miles=50):
    """Find recruiting trials near a geographic point."""
    return search_studies(
        condition=condition,
        status="RECRUITING",
        geo=f"distance({lat},{lng},{radius_miles}mi)",
        page_size=20)
def study_detail_summary(nct_id):
    """Get a structured summary of a study."""
    study = get_study(nct_id)
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    design = proto.get("designModule", {})
    sponsor = proto.get("sponsorCollaboratorsModule", {}).get("leadSponsor", {})
    arms = proto.get("armsInterventionsModule", {})
    outcomes = proto.get("outcomesModule", {})
    elig = proto.get("eligibilityModule", {})
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "official_title": ident.get("officialTitle"),
        "status": status.get("overallStatus"),
        "phases": design.get("phases", []),
        "study_type": design.get("studyType"),
        "enrollment": design.get("enrollmentInfo", {}),
        "sponsor": sponsor.get("name"),
        "sponsor_class": sponsor.get("class"),
        "conditions": proto.get("conditionsModule", {}).get("conditions", []),
        "interventions": [{"name": i.get("name"), "type": i.get("type")}
                          for i in arms.get("interventions", [])],
        "arms": [{"label": a.get("label"), "type": a.get("type")}
                 for a in arms.get("armGroups", [])],
        "primary_outcomes": [o.get("measure") for o in outcomes.get("primaryOutcomes", [])],
        "eligibility": {
            "min_age": elig.get("minimumAge"),
            "max_age": elig.get("maximumAge"),
            "sex": elig.get("sex"),
            "healthy_volunteers": elig.get("healthyVolunteers")
        },
        "has_results": study.get("hasResults", False)
    }
import time
def compare_conditions(conditions, status="RECRUITING"):
    """Compare recruiting trial counts across conditions."""
    results = {}
    for cond in conditions:
        # page_size=1 keeps the payload small; only totalCount is needed
        r = search_studies(condition=cond, status=status, page_size=1, count_total=True)
        results[cond] = r.get("totalCount", 0)
        time.sleep(0.5)
    return results

def sponsor_class_breakdown(condition):
    """Compare industry vs NIH vs other sponsorship for a condition."""
    results = {}
    for cls in ["INDUSTRY", "NIH", "OTHER"]:
        r = search_studies(
            condition=condition,
            advanced_filter=f"AREA[LeadSponsorClass]{cls}",
            page_size=1, count_total=True)
        results[cls] = r.get("totalCount", 0)
        time.sleep(0.5)
    return results
The totalCount field only appears in the response when countTotal=true is included in the request. Without it, you get studies and a pagination token but no count.
Study type, phase, sponsor class, dates, and other field-level filters must go through filter.advanced using AREA[FieldName]Value syntax. Only filter.overallStatus, filter.geo, and filter.ids are dedicated top-level filter params.
There is no skip/offset parameter. You must use nextPageToken from each response to get the next page. Tokens are opaque strings that encode the cursor position.
In the fields parameter, use CamelCase names (e.g., NCTId, BriefTitle). In the JSON response, data is nested under protocolSection.identificationModule.nctId. This is a common source of confusion.
There is no authentication, but the API is a shared resource. Add time.sleep(0.5) between requests in batch operations. Excessive automated queries may result in IP-based throttling.
When constructing URLs manually, encode [ as %5B and ] as %5D in filter.advanced values. Python's urllib.parse.urlencode handles this automatically.
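To see what the encoded form looks like, the short check below runs an AREA expression through the standard library's `urllib.parse` functions. It is only an illustration of the escaping; nothing is sent to the API.

```python
import urllib.parse

expr = "AREA[StartDate]RANGE[2024-01-01,2024-12-31]"

# quote() with safe="" percent-encodes the brackets and the comma.
encoded = urllib.parse.quote(expr, safe="")
print(encoded)

# urlencode() escapes brackets the same way for each value in a dict.
query = urllib.parse.urlencode({"filter.advanced": "AREA[Phase]PHASE3"})
print(query)

# Decoding restores the original expression unchanged.
decoded = urllib.parse.unquote(encoded)
print(decoded == expr)
```

Passing `safe=',|'` to `urlencode`, as the request helper earlier in this document does, leaves commas and pipes literal while still escaping the brackets.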
Only a fraction of studies have posted results; hasResults: true indicates that results are available. Results reporting is required for applicable clinical trials of FDA-regulated products, but compliance is not universal.
The API rejects invalid status values with a clear error message. Use the exact enum values listed in the filter parameters table above.
When using format=csv, the output uses human-readable column headers ("NCT Number", "Study Title", "Study Status") not CamelCase field names. The fields parameter does NOT work with CSV format and will return a 400 error. Use CSV without a fields param to get the default column set.
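Because CSV output is keyed by human-readable headers, downstream code has to look columns up by those labels. A minimal sketch of parsing a CSV response body with the standard `csv` module follows; `parse_csv_export` is a hypothetical helper, and the sample text is made up, using only the three column headers named above.

```python
import csv
import io

def parse_csv_export(text):
    """Parse a format=csv response body into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))

# Made-up sample using the human-readable headers (not real API output).
sample_csv = (
    "NCT Number,Study Title,Study Status\n"
    'NCT00000000,"A Made-Up Study, Part 1",RECRUITING\n'
)

rows = parse_csv_export(sample_csv)
print(rows[0]["NCT Number"], rows[0]["Study Status"])
```

Using `csv.DictReader` rather than splitting on commas matters because study titles often contain commas and arrive quoted.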
The sort parameter only accepts a few field names. LastUpdatePostDate and StudyFirstPostDate are confirmed to work (with :asc or :desc). @relevance works for keyword searches. Other field names (e.g., NumericId, EnrollmentCount) return "Unknown sort field" errors. When in doubt, omit sort and accept the default ordering.
RECRUITING - Actively enrolling participants
NOT_YET_RECRUITING - Approved but not yet enrolling
ACTIVE_NOT_RECRUITING - Ongoing but no longer enrolling
COMPLETED - Study finished
ENROLLING_BY_INVITATION - Enrolling by invitation only
SUSPENDED - Temporarily halted
TERMINATED - Stopped early
WITHDRAWN - Withdrawn before enrollment
AVAILABLE - Expanded access available
UNKNOWN - Status not verified in 2+ years
EARLY_PHASE1 - Early Phase 1 (formerly Phase 0)
PHASE1 - Phase 1
PHASE2 - Phase 2
PHASE3 - Phase 3
PHASE4 - Phase 4 (post-marketing)
NA - Not Applicable (non-drug studies)
INTERVENTIONAL - Tests an intervention (drug, device, procedure)
OBSERVATIONAL - Observes outcomes without assigning interventions
EXPANDED_ACCESS - Treatment use outside of clinical trials
NIH - National Institutes of Health
FED - Other federal agency (FDA, CDC, DoD, VA, etc.)
INDUSTRY - Pharmaceutical/biotech/device company
NETWORK - Cooperative group or network
OTHER - Academic, hospital, individual, other
| Error | Cause | Fix |
|---|---|---|
| "is unknown parameter" | Invalid parameter name | Check spelling; use filter.advanced with AREA[] for field-level filters, not filter.studyType or filter.phase |
| "Invalid value in parameter overallStatus" | Bad enum value | Use exact values from enum list (e.g., RECRUITING not Recruiting) |
| No totalCount in response | Forgot countTotal=true | Always include countTotal=true when you need the count |
| Empty studies array | No matches | Broaden search terms or check status filter |
| Timeout on large queries | Response too large | Reduce pageSize, add fields param to limit payload |
| 429 / throttled | Too many requests | Add delays between requests; no official rate limit published |
| "contains invalid CSV column name" | Used fields param with format=csv | Remove fields param; CSV uses its own default column set |
| "Unknown sort field" | Invalid sort field name | Use LastUpdatePostDate, StudyFirstPostDate, or @relevance only |
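For the throttling row in particular, one common way to recover is to retry with increasing delays. The sketch below assumes throttling surfaces as an HTTP 429 raised by `urllib` as `HTTPError`; `with_retries` is a hypothetical wrapper, and the delay values are arbitrary starting points, not limits published by the API.

```python
import time
import urllib.error

def with_retries(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(); on HTTP 429, wait and retry with exponential backoff.

    Other HTTP errors, and a 429 on the final attempt, are re-raised.
    The sleep function is injectable so the wrapper can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Delays grow as base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * (2 ** attempt))
```

Any zero-argument callable can be wrapped, for example a `lambda` around one of the request helpers defined earlier in this document.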