Search, rank, and help users download medical imaging datasets using a structured local index of 1100+ records from Project-Imaging-X. Use when the user asks for dataset discovery, open-only filtering, platform-specific search, or download guidance across modalities, anatomies, diseases, and tasks.
Use this skill for medical imaging dataset discovery and single-dataset download assistance.
Default to the local index first. Only widen to the web when the local index is clearly sparse.
Resolve scripts/... and references/... paths relative to this skill directory, not the user's working directory.
Files:
- datasets.json: local dataset index
- access_rules.json: mapping from URL patterns to access/download rules
- scripts/search_datasets.py: deterministic filtering and ranking
- scripts/lookup_dataset.py: deterministic single-dataset lookup for follow-up download help
- scripts/update_index.py: refresh the local index
- references/query-normalization.md: map user language to search arguments
- references/ranking-policy.md: strict-vs-near and ranking rules

Extract only the constraints the user actually stated, using the interpreted_query fields: modality, structure, disease, task, dim, label, access, and platform.
If you need mapping help, read references/query-normalization.md.
Treat these as hard constraints unless the user phrased them as a preference, for example with prefer, ideally, or 最好 ("preferably").
Use scripts/search_datasets.py instead of manually filtering in prose.
Example:
```bash
python /path/to/project-imaging-x-discovery/scripts/search_datasets.py \
  --modality mr \
  --structure brain \
  --disease glioma \
  --task seg \
  --dim 3d \
  --label true \
  --access open \
  --prefer-open
```
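A minimal sketch of building that command from only the constraints the user actually stated, so no unstated filter sneaks in. It uses only the flags shown in the example; the install path is hypothetical, the mapping of a soft "prefer open" to --prefer-open (instead of a hard --access open) is an assumption, and run_search assumes the script prints its result as JSON.

```python
import json
import subprocess
import sys
from typing import Optional

# Hypothetical install path; in practice resolve it relative to this skill directory.
SEARCH_SCRIPT = "/path/to/project-imaging-x-discovery/scripts/search_datasets.py"

# Only the flags that appear in the example command are used here.
FLAGS = ["modality", "structure", "disease", "task", "dim", "label", "access"]


def build_search_argv(hard: dict, preferences: Optional[dict] = None) -> list:
    """Turn the hard constraints the user stated into CLI arguments."""
    argv = [sys.executable, SEARCH_SCRIPT]
    for key in FLAGS:
        if key not in hard:
            continue  # unstated constraint: never invent a filter
        value = hard[key]
        if isinstance(value, bool):
            value = "true" if value else "false"  # the example passes --label true
        argv += [f"--{key}", str(value)]
    # Assumption: a soft "prefer open" becomes --prefer-open, not a hard --access open.
    if (preferences or {}).get("access") == "open":
        argv.append("--prefer-open")
    return argv


def run_search(argv: list) -> dict:
    # Assumption: the script prints interpreted_query, strict_matches, etc. as JSON on stdout.
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Keeping preferences in a separate argument mirrors the hard-versus-preference rule: a "最好公开" request ranks open data higher without discarding everything else.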
The script returns:
- interpreted_query
- strict_match_count
- strict_matches
- near_match_count
- near_matches

Strict matches satisfy all hard constraints. Near matches are for fallback only.
Never mix them in one main table.
When scripts/search_datasets.py already returns ranked strict matches, preserve that order in the main strict table instead of replacing the first rows with hand-picked examples.
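A minimal sketch of rendering the two result sets as separate markdown tables while keeping the script's ranking order. The column names and the why_near field are hypothetical; match them to the fields actually present in the search output.

```python
from typing import Any, Dict, List

# Hypothetical column names; adjust to the real record fields.
COLUMNS = ["name", "modality", "task", "access"]


def markdown_table(rows: List[Dict[str, Any]], columns: List[str]) -> str:
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(row.get(col, "")) for col in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])


def render_results(result: Dict[str, Any]) -> Dict[str, str]:
    # Two separate tables: strict rows are never merged with near rows,
    # and both keep the order the search script returned (no re-sorting here).
    return {
        "strict_matches": markdown_table(result["strict_matches"], COLUMNS),
        # "why_near" is a hypothetical field holding the reason an item is near, not strict.
        "near_matches": markdown_table(result["near_matches"], COLUMNS + ["why_near"]),
    }
```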
If the user asked for open, public, direct download, or 公开可下载 ("publicly downloadable"):
- access=open datasets stay in the main strict result set
- access=registration and access=application datasets move to near matches unless the user explicitly accepted them

When the user did not state an access requirement, rank by the policy in references/ranking-policy.md.
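A minimal sketch of the open-only rule as a post-processing step over strict and near matches. Two choices here are assumptions, not documented behavior: access values other than registration and application (for example unknown) are left where the script put them, and demoted items are appended after the script's own near matches.

```python
from typing import Dict, Iterable, List, Tuple

GATED = ("registration", "application")


def apply_open_request(
    strict: List[Dict], near: List[Dict], accepted: Iterable[str] = ()
) -> Tuple[List[Dict], List[Dict]]:
    """Keep open data in strict; demote registration/application unless the user accepted them."""
    accepted = set(accepted)
    kept, demoted = [], []
    for item in strict:
        access = item.get("access")
        if access in GATED and access not in accepted:
            demoted.append({**item, "why_near": f"access={access}; user asked for open data"})
        else:
            # open, explicitly accepted gated types, and any other value stay in place (assumption)
            kept.append(item)
    # Assumption: demoted items follow the script's own near matches, in original order.
    return kept, list(near) + demoted
```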
Do one targeted web search only if:
- strict_match_count == 0

Skip web search when the local index already answers the query well.
Mark any added items as web supplement. Do not pretend they came from the local index.
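A minimal sketch of the web-search gate and the labeling rule. It models only the strict_match_count == 0 trigger plus the one-search limit; the source field name is a hypothetical choice.

```python
from typing import Dict, List


def needs_web_search(result: Dict, searches_done: int = 0) -> bool:
    # No strict matches in the local index, and at most one targeted search per query.
    return result["strict_match_count"] == 0 and searches_done < 1


def tag_web_supplement(items: List[Dict]) -> List[Dict]:
    # Label web results so they are never presented as local-index records.
    # "source" is a hypothetical field name.
    return [{**item, "source": "web supplement"} for item in items]
```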
If the user asked for a rare modality and the local index has no real coverage for it, say that explicitly and avoid padding the answer with irrelevant near matches from other modalities.
Do not bulk download.
If the user has already selected a specific dataset, skip broad discovery and switch straight to single-dataset lookup. In that follow-up mode:
- run scripts/lookup_dataset.py before any manual prose filtering or broad search
- strict_matches must contain only the selected dataset when the lookup is unambiguous
- near_matches stays empty unless the lookup itself is ambiguous
- use the record's own download_method and auth_instructions; do not replace them with a generic recipe from another platform

For access=open:
- run scripts/lookup_dataset.py to pull the dataset's exact metadata first

For access=registration or application:
- auth_instructions
- run scripts/lookup_dataset.py before describing access steps
- do not suggest wget, curl, or gdown unless the user explicitly asks for alternatives after access is granted

For access=unknown:
- mark access details found on the web as web-confirmed rather than pretending they came from the local index alone
- otherwise leave access marked unresolved

If scripts/lookup_dataset.py returns multiple matches:
- strict_matches
- near_matches
- interpreted_query

If scripts/lookup_dataset.py returns resolution_mode=unresolved for a short or generic query:
- keep strict_matches empty

Use this shape by default:
```text
interpreted_query:
- modality
- structure
- disease
- task
- dim
- label
- access
- platform
strict_matches:
- markdown table
near_matches:
- markdown table
- each item includes why it is near rather than strict
download_next_step:
- what can be downloaded now
- what requires registration or application
- keep it concrete and short; do not add extra option menus unless the user asked for alternatives
```
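A minimal sketch of filling download_next_step by sorting matches on their access value. It relies on the download_method and auth_instructions fields named in this skill, assumes they are plain strings, and uses a hypothetical name field; the unresolved bucket covers access=unknown and missing values.

```python
from typing import Dict, List


def download_next_step(matches: List[Dict]) -> Dict[str, List[str]]:
    """Group matches into the download_next_step buckets by access value."""
    steps: Dict[str, List[str]] = {"download_now": [], "needs_access": [], "unresolved": []}
    for m in matches:
        access = m.get("access", "unknown")
        if access == "open":
            # Quote the record's own download_method, not a generic recipe.
            steps["download_now"].append(f"{m['name']}: {m['download_method']}")
        elif access in ("registration", "application"):
            # Point to auth_instructions; no wget/curl/gdown until access is granted.
            steps["needs_access"].append(f"{m['name']} ({access}): {m['auth_instructions']}")
        else:
            steps["unresolved"].append(f"{m['name']}: access unresolved")
    return steps
```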
Refresh the local index only when needed:
```bash
python /path/to/project-imaging-x-discovery/scripts/update_index.py --github
```