Research gap finder for longitudinal cohort databases. Profiles cohort strengths, matches PI expertise, scans literature saturation, and outputs ranked topic proposals with gap evidence. Works with any cohort: NHIS, UK Biobank, institutional EMR, health checkup registries, or disease-specific registries.
You are assisting a medical researcher in systematically discovering novel, publishable research topics from a cohort database. Your approach combines cohort variable profiling, PI expertise matching, literature saturation scanning, and multi-pattern gap scoring to produce ranked topic proposals with evidence of novelty.
This skill fills a gap that no existing tool addresses: DB variables -> literature gap -> research question. Existing tools (PICO, FINER, SciSpace, Elicit) work from literature to gaps. This skill works from the data outward.
${CLAUDE_SKILL_DIR}/references/ for templates and rubrics
Collect cohort metadata. Use the template at ${CLAUDE_SKILL_DIR}/references/cohort_profile_template.md.
Required information:
If the user provides a data dictionary file (Excel/CSV), read it to extract variable categories and construct the variable cluster map automatically.
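
The extraction step above could look roughly like the following. This is a minimal sketch, assuming the data dictionary is a CSV with one row per variable and two columns named `variable` and `category`; those column names, the sample rows, and the `uncategorized` fallback are illustrative assumptions, not part of the skill's template.

```python
import csv
import io
from collections import defaultdict


def build_cluster_map(csv_text, name_col="variable", category_col="category"):
    """Group variable names by category from a CSV data dictionary.

    The column names are assumptions; adjust them to the real dictionary.
    """
    clusters = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(name_col) or "").strip()
        # Variables without a category are kept, not dropped silently.
        category = (row.get(category_col) or "").strip() or "uncategorized"
        if name:
            clusters[category].append(name)
    return dict(clusters)


# Hypothetical dictionary excerpt, for illustration only.
sample = """variable,category
sbp,vital signs
dbp,vital signs
alt,liver panel
ast,liver panel
smoking_status,
"""

cluster_map = build_cluster_map(sample)
for category, variables in cluster_map.items():
    print(f"{category}: {', '.join(variables)}")
```

Keeping uncategorized variables visible makes it easier to spot gaps in the dictionary before the cohort profile is presented at the gate.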
Gate: Present the cohort profile summary. Confirm before proceeding.
Profile the intended PI or corresponding author to find topic-expertise alignment.
/search-lit E-utilities: `bash "$EUTILS" search "AuthorLastName AuthorFirstInitial[Author]" 30`
If no PI is specified, skip this phase and use variable clusters alone in Phase 2.
Output: PI profile card (name, affiliation, top keywords, society roles, preferred journals).
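
For readers without the `$EUTILS` wrapper, the author search can be illustrated by building the equivalent NCBI ESearch request by hand. This is a sketch only: the helper name `author_search_url` and the example author are hypothetical, and whether the wrapper issues exactly this request is an assumption. The code only constructs the URL and does not call the network.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def author_search_url(last_name, first_initial, retmax=30):
    """Build a PubMed ESearch URL for an author query such as 'Kim J[Author]'."""
    term = f"{last_name} {first_initial}[Author]"
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical author, for illustration only.
url = author_search_url("Kim", "J")
print(url)
```

Common surnames return many unrelated authors, so in practice the term usually needs an affiliation or topic filter before the results are used for the PI profile card.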
Cross cohort variable clusters with PI expertise to generate candidate topics.
Create a matrix: rows = DB variable clusters, columns = PI keyword clusters. Score each cell 0-3:
Before advancing candidates to saturation scanning, apply a discipline filter:
This filter prevents generating topics where the first author's contribution is not defensible at the variable level.
Gate: Present the intersection matrix and top 20 candidates (post-discipline filter). User selects 8-12 for saturation scanning.
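
One way to turn the scored matrix into a ranked candidate list is sketched below. It assumes the matrix is held as a nested dict of `{db_cluster: {pi_keyword: score}}`; that layout, the function name, and the sample clusters are assumptions for illustration. The discipline filter is not modeled here and would be applied to the returned list.

```python
def top_candidates(matrix, limit=20, min_score=1):
    """Rank (DB cluster, PI keyword) cells by score, highest first.

    matrix: {db_cluster: {pi_keyword: score}} with integer scores 0-3.
    Cells below min_score are dropped; ties are broken alphabetically.
    """
    cells = []
    for db_cluster, row in matrix.items():
        for pi_keyword, score in row.items():
            if not 0 <= score <= 3:
                raise ValueError(f"score out of range for {db_cluster}/{pi_keyword}: {score}")
            if score >= min_score:
                cells.append((score, db_cluster, pi_keyword))
    cells.sort(key=lambda cell: (-cell[0], cell[1], cell[2]))
    return cells[:limit]


# Hypothetical matrix, for illustration only.
matrix = {
    "liver panel": {"NAFLD": 3, "coronary CT": 0},
    "vital signs": {"NAFLD": 1, "coronary CT": 2},
}

candidates = top_candidates(matrix)
for score, db_cluster, pi_keyword in candidates:
    print(f"{score}  {db_cluster} x {pi_keyword}")
```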
For each selected candidate, determine how saturated the literature is.
For each candidate:
(exposure terms) AND (outcome terms) AND (cohort OR longitudinal OR prospective)
/search-lit E-utilities.
| Grade | Count | Longitudinal? | Interpretation |
|---|---|---|---|
| Blue Ocean | 0-2 papers | N/A | First report possible. Verify the topic has audience interest. |
| Green Field | 3-10 papers, all cross-sectional | No longitudinal | Optimal zone: established interest, longitudinal gap wide open. |
| Yellow | 10-30 papers | Some longitudinal | Viable only with a very specific angle (unique population, novel endpoint). |
| Red | 30+ papers or meta-analysis (MA) exists | Yes | Avoid unless doing a network meta-analysis (NMA) or using truly unique data. |
For each candidate graded Green or Yellow, ask: "Has anyone published this with serial/repeated measurements?" If not, upgrade the candidate by one grade automatically.
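
The grading table and the upgrade rule can be expressed as a small function. This is a sketch under stated assumptions: the table's ranges overlap at 10 and 30 papers, so the code treats 30 or more as Red and sends a 3-10 paper topic that already has longitudinal work to Yellow. It also reads "upgrade by one grade" as one step toward less saturated (Yellow to Green Field, Green Field to Blue Ocean). None of these tie-breaks are stated in the table itself.

```python
# Ordered from most saturated to least saturated.
GRADES = ["Red", "Yellow", "Green Field", "Blue Ocean"]


def base_grade(paper_count, has_longitudinal, meta_analysis_exists=False):
    """Map literature counts to a saturation grade (boundary handling is assumed)."""
    if meta_analysis_exists or paper_count >= 30:
        return "Red"
    if paper_count <= 2:
        return "Blue Ocean"
    if paper_count <= 10 and not has_longitudinal:
        return "Green Field"
    return "Yellow"


def final_grade(paper_count, has_longitudinal, serial_published,
                meta_analysis_exists=False):
    """Apply the one-grade upgrade when no serial-measurement study exists."""
    grade = base_grade(paper_count, has_longitudinal, meta_analysis_exists)
    if grade in ("Green Field", "Yellow") and not serial_published:
        grade = GRADES[GRADES.index(grade) + 1]
    return grade


print(final_grade(15, has_longitudinal=True, serial_published=False))
print(final_grade(40, has_longitudinal=True, serial_published=True))
```

Writing the boundaries down explicitly, whichever way they are resolved, keeps grades reproducible across candidates.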
For each candidate, articulate 2-3 potential clinical implications of the findings. If you cannot state why a clinician or policymaker would care about the result, the topic fails regardless of gap score.
Output: Saturation table with grade, paper count, longitudinal gap status, and "So What" statement for each candidate.
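
The boolean search string used for each candidate in this scan, (exposure terms) AND (outcome terms) AND (cohort OR longitudinal OR prospective), can be assembled programmatically so every candidate is searched the same way. The helper names and the choice to quote multi-word terms are assumptions in this sketch; the example terms are hypothetical.

```python
# Study-design terms taken from the query pattern used in the scan.
DESIGN_TERMS = ["cohort", "longitudinal", "prospective"]


def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def saturation_query(exposure_terms, outcome_terms):
    """Build the exposure AND outcome AND design query string."""
    if not exposure_terms or not outcome_terms:
        raise ValueError("both exposure and outcome terms are required")
    groups = [or_group(exposure_terms), or_group(outcome_terms), or_group(DESIGN_TERMS)]
    return " AND ".join(groups)


# Hypothetical candidate, for illustration only.
query = saturation_query(["fatty liver", "NAFLD"], ["cardiovascular disease"])
print(query)
```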
Gate: Present saturation results. User selects 3-5 finalists for deep scoring.
Apply the 6-Pattern framework to each finalist. Score each pattern 0 or 1.
Read the detailed rubric at ${CLAUDE_SKILL_DIR}/references/pattern_scoring_rubric.md.
| # | Pattern | Question | Score 1 if... |
|---|---|---|---|
| P1 | Longitudinal Advantage | Does the cohort's serial/repeated measurement structure create a clear edge over existing cross-sectional studies? | Cohort has 3+ timepoints for key variables AND no prior study used serial data for this topic. |
| P2 | Endpoint Upgrade | Can we escalate to a harder endpoint than existing studies? | Cohort links to mortality/cancer/CVD registries AND existing studies stop at surrogate endpoints. |
| P3 | Cohort Uniqueness | Is the cohort's population, scale, or setting distinctive? | Largest in this population, unique ethnic group, screening-based (no referral bias), or novel linkage. |
| P4 | PI-Topic Alignment | Does the PI's expertise and reputation strengthen this topic? | PI has society role or 5+ papers directly in this domain. Skip if no PI specified. |
| P5 | Comparison Table Gaps | Does the THIS STUDY column show 3+ differences vs existing papers? | Build comparison table (see below). 3+ checkmarks in THIS STUDY that are absent in all prior papers. |
| P6 | Complementary Design | Can this topic pair with another study from the same cohort? | Two studies using the same DB but different populations or complementary variables (e.g., viral vs non-viral). |
For each finalist, build a table comparing the top 3-5 existing papers against THIS STUDY:
| Feature | Author1 (Year) | Author2 (Year) | Author3 (Year) | THIS STUDY |
|---------|----------------|----------------|----------------|------------|
| Design | Cross-sectional | Cohort (5yr) | Cross-sectional | Cohort (20yr) |
| N | 3,200 | 8,500 | 12,000 | ~200,000 |
| Serial data | No | No | No | Yes (avg 5 visits) |
| Hard endpoint | Surrogate | Surrogate | All-cause mortality | CVD + all-cause mortality |
| Population | Referral | General | Screening | Health checkup (no referral bias) |
| Ethnicity | Western | Western | Asian (Japan) | Asian (Korea) |
| Subgroup analysis | No | Age only | No | Age + sex + comorbidity |
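
The P5 check (3+ features present in THIS STUDY and absent from all prior papers) can be sketched as follows. It assumes each table row has first been reduced to a yes/no value per paper, which requires judgment; the boolean encoding of the example table below is illustrative, not a prescribed mapping.

```python
def unique_advantages(feature_table, this_study="THIS STUDY"):
    """Return features that THIS STUDY has and no prior paper has."""
    unique = []
    for feature, columns in feature_table.items():
        prior = [present for paper, present in columns.items() if paper != this_study]
        if columns.get(this_study, False) and not any(prior):
            unique.append(feature)
    return unique


def p5_score(feature_table, threshold=3):
    """Score 1 if at least `threshold` features are unique to THIS STUDY."""
    return 1 if len(unique_advantages(feature_table)) >= threshold else 0


# Illustrative yes/no encoding of the example comparison table.
table = {
    "Cohort design": {"Author1": False, "Author2": True, "Author3": False, "THIS STUDY": True},
    "Serial data": {"Author1": False, "Author2": False, "Author3": False, "THIS STUDY": True},
    "CVD endpoint": {"Author1": False, "Author2": False, "Author3": False, "THIS STUDY": True},
    "Mortality endpoint": {"Author1": False, "Author2": False, "Author3": True, "THIS STUDY": True},
    "N > 100,000": {"Author1": False, "Author2": False, "Author3": False, "THIS STUDY": True},
}

print(unique_advantages(table))
print(p5_score(table))
```
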
| Total Score | Recommendation |
|---|---|
| 5-6 | Top-tier journal target (Lancet sub, JACC, J Hepatol level) |
| 3-4 | Specialty journal target (solid publication) |
| 1-2 | Restructure or kill: find a stronger angle before proceeding |
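
Summing the six pattern scores and mapping the total to a recommendation can be sketched like this. Two points are assumptions because the table does not cover them: a total of 0 is treated like 1-2, and when P4 is skipped (no PI specified) the thresholds are left unchanged rather than rescaled.

```python
PATTERNS = ("P1", "P2", "P3", "P4", "P5", "P6")


def total_score(scores):
    """Sum 0/1 pattern scores; a value of None means the pattern was skipped."""
    total = 0
    for pattern in PATTERNS:
        value = scores.get(pattern)
        if value is None:
            continue  # e.g. P4 when no PI is specified
        if value not in (0, 1):
            raise ValueError(f"{pattern} must be 0 or 1, got {value!r}")
        total += value
    return total


def recommendation(total):
    """Map a total score to the recommendation tiers in the table."""
    if not 0 <= total <= 6:
        raise ValueError("total must be between 0 and 6")
    if total >= 5:
        return "Top-tier journal target"
    if total >= 3:
        return "Specialty journal target"
    return "Restructure or kill"  # assumption: a total of 0 is handled like 1-2


# Hypothetical finalist with P4 skipped.
scores = {"P1": 1, "P2": 1, "P3": 1, "P4": None, "P5": 1, "P6": 0}
print(total_score(scores), recommendation(total_score(scores)))
```
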
Gate: Present scoring results and comparison tables. User approves final ranking.
For each scored finalist, verify practical feasibility.
Sample size adequacy:
Missing data:
Follow-up adequacy:
Operational definition:
IRB/ethics:
Disease Novelty Bonus (informational, not Go/No-Go):
Output: Feasibility report for each finalist with Go/Conditional/No-Go status.
Generate the final deliverables.
| Rank | Topic (PICO) | Saturation | 6-Pattern Score | Feasibility | Target Journal | Timeline |
|------|--------------|------------|-----------------|-------------|----------------|----------|
| 1 | ... | Green (0 longitudinal) | 5/6 | Go | JACC | 6 months |
| 2 | ... | Green (1 longitudinal) | 4/6 | Go | Eur Heart J | 6 months |
| 3 | ... | Blue (0 papers) | 3/6 | Conditional | Radiology | 8 months |
Use the template at ${CLAUDE_SKILL_DIR}/references/onepager_template.md.
Each one-pager includes:
Save one-pagers as markdown files: {output_dir}/gap_proposal_{rank}_{short_topic}.md
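
The filename pattern above can be filled in with a small helper. This is a sketch: the slug rule (lowercase, non-alphanumerics collapsed to underscores, capped at 40 characters) is an assumption, since the skill only specifies the `gap_proposal_{rank}_{short_topic}.md` shape, and the example topic is hypothetical.

```python
import re
from pathlib import Path


def proposal_path(output_dir, rank, topic, max_len=40):
    """Build {output_dir}/gap_proposal_{rank}_{short_topic}.md with a safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    slug = slug[:max_len].rstrip("_")
    return Path(output_dir) / f"gap_proposal_{rank}_{slug}.md"


# Hypothetical topic, for illustration only.
path = proposal_path("proposals", 1, "NAFLD & CVD mortality")
print(path.name)
```

Writing the file is then a single `path.write_text(markdown, encoding="utf-8")` call once the one-pager content has been generated.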
| Phase | Calls to other skills |
|---|---|
| Phase 1 (PI profiling) | /search-lit E-utilities for PubMed author search |
| Phase 3 (Saturation scan) | /search-lit E-utilities for topic searches |
| Phase 4 (Comparison table) | /search-lit for retrieving paper metadata |
| Downstream | Output feeds into /design-study → /write-paper pipeline |
/analyze-stats)
/write-paper)
/design-study)
/search-lit)
/make-figures)
/search-lit with confirmed DOI or PMID. Mark unverified references as [UNVERIFIED - NEEDS MANUAL CHECK].
[VERIFY] and ask the user.