An interactive skill for extracting research paradigms and methodological techniques from cognitive science and neuroscience papers. The output is a well-structured skill conforming to this project's SKILL.md format.
Focus: Strict extraction of reproducible methods — experimental designs, data acquisition parameters, processing pipelines, analysis procedures, and stimulus specifications. This is NOT about summarizing a paper's novelty or theoretical contributions.
Trigger Conditions
Activate this skill when the user:
Provides a paper (PDF path, file, or pasted text) and asks to extract research skills/methods
Uses phrases like "extract skills from this paper", "turn this paper into a skill", "what methods can I reuse from this paper"
Research Planning Protocol
Before extracting skills from a paper, you MUST:
1. Clarify the extraction goal — What type of methodological knowledge is the user looking for?
2. Justify the source — Is this paper a suitable source (empirical, methods, review)? What type-specific extraction strategy applies?
3. Declare expected outputs — What kind of skill(s) do you expect to generate (paradigm design, analysis pipeline, modeling)?
4. Note limitations — Are there missing parameters, ambiguous descriptions, or domain gaps in this paper?
Present the extraction plan to the user and WAIT for confirmation before proceeding.
Related Skills
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Interactive Workflow
Phase 1: Paper Ingestion
Read the paper provided by the user (PDF path, file content, or pasted sections).
PDF Reading Guidance — Claude Code's Read tool natively supports PDF files. Use the following strategy:
Short PDFs (up to ~10 pages): Read the entire file in a single call with no pages parameter.
Long PDFs (more than 10 pages): Read in chunks using the pages parameter (maximum 20 pages per request). Example sequence: pages: "1-10", then pages: "11-20", and so on.
Recommended reading order:
Read pages 1-2 first (abstract + introduction) to identify the paper type and decide whether full extraction is warranted.
Then read the Methods section in detail (locate the relevant page range from the table of contents or section headers).
Read Results and Discussion selectively for reported parameter values not stated in Methods.
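The chunking strategy above can be sketched as a small helper that, given a page count, yields the page-range strings to pass to successive Read calls. This is an illustrative sketch only: the function name is hypothetical, and page-count discovery is assumed to come from the tool or a library such as pypdf; for short PDFs you would skip the pages parameter entirely rather than pass the single range computed here.

```python
def chunk_ranges(n_pages: int, chunk: int = 10) -> list[str]:
    """Yield page-range strings ("1-10", "11-20", ...) for chunked PDF reads.

    Chunk size must respect the Read tool's 20-pages-per-request cap.
    """
    if chunk > 20:
        raise ValueError("Read tool accepts at most 20 pages per request")
    return [
        f"{start}-{min(start + chunk - 1, n_pages)}"
        for start in range(1, n_pages + 1, chunk)
    ]
```

For a 25-page paper this yields "1-10", "11-20", "21-25", matching the example sequence above.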
Identify the paper type — this determines the extraction strategy:
Experimental paper — contains original experiments with participants
Methods paper — introduces or validates an analysis technique/pipeline
Computational modeling paper — builds or tests formal models of cognition
Review/theoretical paper — synthesizes literature or proposes theoretical frameworks
Confirm the paper type with the user before proceeding.
See references/extraction-guide.md for detailed extraction strategies per paper type.
Phase 2: Content Scanning and Candidate Identification
Scan the paper and identify all extractable methodological content organized into these categories:
SUITABLE — extract if the candidate:

| Criterion | Examples |
|---|---|
| Specifies stimulus or material construction | Norming procedures, controlled variables, material selection criteria |
| Describes a computational model with equations/parameters | Model fitting procedure, parameter priors, model comparison strategy |
| Provides actionable methodological recommendations with specific values | "Use minimum 30 trials per condition", "Set high-pass filter no lower than 0.1 Hz" |
NOT SUITABLE — filter out if the candidate:

| Criterion | Examples |
|---|---|
| Is narrative or historical overview | "The study of attention began with William James..." |
| Is a definition without actionable parameters | "Working memory is defined as..." |
| Is theoretical debate without methods | "The modularity hypothesis predicts..." |
| Is motivation or background only | "Previous studies have shown that..." leading to no method |
| Contains only results without methodological detail | "The ANOVA revealed a significant main effect..." |
Decision rule: "Does this candidate contain enough specific, actionable detail that a researcher could REPRODUCE a method, pipeline, or paradigm from it?" If YES → [SUITABLE]. If NO or UNCERTAIN → [FILTERED — reason].
Mark each candidate when presenting to the user. Filtered candidates are shown but de-prioritized — the user can override any filter decision.
Phase 3: User Selection and Confirmation
Receive the user's selection of which items to extract.
For each selected item, perform deep extraction (see extraction depth requirements below).
Present the extracted detail for user review before generating the final skill file.
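Selection replies such as "1, 3, 4-6" or "all" can be mechanized with a small parser. The function name and scope are illustrative, not part of the skill format:

```python
def parse_selection(reply: str, n_candidates: int) -> list[int]:
    """Parse a reply like "1, 3, 4-6" or "all" into sorted candidate numbers."""
    reply = reply.strip().lower()
    if reply == "all":
        return list(range(1, n_candidates + 1))
    chosen: set[int] = set()
    for token in reply.replace(",", " ").split():
        if "-" in token:
            lo, hi = token.split("-")
            chosen.update(range(int(lo), int(hi) + 1))
        else:
            chosen.add(int(token))
    # Silently drop numbers outside the presented candidate range.
    return sorted(i for i in chosen if 1 <= i <= n_candidates)
```

Out-of-range numbers are dropped here; in practice the agent should instead ask the user to re-confirm an invalid selection.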
Phase 4: Skill Generation
Generate the skill file(s) using the standard template (see references/skill-template.md).
Each generated skill must:
Have valid YAML frontmatter with name, description, and papers fields
Include all numerical parameters with their citations from the source paper
Stay under 500 lines; use references/ subdirectory for overflow
Pass the domain-knowledge litmus test: "Would a competent programmer who has never taken a cognitive science course get this wrong?"
Present the generated skill to the user and ask for confirmation before saving.
Phase 5: Self-Verification (Hallucination Check)
After generating the skill but before saving, perform a systematic verification of every numerical parameter and specific factual claim against the source paper.
Verification procedure — for each numerical value or specific claim in the generated skill:
Locate in source — Find the corresponding statement in the original paper. Use the source location recorded during extraction.
Verify value — Confirm exact numerical match, correct units, and complete context (e.g., "0.1-30 Hz bandpass" must not be truncated to "0.1 Hz").
Classify any issues found:

| Issue Type | Description | Severity |
|---|---|---|
| not_found | Claim appears in the skill but cannot be found in the source — likely hallucinated | High |
| value_mismatch | Value exists in source but differs (e.g., skill says "250 ms", source says "200 ms") | High |
| unit_error | Numerical value matches but units are wrong or missing | High |
| context_distortion | Value is technically present but used in misleading context | Medium |
| location_wrong | Value is correct but the claimed source location is wrong | Low |
| incomplete | Skill presents a partial version of a parameter that has important qualifiers | Low |
Reporting — Present the verification results to the user:
Self-Verification Results:
- Claims checked: N
- Verified: M
- Issues found: K
- [HIGH] <claim> — <issue type>: <details>
- [LOW] <claim> — <issue type>: <details>
Rules:
High-severity issues (not_found, value_mismatch, unit_error) must be corrected before saving.
Medium/low-severity issues are flagged but the skill can be saved with them annotated.
Do NOT flag reasonable paraphrasing, organizational differences, or standard terminology substitutions.
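The severity gate above can be expressed as a minimal check. The issue-dict shape and function name are illustrative assumptions, not part of the required format:

```python
# Severity classes from the issue table above.
HIGH = {"not_found", "value_mismatch", "unit_error"}
MEDIUM = {"context_distortion"}
LOW = {"location_wrong", "incomplete"}

def can_save(issues: list[dict]) -> bool:
    """A skill may be saved only when no high-severity issues remain.

    Each issue is assumed to be a dict like {"type": "value_mismatch", ...}.
    Medium/low issues are annotated in the saved skill rather than blocking it.
    """
    return not any(issue["type"] in HIGH for issue in issues)
```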
Extraction Depth Requirements
For every extracted item, the following cross-cutting rules apply to ALL categories:
Cross-Cutting Extraction Rules
Preserve exact numbers — Never round. If the paper says "513 ms", write "513 ms", not "~500 ms".
Track source location — For every extracted numerical value, record where it appears in the paper: "Section X.Y, paragraph N", "Table N", "Figure N caption", or "Supplementary Materials, page N". This enables downstream verification.
Flag missing information — If a standard parameter for this method type is not reported in the paper, explicitly note its absence (e.g., "Filter order: not reported").
Capture rationale — When the authors explain WHY they chose a parameter value, include that justification alongside the value.
Note deviations from convention — When authors explicitly deviate from field conventions, capture both what they did and their stated reason.
These rules apply to every category below. The parameter tables in generated skills must include a Source Location column (see references/skill-template.md).
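One way to carry these rules into the extraction data itself is a record that cannot exist without a source location. The dataclass below is a sketch under that assumption, not a required schema; the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExtractedParameter:
    """One extracted value with the provenance the cross-cutting rules require."""
    name: str
    value: str              # exact as reported, e.g. "513 ms", never rounded
    source_location: str    # e.g. "Section 2.3, paragraph 2" or "Table 1"
    rationale: str = ""     # authors' stated justification, if any

    def to_markdown_row(self) -> str:
        """Render one row of a parameter table with its Source Location column."""
        return f"| {self.name} | {self.value} | {self.source_location} | {self.rationale} |"
```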
Experimental Design Parameters
Paradigm name and classification (e.g., "oddball paradigm", "visual world paradigm")
Number of conditions and their operational definitions
Format Compliance Checklist
Skill name: The name field in YAML frontmatter may only contain lowercase letters, numbers, and hyphens, and must match the folder name — e.g., folder mmn-oddball-paradigm/ → name: "mmn-oddball-paradigm"
YAML frontmatter: Contains at minimum name and description (one-sentence summary) fields
Papers field: Frontmatter includes a papers field listing the source paper(s) in "Author, Year" format
Research Planning Protocol: A customized version of the standard preamble is included after the "When to Use" section and before the first domain-specific logic section (see the research-literacy skill for the template)
Line count: SKILL.md is under 500 lines; overflow content is placed in references/ subdirectory
References directory: If supplementary files exist, they live in references/ and are explicitly referenced from SKILL.md
Encoding: UTF-8, LF line endings, 2-space indentation for YAML
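Several of these checks are mechanical and can be sketched as a validator. The function name is hypothetical, and the frontmatter is assumed to have already been parsed into a dict (e.g., with PyYAML):

```python
import re

# Lowercase letters, numbers, and hyphens only; no leading/trailing hyphen.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_frontmatter(fm: dict, folder: str, n_lines: int) -> list[str]:
    """Return a list of format-compliance problems (empty list = compliant)."""
    problems = []
    for field in ("name", "description", "papers"):
        if field not in fm:
            problems.append(f"missing frontmatter field: {field}")
    name = fm.get("name", "")
    if name and not NAME_RE.match(name):
        problems.append("name must be lowercase letters, numbers, and hyphens")
    if name and name != folder:
        problems.append(f"name {name!r} does not match folder {folder!r}")
    if n_lines >= 500:
        problems.append("SKILL.md must stay under 500 lines")
    return problems
```

The content-quality items below, by contrast, require human (or model) judgment and cannot be validated this way.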
Content Quality Checklist
Completeness — Every numerical parameter mentioned in the paper's methods section is captured
Citation accuracy — All values cite the source paper (Author, Year) and page/table number where possible
Reproducibility — Another researcher could implement this method from the skill alone, without reading the original paper
Domain specificity — Every item passes the litmus test: "Would a competent programmer who has never taken a cognitive science course get this wrong?"
Parameter precision — No rounding or approximation of reported values; use exact figures from the paper
Source traceability — Every numerical parameter includes a source location (Section/Table/Figure reference)
Required Structured Sections in Generated Skills
Every generated skill must include these sections (may be empty if no items apply, but must be explicitly checked):
## Missing Information — List standard parameters for this method type that the paper does not report. Format: "- [Parameter name]: Not reported. Standard value from [field/reference] is [value]." This section helps users know what they must determine independently.
## Deviations from Convention — List any methodological choices that deviate from field conventions, with the authors' stated rationale. Format: "- [Choice]: Authors used [X] instead of conventional [Y] because [reason]." This section alerts users to non-standard decisions.
Handling Ambiguity
When the paper is unclear or omits details:
Missing parameters: Flag explicitly — "The paper does not report [X]. This must be determined empirically or sourced from [suggested reference]."
Ambiguous descriptions: Present both plausible interpretations and ask the user to select one.
Non-standard methods: Note deviations from field conventions and flag whether the deviation is intentional (per authors' justification) or potentially an error.
Supplementary materials: Ask the user if supplementary materials are available, as critical method details are often reported there.
Multi-Skill Extraction
When a paper contains multiple independent methods worth extracting:
Generate separate skills for each method that can stand alone (e.g., a paradigm skill and an analysis skill from the same paper).
Cross-reference between skills using relative paths when methods are interdependent.
Each skill must be independently usable — no skill should require reading another skill to function.
Batch Extraction Mode
When the user provides multiple PDFs or a directory of papers, apply the following workflow:
Triggering Batch Mode
Batch mode activates when the user:
Provides two or more PDF paths in a single message
Points to a directory containing multiple papers
Uses phrases like "extract skills from all these papers" or "process this folder"
Batch Processing Steps
Inventory the inputs — List all papers found (file names + page counts if determinable) and present the list to the user for confirmation before reading anything.
Process each paper sequentially — Run each paper through the full 5-phase workflow (Ingestion → Scanning → Selection → Generation → Self-Verification). Apply the PDF reading strategy from Phase 1 to every paper.
Present candidates grouped by paper — After scanning all papers, show all extractable candidates together, clearly grouped under each paper's title:
## Paper 1: <Title / filename>
- [1] Paradigm: ...
- [2] Analysis: ...
## Paper 2: <Title / filename>
- [3] Paradigm: ...
- [4] Data Acquisition: ...
Which items would you like to extract? (Enter numbers, ranges, "all", or "all from paper 1")
Allow cross-paper skill merging — If two or more papers describe the same or highly overlapping methods (e.g., both use the same EEG preprocessing pipeline with the same parameters), flag the overlap and offer to merge them into a single skill that cites all source papers. Only merge when the core parameters and decision logic are genuinely shared; keep skills separate when parameter choices differ.
Generate skills independently — Each generated skill must be fully self-contained. No skill may depend on another skill generated from a different paper in the same batch. Cross-reference between skills using relative paths only for closely related methods from the same paper (as in Multi-Skill Extraction above).
Batch Quality Checks
Before finalizing batch output, verify:
Every skill cites its specific source paper(s), not just the batch as a whole.
Merged skills list all contributing papers in the papers frontmatter field.
Skill directory names remain unique across the batch; if two papers generate a similar skill, append a disambiguating suffix (e.g., mmn-oddball-paradigm-smith2019 vs. mmn-oddball-paradigm-jones2021).
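The suffix rule can be sketched as follows; the signature and lowercase author-year format are assumptions for illustration:

```python
def disambiguate(name: str, existing: set[str], first_author: str, year: int) -> str:
    """Append an author-year suffix when a skill name collides within a batch."""
    if name not in existing:
        return name
    return f"{name}-{first_author.lower()}{year}"
```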
The generated skill name uses kebab-case and is descriptive of the specific method, not the paper. For example:
A paper on ERP mismatch negativity → skills/mmn-oddball-paradigm/
A paper on fMRI preprocessing → skills/fmri-preprocessing-pipeline/
A paper on drift-diffusion modeling → skills/drift-diffusion-fitting/
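Deriving a kebab-case name from a free-form method label is straightforward; this helper is a sketch (the function name is illustrative) and the final name should still be reviewed for descriptiveness:

```python
import re

def to_kebab(label: str) -> str:
    """Convert a free-form method label to a kebab-case skill name."""
    # Collapse any run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
```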
Example Interaction Summary
User: Extract skills from this paper. [provides PDF path]
Agent: [Reads paper]
This is an experimental paper (EEG study on language processing).
Does that match your understanding?
User: Yes.
Agent: I identified the following extractable methods:
[Lists candidates by category with numbers]
Which items would you like me to extract?
User: 1, 3, 4-6
Agent: Here is the detailed extraction for your selected items:
[Shows extracted parameters for review]
Does this look correct? Any corrections needed?
User: Looks good, but the sampling rate was 512 Hz not 500 Hz.
Agent: Corrected. Here is the generated skill:
[Shows SKILL.md content]
Shall I save this to skills/<name>/SKILL.md?
User: Yes.
Agent: [Saves file]
Skill saved to skills/<name>/SKILL.md.