Scaffold a new quantitative social science research project with documentation, materials management, and AI agent configuration. Invoke when the user wants to start a new research project, create a project template, or set up a project structure.
You are scaffolding a new research project using the Research Project Toolkit. The argument is: $ARGUMENTS
Parse the arguments:

- `--interactive` flag is present → interactive mode
- `--dataset` flag is present → scaffold a shared dataset repo instead (use the dataset template below)
- The project name is the first non-flag argument. If no name is given, ask for one.
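The skill applies these rules to `$ARGUMENTS` directly in the prompt, so no script is required. Purely as an illustration, the same rules look like this in Bash. The function name `parse_args` and the warning for unknown flags are assumptions, not part of the toolkit.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: mirrors the argument rules above.
# parse_args is a hypothetical helper, not part of the toolkit.
# Assumes a whitespace-separated argument string and no spaces in the name.
parse_args() {
  local -a args
  read -r -a args <<< "$1"   # split on whitespace without glob expansion

  interactive=false
  dataset=false
  project_name=""

  local arg
  for arg in "${args[@]}"; do
    case "$arg" in
      --interactive) interactive=true ;;
      --dataset)     dataset=true ;;
      --*)           echo "warning: ignoring unknown flag: $arg" >&2 ;;
      *)
        # The project name is the first non-flag argument.
        if [[ -z "$project_name" ]]; then
          project_name="$arg"
        fi
        ;;
    esac
  done
}
```

For example, `parse_args "--dataset panel-2024"` sets `dataset=true` and `project_name=panel-2024`; an empty `project_name` is the cue to ask the user for a name.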
1. **Generate the proposed template.** Using the project name, generate the full directory tree and key files as described in the Template Specification below.
2. **Preview before creating.** Show the user the proposed directory tree and key files.
3. **Wait for confirmation.** If the user says yes or presses enter, create the structure. If they provide feedback, adjust and re-preview.
4. **Create everything.** Write all files, then report what was created.
5. **Show next steps.** Tell the user what to do first (for example, edit `CLAUDE.md` and set up the data vault).
In interactive mode, ask these questions one at a time. Use sensible defaults and explain what you recommend:
**Question 1: Machine-readable companions**

"When you add raw files (PDFs, Qualtrics `.qsf`, emails), I'll create Markdown companions that I can read. How should I mark them as derived?
→ Double extension (recommended): `survey.qsf.md` sits next to `survey.qsf`. Ugly but unambiguous.
→ Header only: Normal names, but every derived file starts with a `<!-- DERIVED -->` header.
→ Parallel directories: `materials/raw/` and `materials/readable/` mirror each other."
**Question 2: Shared datasets**

"Does this project use a dataset shared with other projects (like a panel survey)?
→ Yes — I'll set up references to a shared dataset repo and configure data paths.
→ No — All data is project-specific."

If yes, ask for the dataset repo URL/path and version.
**Question 3: Data sensitivity**

"Does your data include personally identifiable information (PII)?
→ Yes (most survey data) — I'll set up a data vault outside the repo and configure path templates.
→ No (public/secondary data only) — Data can live directly in the repo."
**Question 4: Target journal**

"What journal are you targeting? (This helps me set up the manuscript formatting. Say 'TBD' if unsure.)"
Then preview the template incorporating their answers, and proceed as in auto mode.
## Template Specification

Create the following structure. Every directory with content should have meaningful files, not just `.gitkeep` placeholders.
**CLAUDE.md** (Claude Code reads this file at the start of every session):

```markdown
# CLAUDE.md

> **Project:** [PROJECT_NAME]
> **Status:** design

## Quick Orientation

This is a quantitative social science research project. Here's how to navigate it:

- **Understand the project** → Start at `docs/index.md`, then `.claude/north-star.md`
- **Find survey instruments** → `materials/instruments/` (raw files + `.md` companions)
- **Read/write analysis code** → `manuscript/code/` (numbered scripts, run in order)
- **Render the paper** → `cd manuscript && quarto render paper.qmd`
- **Find design rationale** → `docs/` wiki, organized by topic

## Conventions

### Machine-readable companions

Every non-text file in `materials/` gets a Markdown companion with a double extension:
`survey.qsf.md` is derived from `survey.qsf`. Always include the header:

`<!-- DERIVED: source=FILENAME, generated=DATE, generator=claude -->`

### Data paths

Sensitive data lives OUTSIDE this repo in a data vault. Paths are configured in
`config/data_paths.R` (gitignored). See `config/data_paths.R.example` for the template.

### Analysis scripts

Numbered: `01-clean.R`, `02-analyze.R`, etc. Run in order. Located in `manuscript/code/`.

### Submissions

Each submission is tagged: `git tag -a "submission/JOURNAL-rN" -m "description"`.
Review materials go in `materials/submissions/round-N/`.

## Shared Datasets

[None configured. Update when adding shared datasets.]

## Current Priority

[Update this as the project evolves.]
```
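The submission convention in the CLAUDE.md template can be wrapped in a small helper so tag names and round directories stay consistent. This is a hypothetical sketch: the function `tag_submission` and its argument order are assumptions. It uses only standard `git tag -a` and `mkdir -p`, and assumes it runs from the project root of a repository that already has at least one commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the submission convention:
#   tag:       submission/JOURNAL-rN
#   materials: materials/submissions/round-N/
# Usage: tag_submission JOURNAL ROUND "description"
tag_submission() {
  local journal="$1" round="$2" message="$3"

  if [[ -z "$journal" ]]; then
    echo "error: journal name is required" >&2
    return 1
  fi
  # Accept only a non-negative integer round number.
  if ! [[ "$round" =~ ^[0-9]+$ ]]; then
    echo "error: round must be a non-negative integer, got '$round'" >&2
    return 1
  fi

  local tag="submission/${journal}-r${round}"

  # Refuse to overwrite an existing tag.
  if git rev-parse -q --verify "refs/tags/${tag}" >/dev/null; then
    echo "error: tag ${tag} already exists" >&2
    return 1
  fi

  git tag -a "$tag" -m "$message" || return 1
  mkdir -p "materials/submissions/round-${round}"
  echo "created ${tag} and materials/submissions/round-${round}/"
}
```

Refusing to overwrite an existing tag keeps each submission snapshot immutable; a corrected resubmission gets a new round number instead.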
**README.md**:

````markdown
# [PROJECT_NAME]

[Brief project description — fill this in.]

## Quick Start

This project uses the [Research Project Toolkit](https://github.com/simonfriis/trellis) structure.

- Project wiki: `docs/index.md`
- Analysis & manuscript: `manuscript/`
- Raw materials: `materials/`

## Reproducing the Analysis

```bash
cd manuscript
Rscript -e 'renv::restore()'
quarto render paper.qmd
```
````
**.gitignore**: Use the comprehensive research project gitignore (see `templates/gitignore` in the skill directory).
### docs/ Directory
Create `docs/CLAUDE.md`:
```markdown
# Project Wiki

This directory is the project's documentation wiki. It's organized by topic.

## Navigation

- `index.md` — Master table of contents, start here
- `overview.md` — Project summary
- `design/` — Research questions, hypotheses, pre-analysis plan
- `instruments/` — Documentation *about* instruments (rationale, decisions)
- `data/` — Data sources, pipeline, variable construction
- `decisions/` — Methodological decision records (dated)
- `submissions/` — Submission strategy, revision notes
- `meetings/` — Meeting notes (dated)
- `DOCUMENTATION_GUIDE.md` — How the three doc layers work together

## Writing Conventions

- Plain Markdown, no framework-specific syntax
- Relative links between documents
- Date-prefix chronological files (meetings, decisions)
- One topic per file; split files when they grow long
```
Create all documentation files listed in the template specification, under `docs/`: `index.md`, `overview.md`, `design/research-questions.md`, `design/hypotheses.md`, `design/pre-analysis-plan.md`, `data/README.md`, `data/shared-datasets.md`, `decisions/README.md`, `submissions/README.md`, `meetings/README.md`, `instruments/README.md`, and `DOCUMENTATION_GUIDE.md`.
### materials/ Directory

Create `materials/CLAUDE.md`:

```markdown
# Materials Directory

Raw artifacts and their machine-readable companions.

## What goes here

Place raw files (.qsf, .pdf, .docx, .eml, .png) in the appropriate subdirectory.
Then run the `generate-companions` skill to create Markdown companions.

## Companion convention

- Double extension: `filename.ext.md` is derived from `filename.ext`
- Every companion starts with: `<!-- DERIVED: source=..., generated=..., generator=... -->`
- Companions describe CONTENT, not interpretation. Rationale goes in `docs/`.

## Subdirectories

- `instruments/` — Survey files, scales, protocols
- `ethics/` — IRB applications, approvals, consent forms
- `recruitment/` — Recruitment emails, platform configs
- `submissions/` — Editor letters, reviews, responses (by round)
- `correspondence/` — Important emails, MOUs
- `dissemination/` — Slides, posters, blog posts
- `pilot/` — Pilot testing materials
```
Create `materials/README.md` with full documentation of the companion convention. Create each subdirectory with a `.gitkeep` file.
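Companions are normally written by the `generate-companions` skill. To make the naming and header convention concrete, here is a hypothetical Bash sketch that creates a companion stub containing only the required header. The function name `make_companion_stub` is an assumption, and a real companion would go on to describe the file's content.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: create FILE.md next to FILE with the DERIVED header
# from the companion convention. Only the header is written; describing
# the source file's content is left to the companion generator.
make_companion_stub() {
  local src="$1"
  local companion="${src}.md"

  if [[ ! -f "$src" ]]; then
    echo "error: no such file: $src" >&2
    return 1
  fi
  if [[ -e "$companion" ]]; then
    echo "skipping: $companion already exists" >&2
    return 0
  fi

  # source= records the bare filename, matching the template header.
  printf '<!-- DERIVED: source=%s, generated=%s, generator=%s -->\n' \
    "$(basename "$src")" "$(date -u +%Y-%m-%d)" "claude" > "$companion"
  echo "$companion"
}
```

Skipping existing companions means the stub never clobbers a companion that has already been filled in.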
### manuscript/ Directory

Create `manuscript/CLAUDE.md`:

```markdown
# Manuscript — Reproducible Analysis Package

This directory is self-contained. It should render from a clean clone.

## Key files

- `paper.qmd` — Quarto manuscript (main document)
- `_quarto.yml` — Quarto configuration
- `references.bib` — Bibliography
- `renv.lock` — R environment lockfile
- `R/` — Helper functions (sourced by analysis scripts)
- `code/` — Analysis scripts (numbered, run in order)
- `data/raw/` — De-identified raw data
- `data/derived/` — Analysis-ready datasets (generated by code/)
- `data/codebooks/` — Variable-level codebooks
- `data/DATA_SOURCES.md` — Data provenance documentation
- `output/figures/` — Generated figures
- `output/tables/` — Generated tables

## Rules

1. This directory must not depend on files outside itself at runtime.
2. All data in `data/raw/` must be de-identified.
3. Scripts in `code/` are numbered and run in order.
4. Generated outputs go in `output/` — don't hand-edit them.
5. When adding a package, run `renv::snapshot()` to update renv.lock.
```
Also create `paper.qmd`, `_quarto.yml`, `references.bib`, `README.md`, `data/DATA_SOURCES.md`, and the subdirectories listed above.
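The rule that scripts in `code/` are numbered and run in order can be enforced with a tiny runner. A sketch, assuming scripts are named `NN-name.R` and run with `Rscript`; the function `run_scripts` and its optional runner argument (useful for dry runs) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical runner for manuscript/code/: execute NN-*.R scripts in
# order, stopping at the first failure.
# Usage: run_scripts DIR [RUNNER]   (RUNNER defaults to Rscript)
run_scripts() {
  local dir="${1:-code}" runner="${2:-Rscript}"
  local script found=0

  # Bash sorts glob results, so two-digit prefixes run 01, 02, ... 99.
  for script in "$dir"/[0-9][0-9]-*.R; do
    [[ -e "$script" ]] || continue   # no matches: the glob stays literal
    found=1
    "$runner" "$script" || { echo "failed: $script" >&2; return 1; }
  done

  if (( found == 0 )); then
    echo "no numbered scripts found in $dir" >&2
    return 1
  fi
}
```

From `manuscript/`, `run_scripts code` runs the whole pipeline, while `run_scripts code echo` only prints the order in which scripts would run.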
Create `.claude/north-star.md`, `.claude/prompts/README.md`, and `.claude/session-logs/README.md`.

Create `config/data_paths.R.example`.

Create `templates/decision-record.md`, `templates/meeting-notes.md`, and `templates/revision-strategy.md`.

Create `scripts/release.sh` (with PII safety checks).
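One plausible PII check for `scripts/release.sh`, sketched here under stated assumptions, is to refuse a release when the de-identified data still contains email addresses or Qualtrics identifier columns. The function name `check_pii`, the file types scanned, and the column list are all assumptions and are not exhaustive; a passing check does not prove the data is de-identified.

```bash
#!/usr/bin/env bash
# Hypothetical PII gate for scripts/release.sh. Heuristic only: a clean
# result does NOT prove the data is de-identified.
# Scans CSV/TSV files under DIR (default: manuscript/data) for
#   1. email addresses, and
#   2. Qualtrics export columns that usually identify respondents
#      (assumed, non-exhaustive list).
# Prints offending files and returns 1 if anything suspicious is found.
check_pii() {
  local dir="${1:-manuscript/data}"
  local status=0
  local email_re='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
  local cols_re='IPAddress|RecipientEmail|RecipientFirstName|RecipientLastName|LocationLatitude|LocationLongitude'

  if [[ ! -d "$dir" ]]; then
    echo "PII check: directory not found: $dir" >&2
    return 1
  fi
  if grep -rlE --include='*.csv' --include='*.tsv' "$email_re" "$dir"; then
    echo "PII check: possible email addresses in the files above" >&2
    status=1
  fi
  if grep -rlE --include='*.csv' --include='*.tsv' "$cols_re" "$dir"; then
    echo "PII check: identifying Qualtrics columns in the files above" >&2
    status=1
  fi
  return "$status"
}
```

A release script would typically call `check_pii || exit 1` before packaging anything, so a missing data directory also blocks the release rather than passing silently.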
## Dataset Template

If `--dataset` is specified, scaffold a simpler structure for a shared dataset:

```
dataset-name/
├── CLAUDE.md
├── README.md
├── .gitignore
├── docs/
│   ├── codebook.md
│   ├── sampling.md
│   └── changelog.md
├── code/
│   ├── 01-deidentify.R
│   ├── 02-clean.R
│   └── 03-validate.R
├── data/
│   ├── raw/          (DVC-tracked)
│   ├── cleaned/      (DVC-tracked)
│   └── codebooks/
├── config/
│   └── vault_paths.R.example
└── renv.lock
```
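The tree above maps onto a handful of `mkdir -p` calls and empty-file writes. This hypothetical sketch (the function name `scaffold_dataset_skeleton` is an assumption) creates only the empty skeleton; the skill still fills each file with real content. `renv.lock` is left out because `renv::snapshot()` generates it, and DVC tracking of `data/raw/` and `data/cleaned/` is a separate step not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: create the empty shared-dataset skeleton.
# Files are created empty; their real content comes from the skill.
# renv.lock is omitted (renv::snapshot() writes it), and DVC setup for
# data/raw and data/cleaned is intentionally not done here.
scaffold_dataset_skeleton() {
  local root="$1"
  if [[ -z "$root" ]]; then
    echo "usage: scaffold_dataset_skeleton DATASET_NAME" >&2
    return 1
  fi

  mkdir -p "$root"/{docs,code,config} \
           "$root"/data/{raw,cleaned,codebooks} || return 1

  local f
  for f in CLAUDE.md README.md .gitignore \
           docs/codebook.md docs/sampling.md docs/changelog.md \
           code/01-deidentify.R code/02-clean.R code/03-validate.R \
           config/vault_paths.R.example; do
    [[ -e "$root/$f" ]] || : > "$root/$f"   # never overwrite existing files
  done
}
```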
After creating all files, report what was created, and run `git init` if this is a new directory.
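The `git init` step can be made safe to repeat by first asking git whether the directory is already inside a work tree. A hypothetical sketch (the function name `init_repo_if_needed` is an assumption); note the design choice that a project scaffolded inside an existing repository is left alone rather than nested as a second repository.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run `git init` only when DIR is not already inside
# a git work tree. A directory inside an existing repository is left
# alone instead of getting a nested repository.
init_repo_if_needed() {
  local dir="${1:-.}"
  if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "already in a git work tree: $dir"
  else
    git -C "$dir" init -q || return 1
    echo "initialized git repository in $dir"
  fi
}
```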