Comprehensive Stata reference for writing correct .do files, data management, econometrics, causal inference, graphics, Mata programming, and 17+ community packages (reghdfe, estout, did, rdrobust, etc.). Covers syntax, options, gotchas, and idiomatic patterns. Use this skill whenever the user asks you to write, debug, or explain Stata code.
You have access to comprehensive Stata reference files. Do not load all files.
Read only the 1-3 files relevant to the user's current task using the routing table below.
Critical Gotchas
These are Stata-specific pitfalls that lead to silent bugs. Internalize these before writing any code.
Missing Values Sort to +Infinity
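Stata's system missing value . (and the extended missing values .a-.z) compare as greater than every nonmissing number, so a condition such as income > 50000 is true for observations where income is missing.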
* WRONG — includes observations where income is missing!
gen high_income = (income > 50000)
* RIGHT
gen high_income = (income > 50000) if !missing(income)
* WRONG — missing ages appear in this list
list if age > 60
* RIGHT
list if age > 60 & !missing(age)
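Below is a minimal, self-contained sketch of this gotcha. It uses a hypothetical three-row dataset built with `input` and compares three versions of the indicator: the naive one, the `if !missing()` guard shown above, and a variant that codes missing income as 0 instead of missing. Which of the last two is correct depends on the analysis.

```stata
* Hypothetical toy data: the third observation has missing income
clear
input income
40000
60000
.
end

* Naive: missing income compares as +infinity, so it is coded 1
gen naive = (income > 50000)

* Guarded: missing income stays missing
gen safe = (income > 50000) if !missing(income)

* Alternative guard: missing income is coded 0 rather than missing
gen coded0 = (income > 50000 & !missing(income))

list
```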
= vs ==
= is assignment; == is comparison. Mixing them up is a syntax error or silent bug.
* WRONG — syntax error
gen employed = 1 if status = 1
* RIGHT
gen employed = 1 if status == 1
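There is a related pattern worth knowing, sketched here on hypothetical toy data. `gen x = 1 if cond` leaves every non-matching observation missing, not 0. To get a 0/1 indicator, put the `==` comparison inside the expression and use the `if` qualifier only to protect missing inputs.

```stata
* Hypothetical toy data
clear
input status
1
2
.
end

* "1 if" pattern: non-matching rows become missing, not 0
gen employed_a = 1 if status == 1

* Indicator pattern: 1/0, missing only where status itself is missing
gen employed_b = (status == 1) if !missing(status)

list
```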
Local Macro Syntax
Locals use `name' (backtick + single-quote). Globals use $name or ${name}.
Forgetting the closing quote is the #1 macro bug.
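The sketch below shows local and global macro usage. The macro names `controls` and `outdir` are illustrative only. Braces in `${name}` matter when text follows the name directly: `$outdir_table1` would look up a global called `outdir_table1`, and an undefined global silently expands to an empty string.

```stata
* Local macro: defined with "local", referenced as `name'
local controls age education
display "Controls: `controls'"

* Loop over the words of a local; `v' is itself a local
foreach v of local controls {
    display "`v'"
}

* Global macro: use ${name} when text follows the name directly
global outdir "results"
display "${outdir}_table1.tex"
```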
by Requires Sorted Data
* WRONG — error if data not sorted by id
by id: gen first = (_n == 1)
* RIGHT — bysort sorts automatically
bysort id: gen first = (_n == 1)
* Also RIGHT — explicit sort
sort id
by id: gen first = (_n == 1)
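Here is a sketch on a hypothetical four-row panel. A variable in parentheses, as in `bysort id (year):`, sorts within each `id` group without splitting groups by `year`, so `_n == 1` reliably marks the earliest year. With plain `bysort id:`, the order within `id` is not guaranteed, because `sort` is not stable unless the `stable` option is given.

```stata
* Hypothetical toy panel, deliberately out of order
clear
input id year
2 2001
1 2002
1 2001
2 2000
end

* Sort by id, then year within id; groups are defined by id only
bysort id (year): gen first = (_n == 1)

* Data are now sorted by id (and year), so plain "by id:" works
by id: gen nobs = _N

list, sepby(id)
```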
Factor Variable Notation (i. and c.)
Use i. for categorical, c. for continuous. Omitting i. treats categories as continuous.
* WRONG — treats race as continuous (e.g., race=3 has 3x effect of race=1)
regress wage race education
* RIGHT — creates dummies automatically
regress wage i.race education
* Interactions
regress wage i.race##c.education // full interaction
regress wage i.race#c.education // interaction only (no main effects)
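The following sketch is runnable on the `auto` dataset that ships with Stata. `price`, `rep78`, `mpg`, and `foreign` stand in for the wage example. `testparm` tests the category indicators jointly, and the `ib#.` prefix picks a base level other than the default lowest one.

```stata
* Example data shipped with Stata
sysuse auto, clear

* rep78 (repair record, 1-5) entered as indicators; level 1 is the base
regress price i.rep78 mpg

* Joint test that all rep78 indicators are zero
testparm i.rep78

* Use level 3 as the base instead
regress price ib3.rep78 mpg

* Full interaction: separate mpg slope for foreign and domestic cars
regress price i.foreign##c.mpg
```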
generate vs replace
generate creates new variables; replace modifies existing ones. Using generate on an existing variable name is an error.
gen x = 1
gen x = 2 // ERROR: x already defined
replace x = 2 // correct
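A common idiom is to `generate` the first category and then `replace` the rest. The sketch below applies it to the `auto` dataset, and the price cutoffs are arbitrary. Running `capture drop` first lets the do-file be rerun without the "already defined" error.

```stata
sysuse auto, clear

* Drop any earlier copy so the do-file can be rerun
capture drop price_cat

* generate creates the variable; non-matching rows start as missing
generate price_cat = 1 if price < 5000

* replace fills in the remaining categories
replace price_cat = 2 if price >= 5000 & price < 10000
replace price_cat = 3 if price >= 10000 & !missing(price)
```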
String Comparison Is Case-Sensitive
* May miss "Male", "MALE", etc.
keep if gender == "male"
* Safer
keep if lower(gender) == "male"
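This sketch uses hypothetical toy data and also strips leading and trailing blanks, another frequent source of failed matches. `strlower()` and `strtrim()` are the Stata 14+ names (`lower()` and `trim()` still work). For non-ASCII text, `ustrlower()` is the Unicode-aware variant.

```stata
* Hypothetical toy data with inconsistent case and padding
clear
input str8 gender
"male"
"Male"
" MALE "
"female"
end

* Normalize case and surrounding blanks once, then compare
gen gender_clean = strlower(strtrim(gender))
keep if gender_clean == "male"
```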
merge: Always Check _merge
merge 1:1 id using other.dta
tab _merge // always inspect
assert _merge == 3 // or handle mismatches
drop _merge
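Below is a self-contained sketch with two hypothetical toy datasets. One `id` exists only in the master data and another only in the using file. `_merge` is coded 1 (master only), 2 (using only), and 3 (matched). When every observation must match, `merge`'s own `assert(match)` option stops with an error. The `keep()` and `nogenerate` options can replace the manual `keep` and `drop` steps.

```stata
* Using data, saved to a tempfile so the example is self-contained
clear
input id wage
1 10
2 20
4 40
end
tempfile wages
save `wages'

* Master data
clear
input id age
1 30
2 40
3 50
end

merge 1:1 id using `wages'
tab _merge            // id 3: master only (1); id 4: using only (2)
keep if _merge == 3   // handle mismatches explicitly
drop _merge
```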
preserve / restore for Temporary Changes
preserve
collapse (mean) income, by(state)
* ... do something with collapsed data ...
restore // original data is back
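A common use of `preserve`/`restore` is to build a collapsed dataset, save it to a `tempfile`, restore the original, and merge the summaries back. The sketch below does this on the `auto` dataset. For a single group mean, `egen mean_price = mean(price), by(foreign)` gives the same result without `preserve`.

```stata
sysuse auto, clear

* Declare the tempfile outside preserve (locals survive restore anyway)
tempfile means

preserve
collapse (mean) mean_price = price, by(foreign)
save `means'
restore

* All 74 original observations are back; attach the group means
merge m:1 foreign using `means', nogenerate
```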