End-to-end cross-national comparison study using KNHANES + NHANES + CHNS (or other parallel surveys). Variable harmonization, parallel weighted analysis, and comparison tables. Supports 2-country (KR+US) and 3-country (KR+US+CN) designs.
You are assisting a medical researcher in conducting a cross-national comparison study using parallel nationally representative surveys (e.g., KNHANES for Korea, NHANES for the US, CHNS for China).
- harmonization_knhanes_nhanes.csv: medsci-skills/skills/replicate-study/references/harmonization_knhanes_nhanes.csv
- medsci-skills/skills/write-paper/references/paper_types/cross_national.md: writing template
- medsci-skills/skills/analyze-stats/references/analysis_guides/survey_weighted.md

KNHANES (single CSV):
NHANES (multiple CSVs):
For EACH country independently:
Generate a side-by-side comparison:
| Analysis | Korea wOR (95% CI) | US wOR (95% CI) | Direction Agreement |
|---|---|---|---|
| Overall (fully adjusted) | ... | ... | ✓/✗ |
| Male | ... | ... | |
| Female | ... | ... | |
| ... | ... | ... | |
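The Direction Agreement column can be filled mechanically once each country's estimate is available. Below is a minimal Python sketch, assuming each estimate arrives as an (OR, lower, upper) tuple taken from the per-country R output. The helper names are hypothetical, and an OR of exactly 1.0 is treated as having no direction, which is an assumption you should align with your protocol. Direction agreement only tells you whether both estimates fall on the same side of 1. It does not test whether the two estimates differ.

```python
from typing import Tuple

Estimate = Tuple[float, float, float]  # (weighted OR, 95% CI lower, 95% CI upper)


def fmt_or(or_: float, lo: float, hi: float, digits: int = 2) -> str:
    """Format a weighted OR and its 95% CI as 'OR (lo–hi)'."""
    return f"{or_:.{digits}f} ({lo:.{digits}f}–{hi:.{digits}f})"


def direction(or_: float) -> int:
    """+1 if OR > 1, -1 if OR < 1, 0 if OR == 1 (no direction)."""
    return (or_ > 1) - (or_ < 1)


def direction_agreement(or_kr: float, or_us: float) -> str:
    """'✓' if both ORs lie on the same side of 1, otherwise '✗'."""
    d_kr, d_us = direction(or_kr), direction(or_us)
    return "✓" if d_kr == d_us != 0 else "✗"


def comparison_row(label: str, kr: Estimate, us: Estimate) -> str:
    """One markdown row of the Korea-vs-US comparison table."""
    agree = direction_agreement(kr[0], us[0])
    return f"| {label} | {fmt_or(*kr)} | {fmt_or(*us)} | {agree} |"
```

The same helpers extend to a 3-country design by comparing each pair, or by requiring all directions to match.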
{working_dir}/
├── cross_national_report.md — Study summary + comparison tables
├── variable_mapping.csv — Variable mapping with match status
├── analysis_korea.R — KNHANES analysis (self-contained)
├── analysis_us.R — NHANES analysis (self-contained)
├── results/
│ ├── table1_korea.csv
│ ├── table1_us.csv
│ ├── main_results_comparison.csv
│ └── subgroup_comparison.csv
└── manuscript_draft/ — Optional: Methods + Results draft
    ├── methods_draft.md
    └── results_draft.md
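If the workflow is scripted, the expected output layout can be created or checked before any analysis runs. Below is a minimal Python sketch covering the 2-country layout shown in the tree. The names `planned_layout` and `scaffold` are hypothetical. The sketch creates directories only, and the analysis steps remain responsible for writing the files.

```python
from pathlib import Path
from typing import List

# Relative paths mirroring the 2-country output tree (layout only, no contents).
CORE_FILES = [
    "cross_national_report.md",
    "variable_mapping.csv",
    "analysis_korea.R",
    "analysis_us.R",
    "results/table1_korea.csv",
    "results/table1_us.csv",
    "results/main_results_comparison.csv",
    "results/subgroup_comparison.csv",
]
MANUSCRIPT_FILES = [
    "manuscript_draft/methods_draft.md",
    "manuscript_draft/results_draft.md",
]


def planned_layout(with_manuscript: bool = False) -> List[str]:
    """Return expected output paths relative to {working_dir}."""
    return CORE_FILES + (MANUSCRIPT_FILES if with_manuscript else [])


def scaffold(working_dir: str, with_manuscript: bool = False) -> List[Path]:
    """Create the parent directories of every planned file and return the targets."""
    root = Path(working_dir)
    targets = [root / rel for rel in planned_layout(with_manuscript)]
    for path in targets:
        path.parent.mkdir(parents=True, exist_ok=True)
    return targets
```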
| Variable | Raw Var | Coding |
|---|---|---|
| Smoking | BS3_1 | 1,2=Current; 3=Former; 8=Never |
| Alcohol | BD1_11 | 2-6=Current drinker (drank in past year); 1=Past-year abstainer (former drinker); 8=Never |
| Obesity | HE_obe | 1-3=Non-obese (BMI<25, includes underweight); 4-6=Obese (BMI≥25) |
| Depression | BP_PHQ_1~9 | Sum ≥10 = depression |
| Diabetes | HE_glu, HE_HbA1c, DE1_dg | FPG ≥126 mg/dL or HbA1c ≥6.5% or DE1_dg=1 |
| CVD | DI4_dg, DI5_dg, DI6_dg | Any = 1 → CVD yes |
| Education | edu | 1-3=Non-college; 4=College |
| Income | incm | 1-3=Lower three quartiles; 4=Top quartile (incm has 4 levels, so the top group is about 25%, not 20%) |
| Survey design | kstrata, psu, wt_itvex | strata, cluster, weight |
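The KNHANES recodes in the table translate directly into small functions. Below is a minimal Python sketch for illustration. The actual pipeline is in R, and the function names are hypothetical. The sketch assumes that missing values arrive as None and that codes not listed in the table (for example 9 = don't know) become missing. For diabetes, a participant is classified as 0 only when all three criteria are observed and negative.

```python
from typing import Optional, Sequence


def knhanes_smoking(bs3_1: Optional[int]) -> Optional[str]:
    """BS3_1: 1,2 = current; 3 = former; 8 = never; other codes (e.g. 9) -> None."""
    return {1: "current", 2: "current", 3: "former", 8: "never"}.get(bs3_1)


def knhanes_alcohol(bd1_11: Optional[int]) -> Optional[str]:
    """BD1_11: 2-6 = drank in past year; 1 = past-year abstainer; 8 = never."""
    if bd1_11 in (2, 3, 4, 5, 6):
        return "current"
    return {1: "past_year_abstainer", 8: "never"}.get(bd1_11)


def knhanes_obesity(he_obe: Optional[int]) -> Optional[int]:
    """HE_obe 4-6 (BMI >= 25) -> 1; 1-3 -> 0; anything else -> None."""
    if he_obe in (1, 2, 3):
        return 0
    if he_obe in (4, 5, 6):
        return 1
    return None


def phq9_depression(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum of BP_PHQ_1..9 >= 10 -> 1. Any item outside 0-3 (None, 8, 9) -> None."""
    if len(items) != 9 or any(x not in (0, 1, 2, 3) for x in items):
        return None
    return int(sum(items) >= 10)


def knhanes_diabetes(he_glu: Optional[float], he_hba1c: Optional[float],
                     de1_dg: Optional[int]) -> Optional[int]:
    """FPG >= 126 mg/dL or HbA1c >= 6.5% or DE1_dg == 1.

    DE1_dg codes other than 0/1 (e.g. 8, 9) are treated as unobserved
    (assumption; adjust per the codebook).
    """
    positive = (
        (he_glu is not None and he_glu >= 126)
        or (he_hba1c is not None and he_hba1c >= 6.5)
        or de1_dg == 1
    )
    if positive:
        return 1
    if he_glu is None or he_hba1c is None or de1_dg not in (0, 1):
        return None
    return 0


def knhanes_top_income(incm: Optional[int]) -> Optional[int]:
    """incm == 4 (highest income group) -> 1; 1-3 -> 0."""
    return {1: 0, 2: 0, 3: 0, 4: 1}.get(incm)
```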
CRITICAL: NHANES data downloaded with the R nhanesA package come back as TEXT LABELS by default (translated = TRUE in nhanes()), not as numeric codes.
| Variable | Raw Var | Text Labels → Numeric |
|---|---|---|
| PHQ-9 items | DPQ010~DPQ090 | "Not at all"→0, "Several days"→1, "More than half the days"→2, "Nearly every day"→3 |
| Sex | RIAGENDR | "Male" / "Female" (NOT 1/2) |
| Smoking (100 cigs) | SMQ020 | "Yes" / "No" |
| Smoking (now) | SMQ040 | "Every day" / "Some days" / "Not at all" |
| Alcohol freq | ALQ121 | Text labels (see below) |
| Alcohol ever | ALQ111 | "Yes" / "No" |
| Education | DMDEDUC2 | 5 text levels (see SKILL.md Phase 2) |
| Diabetes dx | DIQ010 | "Yes" / "No" / "Borderline" |
| CVD (CHF) | MCQ160B | "Yes" / "No" / "Don't know" |
| CVD (CHD) | MCQ160C | "Yes" / "No" / "Don't know" |
| CVD (angina) | MCQ160D | "Yes" / "No" / "Don't know" |
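| Serum glucose | LBXSGL (BIOPRO_J) | Numeric (mg/dL). Note the name is LBXSGL, NOT LBXSGLU. This is biochemistry-profile glucose and is not restricted to fasting participants; fasting plasma glucose is LBXGLU (GLU_J, fasting subsample, fasting weight WTSAF2YR) |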
| HbA1c | LBXGH (GHB_J) | Numeric (%) |
| BMI | BMXBMI (BMX_J) | Numeric (kg/m²) |
| Weight | WTMEC2YR (single-cycle) or WTMECPRP (pre-pandemic pooled) | Numeric |
| Strata | SDMVSTRA | Numeric |
| PSU | SDMVPSU | Numeric |
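Because nhanesA returns text labels, each NHANES recode begins with a label-to-number lookup. Below is a minimal Python sketch with hypothetical helper names. The sketch maps any label not listed (for example "Refused", "Don't know", or DIQ010 "Borderline") to None, so that each such case is handled explicitly rather than silently coded as 0. Smoking follows the usual NHANES convention: never smokers have fewer than 100 lifetime cigarettes (SMQ020 = "No"), and current versus former status is then determined by SMQ040.

```python
from typing import Optional, Sequence

PHQ_LEVELS = {
    "Not at all": 0,
    "Several days": 1,
    "More than half the days": 2,
    "Nearly every day": 3,
}
YES_NO = {"Yes": 1, "No": 0}   # "Borderline", "Don't know" -> None via .get()
SEX = {"Male": 1, "Female": 2}  # align with KNHANES/CHNS 1 = male, 2 = female


def phq9_score(items: Sequence[Optional[str]]) -> Optional[int]:
    """Sum DPQ010..DPQ090 text labels; any unlisted label or None -> None."""
    scores = [PHQ_LEVELS.get(x) for x in items]
    if len(scores) != 9 or None in scores:
        return None
    return sum(scores)


def nhanes_smoking(smq020: Optional[str], smq040: Optional[str]) -> Optional[str]:
    """never = SMQ020 'No'; current = every/some days; former = 'Not at all'."""
    if smq020 == "No":
        return "never"
    if smq020 == "Yes":
        if smq040 in ("Every day", "Some days"):
            return "current"
        if smq040 == "Not at all":
            return "former"
    return None


def nhanes_cvd(*answers: Optional[str]) -> Optional[int]:
    """MCQ160B/C/D: any 'Yes' -> 1; all 'No' -> 0; otherwise None."""
    if "Yes" in answers:
        return 1
    if answers and all(a == "No" for a in answers):
        return 0
    return None
```

A PHQ-9 score of 10 or more then gives the same depression cut-off that the table uses for KNHANES.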
| Variable | Raw Var | Coding |
|---|---|---|
| Asthma | DJ2_dg | 0=No, 1=Yes (physician dx), 9=Don't know → exclude |
| Asthma treatment | DJ2_pt | 0=No, 1=Yes, 8=N/A, 9=Don't know |
| Sleep (2017-18) | BP16_11/12/13/14 | Clock times, NOT hours! 11=bed hour, 12=bed min, 13=wake hour, 14=wake min. Calculate: duration = wake_time - bed_time (handle midnight crossing). 99=Don't know→NA |
| Sleep (2017-18 weekend) | BP16_21/22/23/24 | Same format as weekday |
| Sleep (2019-20) | BP16_1/2 | Direct sleep hours (weekday/weekend). 99=Don't know→NA |
| PA aerobic | pa_aerobic | 0=Doesn't meet, 1=Meets guidelines. Note: values are 0/1, NOT 1/2 |
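| HTN treatment | DI1_pr | 1=Yes, 0=No (currently treating hypertension). [VERIFY: variable_name] In KNHANES the _pr suffix usually denotes current prevalence and _pt denotes treatment (cf. DJ2_pt above), so the treatment item may be DI1_pt |
| Dyslipidemia tx | DI3_pr | 1=Yes, 0=No (if available). [VERIFY: variable_name] In recent KNHANES codebooks DI2 appears to be dyslipidemia and DI3 stroke |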
| Non-HDL chol | HE_chol - HE_HDL_st2 | Derived: total cholesterol minus HDL |
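The 2017-18 KNHANES sleep items are clock times, so duration requires the midnight-crossing arithmetic described in the table. Below is a minimal Python sketch with hypothetical function names. The sketch assumes a 24-hour clock (hours 0-23, minutes 0-59); if the codebook uses a 12-hour clock with AM/PM, adapt the conversion accordingly. The 5:2 weekday/weekend average is one common convention and is included here only as an option. Because NHANES SLD012 is weekday-only, the weekday KNHANES value is the closer match for a KR-US comparison.

```python
from typing import Optional

DONT_KNOW = 99


def clock_sleep_hours(bed_h, bed_m, wake_h, wake_m) -> Optional[float]:
    """Sleep duration from BP16_11..14 (weekday) or BP16_21..24 (weekend).

    99 (don't know), None, or out-of-range clock values -> None.
    """
    parts = (bed_h, bed_m, wake_h, wake_m)
    if any(p is None or p == DONT_KNOW for p in parts):
        return None
    if not (0 <= bed_h <= 23 and 0 <= wake_h <= 23
            and 0 <= bed_m <= 59 and 0 <= wake_m <= 59):
        return None
    bed = bed_h * 60 + bed_m
    wake = wake_h * 60 + wake_m
    # Modulo 24 h handles midnight crossing: 23:30 -> 06:30 gives 7.0 h, not -17.0 h.
    return ((wake - bed) % (24 * 60)) / 60


def weekly_mean_sleep(weekday_h: Optional[float],
                      weekend_h: Optional[float]) -> Optional[float]:
    """5:2 weighted mean of weekday and weekend sleep (an assumed convention)."""
    if weekday_h is None or weekend_h is None:
        return None
    return (5 * weekday_h + 2 * weekend_h) / 7


def non_hdl(total_chol: Optional[float], hdl: Optional[float]) -> Optional[float]:
    """Non-HDL cholesterol = HE_chol - HE_HDL_st2."""
    if total_chol is None or hdl is None:
        return None
    return total_chol - hdl
```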
| Variable | Raw Var | Coding |
|---|---|---|
| Asthma | MCQ010 | "Yes" / "No" (ever told by doctor) |
| Sleep hours | SLD012 | Numeric (hours/night on weekdays) |
| High BP diagnosis | BPQ020 | "Yes" / "No" (ever told by doctor: high BP). Current medication is a separate item, BPQ050A (now taking prescribed medicine for HBP) |
| Cholesterol treatment | BPQ100D | "Yes" / "No" (taking cholesterol Rx) |
| PA vigorous work | PAQ605/PAQ610/PAD615 | Yes/No, days/week, min/day |
| PA moderate work | PAQ620/PAQ625/PAD630 | Yes/No, days/week, min/day |
| PA walk/bike | PAQ635/PAQ640/PAD645 | Yes/No, days/week, min/day |
| PA vigorous rec | PAQ650/PAQ655/PAD660 | Yes/No, days/week, min/day |
| PA moderate rec | PAQ665/PAQ670/PAD675 | Yes/No, days/week, min/day |
| Dietary fiber | DR1TFIBE (DR1TOT_J) | Numeric (grams, day 1 recall) |
| Dietary sodium | DR1TSODI (DR1TOT_J) | Numeric (mg) |
| Dietary sat fat | DR1TSFAT (DR1TOT_J) | Numeric (grams) |
| Total energy | DR1TKCAL (DR1TOT_J) | Numeric (kcal) |
| Total sugars | DR1TSUGR (DR1TOT_J) | Numeric (grams) |
| Non-HDL chol | LBXTC - LBDHDD | Derived: TCHOL_J minus HDL_J |
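KNHANES supplies pa_aerobic as a ready-made 0/1 indicator, whereas NHANES provides only the raw items, so the US indicator must be derived. Below is a minimal Python sketch with hypothetical helper names. It rests on three assumptions:

1. All three domains (work, transport, recreation) count toward the total.
2. The threshold is 150 moderate-equivalent minutes per week, with vigorous minutes counted double. This is the usual aerobic guideline definition, but check it against the KNHANES pa_aerobic definition for the survey years you use.
3. Days outside 1-7 or minutes outside 0-1440 (which would catch refused/don't-know codes) are treated as missing.

Item numbers follow the NHANES PAQ_J codebook, in which PAQ650 is vigorous and PAQ665 is moderate recreational activity.

```python
from typing import Dict, Optional, Tuple

# (yes/no item, days/week item, minutes/day item), NHANES PAQ_J numbering.
VIGOROUS = [("PAQ605", "PAQ610", "PAD615"),   # vigorous work
            ("PAQ650", "PAQ655", "PAD660")]   # vigorous recreation
MODERATE = [("PAQ620", "PAQ625", "PAD630"),   # moderate work
            ("PAQ635", "PAQ640", "PAD645"),   # walk/bicycle for transport
            ("PAQ665", "PAQ670", "PAD675")]   # moderate recreation


def domain_minutes(row: Dict, items: Tuple[str, str, str]) -> Optional[float]:
    """Weekly minutes in one domain; 'No' -> 0; unusable answers -> None."""
    answer, days_key, mins_key = items
    if row.get(answer) == "No":
        return 0.0
    if row.get(answer) != "Yes":
        return None
    days, mins = row.get(days_key), row.get(mins_key)
    if not isinstance(days, (int, float)) or not isinstance(mins, (int, float)):
        return None
    if not (1 <= days <= 7 and 0 <= mins <= 1440):
        return None
    return float(days * mins)


def aerobic_pa(row: Dict) -> Tuple[Optional[float], Optional[float], Optional[int]]:
    """Return (moderate min/wk, vigorous min/wk, meets guideline 0/1).

    Meets = moderate + 2 x vigorous >= 150 min/week.
    """
    vig = [domain_minutes(row, d) for d in VIGOROUS]
    mod = [domain_minutes(row, d) for d in MODERATE]
    if None in vig or None in mod:
        return None, None, None
    v, m = sum(vig), sum(mod)
    return m, v, int(m + 2 * v >= 150)
```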
Data source: cpc.unc.edu/projects/china (free registration)
Biomarker wave: 2009 only (N=9,549). Other variables available 1989-2015.
Survey design: No formal weights. Use svydesign(id=~COMMID, weights=~1) or cluster-robust SE.
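With equal weights, the only design feature left is clustering by community. Below is a minimal Python sketch of the linearization SE of a mean with COMMID as the PSU and a single stratum. The function name is hypothetical. To our understanding, this should reproduce svymean() under svydesign(id=~COMMID, weights=~1) with the survey package's default with-replacement variance. Check it against R on your own data before relying on it. Rows with missing values must be dropped before calling.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def cluster_mean_se(values: Sequence[float],
                    clusters: Sequence[Hashable]) -> Tuple[float, float]:
    """Mean and cluster-robust SE, equal weights, one stratum.

    var = K/(K-1) * sum_k (sum_{i in k} (y_i - ybar))^2 / n^2,
    where K is the number of clusters (communities).
    """
    if len(values) != len(clusters):
        raise ValueError("values and clusters must have the same length")
    n = len(values)
    mean = sum(values) / n
    resid_by_cluster = defaultdict(float)
    for y, c in zip(values, clusters):
        resid_by_cluster[c] += y - mean
    k = len(resid_by_cluster)
    if k < 2:
        raise ValueError("need at least two clusters")
    var = k / (k - 1) * sum(t * t for t in resid_by_cluster.values()) / n ** 2
    return mean, math.sqrt(var)
```

When outcomes are correlated within communities, this SE is larger than the naive SD/√n, which is why COMMID should not be ignored.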
| File | Key Variables | Join Key |
|---|---|---|
| mast_pub_12 | IDind, GENDER (1=M/2=F), WEST_DOB_Y (birth year) | IDind |
| pexam_00 | HEIGHT, WEIGHT, U10 (waist), SYSTOL1-3, DIASTOL1-3, U22 (HBP dx), U24 (HBP meds), U24A (DM dx), U25 (ever smoked), U27 (still smokes), U40 (alcohol), U41 (freq), U48A (self-health), COMMID | IDind + filter WAVE==2009 |
| biomarker_09 | GLUCOSE_MG, HbA1c, TC_MG, TG_MG, HDL_C_MG, LDL_C_MG, HS_CRP, HGB, WBC, ALT, CRE_MG | IDind |
| educ_12 | A12 (education 0-6) | IDind + filter WAVE==2009 |
| indinc_10 | indwage (yuan, continuous → quartiles) | IDind + filter wave==2009 |
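The join plan in the table can be sketched in plain Python for clarity. In R this would be a sequence of merges, and the helper names below are hypothetical. The sketch assumes the following:

- Each file is loaded as a list of dicts.
- IDind is unique within a wave; if not, the last row wins.
- Exam and biomarker records are required (inner join).
- Education and income are optional (left join).

Note the lowercase wave column in indinc_10, as listed in the table.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, object]


def index_by_id(rows: Iterable[Row], wave: Optional[int] = None,
                wave_key: str = "WAVE") -> Dict[object, Row]:
    """Map IDind -> row, optionally keeping only one wave."""
    out: Dict[object, Row] = {}
    for r in rows:
        if wave is not None and r.get(wave_key) != wave:
            continue
        out[r["IDind"]] = r
    return out


def merge_chns_2009(mast: List[Row], pexam: List[Row], biomarker: List[Row],
                    educ: List[Row], indinc: List[Row]) -> List[Row]:
    exam = index_by_id(pexam, 2009, "WAVE")
    bio = index_by_id(biomarker)              # biomarker_09 is already 2009-only
    edu = index_by_id(educ, 2009, "WAVE")
    inc = index_by_id(indinc, 2009, "wave")   # lowercase key in indinc_10
    merged = []
    for m in mast:
        pid = m["IDind"]
        if pid in exam and pid in bio:        # inner join: exam + biomarker required
            merged.append({**m, **exam[pid], **bio[pid],
                           **edu.get(pid, {}), **inc.get(pid, {})})
    return merged
```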
| Variable | Raw Var | Coding | Notes |
|---|---|---|---|
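| Sex | GENDER | 1=Male, 2=Female | Same numeric coding as KNHANES and raw NHANES (nhanesA returns "Male"/"Female" text) |
| Age | WEST_DOB_Y | age = wave_year - WEST_DOB_Y | Birth-year difference only; overstates age by 1 for participants whose birthday falls after the interview date |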
| BMI | HEIGHT, WEIGHT | WEIGHT / (HEIGHT/100)^2 | Obesity: BMI ≥ 28 (WGOC, NOT 25 or 30) |
| Waist | U10 | cm, direct measurement | Central obesity: ≥90M / ≥80F (IDF-Asian) |
| SBP | SYSTOL1-3 | mean(SYSTOL1, SYSTOL2, SYSTOL3) | 3 readings averaged |
| DBP | DIASTOL1-3 | mean(DIASTOL1, DIASTOL2, DIASTOL3) | 3 readings averaged |
| HBP diagnosed | U22 | 0=No, 1=Yes, 9=Don't know (→NA) | |
| HBP medication | U24 | 0=No, 1=Yes | |
| DM diagnosed | U24A | 0=No, 1=Yes, 9=Don't know (→NA) | |
| Smoking | U25 + U27 | never(U25==0) / former(U25==1 & U27==0) / current(U25==1 & U27==1) | |
| Alcohol | U40 + U41 | never(U40==0) / occasional(U41≥4) / frequent(U41≤3, ≥1x/week) | U41: 1=daily, 2=3-4x/wk, 3=1-2x/wk, 4=1-2x/mo, 5=<1x/mo |
| Education | A12 | 0=none, 1=primary, 2=lower-mid, 3=upper-mid, 4=technical, 5=university, 6=master+. Recode: 0-2→low, 3-4→mid, 5-6→high | |
| Income | indwage | Continuous yuan → quartiles within wave | |
| Glucose | GLUCOSE_MG | mg/dL (also GLUCOSE in mmol/L) | 2009 only |
| HbA1c | HbA1c | % (direct) | 2009 only |
| TC | TC_MG | mg/dL | 2009 only |
| TG | TG_MG | mg/dL | 2009 only |
| HDL | HDL_C_MG | mg/dL | 2009 only |
| hsCRP | HS_CRP | mg/L | 2009 only |
| Hemoglobin | HGB | g/L (divide by 10 for g/dL) | Unit differs from KR/US |
| Self-health | U48A | Self-reported health status | 2004-2011 |
| Depression | (none) | NOT AVAILABLE in standard download. CES-D exists but needs separate dataset. | Cannot directly compare with PHQ-9 |

- … wake_time - bed_time with midnight crossing.
- … [VERIFY: variable_name] and ask the user to confirm against the data dictionary.
- … /search-lit for all citations.
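The CHNS derivations in the coding table can be written as small functions. Below is a minimal Python sketch with hypothetical names. Two choices in it are assumptions:

- mean_bp averages whichever of the three readings are recorded.
- Income quartile cut-points use statistics.quantiles(method="inclusive"), which to our understanding matches R's default quantile type 7, and values equal to a cut-point go to the lower quartile.

Call income_quartiles separately for each wave.

```python
from bisect import bisect_left
from statistics import quantiles
from typing import List, Optional, Sequence


def chns_age(wave_year: int, west_dob_y: int) -> int:
    """Age from birth year only; may overstate age by 1 year."""
    return wave_year - west_dob_y


def bmi(weight_kg: float, height_cm: float) -> float:
    """WEIGHT (kg) / (HEIGHT (cm) / 100)^2."""
    return weight_kg / (height_cm / 100) ** 2


def wgoc_obese(bmi_value: float) -> int:
    """WGOC obesity: BMI >= 28 (not 25 or 30)."""
    return int(bmi_value >= 28)


def central_obesity(waist_cm: float, gender: int) -> int:
    """IDF Asian cut-offs: >= 90 cm if GENDER == 1 (male), >= 80 cm otherwise."""
    return int(waist_cm >= (90 if gender == 1 else 80))


def mean_bp(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of recorded SYSTOL1-3 (or DIASTOL1-3) readings; None if none recorded."""
    vals = [r for r in readings if r is not None]
    return sum(vals) / len(vals) if vals else None


def chns_smoking(u25: Optional[int], u27: Optional[int]) -> Optional[str]:
    if u25 == 0:
        return "never"
    if u25 == 1 and u27 in (0, 1):
        return "current" if u27 == 1 else "former"
    return None


def chns_alcohol(u40: Optional[int], u41: Optional[int]) -> Optional[str]:
    """U41: 1 = daily, 2 = 3-4x/wk, 3 = 1-2x/wk, 4 = 1-2x/mo, 5 = <1x/mo."""
    if u40 == 0:
        return "never"
    if u40 == 1 and u41 in (1, 2, 3):
        return "frequent"
    if u40 == 1 and u41 in (4, 5):
        return "occasional"
    return None


def chns_education(a12: Optional[int]) -> Optional[str]:
    """A12: 0-2 -> low, 3-4 -> mid, 5-6 -> high."""
    for label, codes in (("low", (0, 1, 2)), ("mid", (3, 4)), ("high", (5, 6))):
        if a12 in codes:
            return label
    return None


def income_quartiles(indwage: Sequence[Optional[float]]) -> List[Optional[int]]:
    """Quartile 1-4 of indwage within ONE wave; None stays None."""
    observed = [x for x in indwage if x is not None]
    cuts = quantiles(observed, n=4, method="inclusive")
    return [None if x is None else bisect_left(cuts, x) + 1 for x in indwage]
```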