Guides the design of cross-national comparative survey experiments. Use when (1) selecting countries for a multi-country study, (2) localizing experimental instruments across languages and institutional contexts, (3) calibrating origin-country stimuli for immigration experiments, (4) conducting per-country power analyses, or (5) planning a cross-national analytical strategy with pooled and per-country models. Covers case selection, instrument localization, ecological validity, power management, and sensitivity bias auditing.
scdenney15 · 31.03.2026
Instructions
1. Case Selection and Theoretical Justification
Variation by Design: Cases should be selected to vary on theoretically relevant dimensions (e.g., regime type, institutional structure, immigration history, partisan polarization), not for convenience or data availability. Each case should serve a specific theoretical function.
Case Selection Table: Produce a summary table listing each country, its relevant contextual features (e.g., immigration salience, institutional architecture, regime type), and the specific motivation for its inclusion. This table belongs in the PAP and should be referenced in the narrative.
Avoid the "Most Similar" Trap: Cross-national designs are often most informative when cases are deliberately diverse. Including cases that vary on regime type (e.g., democracy vs. hybrid regime) or institutional structure (e.g., parliamentary vs. presidential) enables stronger tests of whether a theoretical mechanism generalizes beyond a single political context.
The N=4 Sweet Spot: For survey experiments with ~2,500 respondents per country, 3--5 cases typically balance theoretical leverage with practical feasibility (budget, translation, vendor logistics). Fewer than 3 limits comparison; more than 5 strains resources without proportional analytical gain.
Information Environment Variation: The information environment differs across countries -- media systems, political knowledge levels, and the accessibility of policy information vary. A treatment that is "informative" in one context may be redundant or incomprehensible in another. Document the expected information environment per country and consider how it affects treatment interpretation (Druckman 2022).
2. Instrument Localization
Conceptual Equivalence over Literal Translation: The goal is not identical wording across countries but equivalent conceptual meaning. A "skilled worker visa" corresponds to an H-1B visa in the US, an E-7 visa in South Korea, and an S Pass in Singapore. Use country-specific referents that carry the same conceptual weight. Word choice constitutes a framing treatment and must be calibrated per country, not just translated -- terms like "unauthorized," "undocumented," and "illegal" carry different connotations across linguistic and political contexts (Stantcheva 2023, citing Merolla et al. 2013).
Qualitative Treatment Framing: For information experiments fielded across countries, consider using qualitative descriptions rather than exact statistics. Qualitative information "can help make the treatment more homogeneous when doing an experiment in several countries or settings" because exact numbers carry different meanings in different contexts (Stantcheva 2023).
Anti-Acquiescence Measurement: Acquiescence bias varies by culture. Use item-specific scales (not agree-disagree or true-false formats), balanced scales with equal numbers of positive and negative options, and randomized question direction to ensure measurement equivalence across countries (Stantcheva 2023).
Coarse Measurement as Principled Commitment: When implementing the same conceptual measure across diverse contexts, accept that some precision is lost in translation. Treat this as a principled commitment to comparability rather than a limitation -- coarse measurement that is consistent across contexts is preferable to precise measurement that means different things in different places (Sniderman 2018).
Institutional Referents: Localize all political actors (president, cabinet, ministry), legislative bodies (Congress, Bundestag, National Assembly, Parliament), international organizations (UN, EU, OECD, ASEAN), and policy instruments (visa types, permit categories). Document all localizations in a country-specific table.
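One way to keep a localization table machine-checkable is to store it as data and render vignettes from it. A minimal sketch: the referent keys, template, and `render_vignette` helper below are hypothetical illustrations (the visa and legislature names come from the examples above), not a calibrated instrument.

```python
# Hypothetical localization table: maps shared conceptual referents to
# country-specific wordings. Every entry here is an illustrative
# placeholder drawn from the examples in the text above.
LOCALIZATION = {
    "US": {"skilled_visa": "H-1B visa", "legislature": "Congress"},
    "KR": {"skilled_visa": "E-7 visa", "legislature": "National Assembly"},
    "SG": {"skilled_visa": "S Pass", "legislature": "Parliament"},
}

def render_vignette(template: str, country: str) -> str:
    """Fill a vignette template with that country's localized referents."""
    return template.format(**LOCALIZATION[country])

text = render_vignette(
    "A bill before the {legislature} would expand the {skilled_visa} program.",
    "KR",
)
print(text)
```

Keeping the table in one place makes it trivial to export for the PAP appendix and to verify that every country defines every referent key.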
Cultural Calibration of Stimuli: For attributes like "country of origin" that signal cultural proximity or distance, the specific countries must be calibrated per respondent country. "Culturally distant" means different things in the US (e.g., Somalia) than in South Korea (e.g., Yemen) or Singapore (e.g., Afghanistan). Document the calibration rationale.
Ecological Validity Check: Every experimental stimulus should be plausible in the respondent's national context. A vignette about EU directives makes sense for German respondents but not Singaporean ones. Test each localized version against the question: "Would a newspaper in this country plausibly report this?" Note that being "outside the lab" does not automatically confer ecological validity -- it requires deliberate calibration of stimuli to real-world conditions in each country (Mutz 2011).
3. Composition and Origin Country Selection
Origin Countries as Group Cues: When immigrant origin countries are used as experimental stimuli (common in immigration conjoint studies), select origins that: (a) are recognizable to respondents, (b) vary on the intended dimension (e.g., cultural proximity), and (c) do not carry overwhelming confounding associations (e.g., ongoing war, recent political crisis) unless those associations are part of the theoretical design.
Nesting Origins within Domains: If the experiment crosses policy domain (e.g., labor vs. asylum) with immigrant composition, it may be necessary to use different origin countries for different domains within the same respondent country. This preserves ecological validity (asylum seekers plausibly come from different countries than labor migrants) but means the composition attribute cannot be cleanly separated from domain. Document this nesting and acknowledge it in the analysis.
Avoiding Origin Overlap: Within a single respondent country, avoid reusing the same origin country across different experimental roles (e.g., using Nigeria as both the "distant" labor origin and the "distant" asylum origin). This prevents respondents from noticing repeated countries and developing response patterns.
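The no-overlap rule is easy to enforce programmatically before fielding. A minimal sketch, where the origin assignments are hypothetical placeholders (not calibrated stimuli):

```python
from collections import defaultdict

# Hypothetical origin assignment: respondent country -> experimental
# role -> origin country used as the stimulus. Illustrative only.
ORIGINS = {
    "US": {"labor_distant": "Somalia", "asylum_distant": "Eritrea"},
    "KR": {"labor_distant": "Yemen", "asylum_distant": "Myanmar"},
}

def overlapping_origins(assignment):
    """Return {respondent_country: [origins reused across roles]}."""
    problems = {}
    for country, roles in assignment.items():
        counts = defaultdict(int)
        for origin in roles.values():
            counts[origin] += 1
        reused = sorted(o for o, n in counts.items() if n > 1)
        if reused:
            problems[country] = reused
    return problems

print(overlapping_origins(ORIGINS))  # empty dict when no role reuses an origin
```

Running this check against the final localization table catches accidental reuse introduced during late-stage instrument edits.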
4. Cross-National Power and Error Management
Per-Country Power Analysis: Conduct power analyses separately for each country, not just for the pooled sample. Country-specific samples (typically 1,500--2,500) may be underpowered for detecting small AMCEs or interaction effects even when the pooled analysis is well-powered.
Compromise Power for Small-N Countries: When country-specific sample sizes are fixed by budget constraints, use a compromise power analysis that minimizes the combined Type I + Type II error rate. An alpha > 0.05 may be justified for country-level analyses where the per-country N is small but fixed (Lakens 2025).
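A compromise power analysis in this spirit can be sketched as a grid search over alpha, minimizing the combined error rate for a fixed N and smallest effect size of interest. The effect size and N below are illustrative assumptions; equal weighting of the two error types is also an assumption (Lakens discusses weighted variants).

```python
import numpy as np
from statsmodels.stats.power import NormalIndPower

# For a fixed per-country N and effect size, pick the alpha that
# minimizes alpha + beta (equal weights). All numbers illustrative.
calc = NormalIndPower()

def compromise_alpha(effect_size, nobs1, alphas=np.linspace(0.005, 0.3, 60)):
    """Return (alpha, beta) minimizing alpha + beta, two-sided test."""
    return min(
        ((a, 1 - calc.power(effect_size=effect_size, nobs1=nobs1,
                            alpha=a, ratio=1.0, alternative="two-sided"))
         for a in alphas),
        key=lambda ab: ab[0] + ab[1],
    )

# Small fixed per-country N, small effect: the compromise alpha exceeds .05.
alpha_c, beta_c = compromise_alpha(effect_size=0.1, nobs1=750)
print(f"alpha = {alpha_c:.3f}, beta = {beta_c:.3f}")
```

The point of the exercise is transparency: the chosen alpha and its implied beta are documented in the PAP instead of defaulting to .05 with unexamined Type II error.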
Context-Specific SESOI: The smallest meaningful effect size may differ by country. A 3-percentage-point AMCE for a policy attribute may be substantively important in a country where the policy is highly salient but trivial where it is not. Document the SESOI rationale per country when it varies.
Minimum Effect Tests for Pooled Data: When pooling data across countries, large pooled N makes it easy to find "significant" trivial correlations. Use minimum effect tests (testing whether the effect exceeds a meaningful threshold, not just whether it differs from zero) to ensure that pooled results are practically significant (Lakens 2025).
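A minimum-effect test replaces the point null with a composite null, H0: |effect| <= delta. A minimal normal-approximation sketch, where the estimate, standard error, and SESOI are illustrative numbers, not results from any study:

```python
from scipy import stats

def minimum_effect_p(estimate, se, delta):
    """One-sided p-value against the nearer boundary of H0: |effect| <= delta."""
    return stats.norm.sf((abs(estimate) - delta) / se)

# With a large pooled N, a 0.5pp AMCE (SE 0.2pp) is "significant" vs zero...
p_vs_zero = 2 * stats.norm.sf(abs(0.005) / 0.002)
# ...but clearly fails a minimum-effect test against a 2pp SESOI.
p_vs_sesoi = minimum_effect_p(0.005, 0.002, delta=0.02)
print(f"vs zero: p = {p_vs_zero:.4f}; vs 2pp SESOI: p = {p_vs_sesoi:.4f}")
```

The same delta used here should be the pre-registered SESOI, so the test cannot be tuned after seeing the pooled estimate.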
Sequential Factorials Across Countries: In multi-experiment, multi-country designs, consider a "sequential factorial" approach where the factorial structure builds across experiments while the country set remains constant. This enables cumulative theoretical leverage: Experiment 1 establishes baseline effects across countries; Experiment 2 "splices in" additional factors to probe mechanisms (Sniderman 2018).
5. Cross-National Analytical Strategy
Per-Country Estimation First: Always estimate the primary model separately for each country before pooling. This respects the possibility that the data-generating process differs across contexts and provides country-specific effect sizes that are interpretable on their own.
Pooled Models with Country Interactions: For formal tests of cross-national heterogeneity, estimate a pooled model that includes attribute × country interaction terms. Wald tests on the interaction coefficients provide a statistical test of whether effect sizes differ across countries. Note that respondent fixed effects absorb country main effects when country is a respondent-level constant.
Scope Conditions, Not Country Hypotheses: When the theory predicts that effect magnitudes will vary by institutional context but does not make precise predictions about which country will show the largest effect, frame country-specific expectations as "contextual expectations and scope conditions" rather than confirmatory hypotheses. This avoids the proliferation of underpowered country-level tests while preserving theoretical nuance.
Response Style Biases: Moderacy, extreme response, and response order biases vary systematically across cultures and socioeconomic groups. Accounting for these biases "may be particularly important in cross-country studies" because response tendencies differ across cultural contexts (Stantcheva 2023). Assess response style distributions per country and consider whether they distort cross-national comparisons.
Sensitivity Bias by Context: Social desirability bias may vary systematically across countries with different political regimes and norms of expression. Blair et al. (2020) provide domain-specific estimates of sensitivity bias -- use these to assess whether direct-question measures are trustworthy per country or whether indirect measurement (e.g., list experiments) is needed in some contexts but not others.
Visualization: Present cross-national results as side-by-side AMCE panels (one per country) or forest plots. These enable visual comparison of effect magnitudes and patterns, which is often more informative than formal statistical tests of heterogeneity.
6. Practical Considerations
Survey Vendors: Cross-national online surveys typically require country-specific vendors or an international panel provider. Verify that quota-matching (age, gender, education, region) is available in all target countries. Note whether samples are probability-based or nonprobability panels and the implications for PATE estimation in each country (Mutz 2011). For developing or middle-income country cases, acknowledge that online samples are "online representative" rather than nationally representative (Stantcheva 2023) and discuss implications for generalizability.
Survey Mode Consistency: Maintain consistent survey mode across countries (all online, all CATI, etc.) to avoid confounding treatment effects with mode effects. If mode must vary by country, document this and assess its impact.
Cross-National Piloting: Pilot the localized instrument in each country, not just one. Pilots should test whether localized stimuli function equivalently -- whether respondents access, attend to, and interpret the information as intended across contexts (Druckman 2022).
Translation Protocol: Use professional translation with back-translation verification. For experimental stimuli, the back-translation should be evaluated by a bilingual researcher who understands the experimental design, not just the language.
Ethical Approvals: Some countries or institutions require separate ethics review. Plan for varying timelines and requirements across sites.
Data Sharing Across Privacy Regimes: Cross-national studies involve data from multiple countries with potentially different privacy regimes (e.g., GDPR in Europe). Plan for data sharing given varying legal and ethical constraints across national contexts (Christensen et al. 2019).
Pre-Registration: Pre-register at the study level (not per-country). The pre-analysis plan should include both the pooled analytical strategy and the per-country estimation plan.
Firehouse Readiness: Internet-based panels make cross-national experiments feasible for rapid-response "firehouse studies" after international events. Having pre-established vendor relationships in multiple countries reduces deployment time (Mutz 2011).
Quality Checks
Case Justification: Is each country's inclusion theoretically motivated and documented in a case selection table?
Localization Documented: Are all country-specific referents (actors, institutions, visa types, origin countries) listed in a localization table?
Conceptual Equivalence: Do localized stimuli carry equivalent conceptual meaning across countries, even if wording differs?
Ecological Validity: Is every stimulus plausible in each national context?
Origin Calibration: Are origin countries calibrated for cultural proximity/distance per respondent country, with rationale documented?
No Origin Overlap: Within each respondent country, is each origin country used in only one experimental role?
Per-Country Power: Has a per-country power analysis been conducted, with MDE reported for each country?
Per-Country Models: Does the analysis plan include per-country estimation before pooling?
Scope Conditions: Are cross-national expectations framed as scope conditions rather than confirmatory hypotheses (unless the theory makes precise directional predictions)?
Sensitivity Audit: Has the risk of social desirability bias been assessed per country, with indirect measurement considered where needed?
Mode Consistency: Is the survey mode consistent across countries, or are mode differences documented?
Cross-National Piloting: Has the localized instrument been piloted in each country?
Translation: Is a professional translation and back-translation protocol specified?
Response Style Biases: Have moderacy, extreme response, and acquiescence biases been assessed across countries (Stantcheva 2023)?
Online Representativeness: For developing/middle-income country samples, is the "online representative" limitation acknowledged (Stantcheva 2023)?
Treatment Homogeneity: Are information treatments designed to maintain equivalence across countries (e.g., qualitative rather than quantitative framing where appropriate)?
Word Choice Calibration: Has terminology been tested for differential framing effects across languages (Stantcheva 2023)?