Instructions and conventions for working with SurveyCTO forms and survey design in this project.
SurveyCTO is a platform for designing and collecting survey data, primarily on mobile devices. Forms are defined in an XLSForm spreadsheet with (at minimum) two tabs: survey and choices.
In the survey tab, each row defines one field. Key columns:
| Column | Purpose |
|---|---|
| type | Field type (e.g., text, integer, select_one listname, begin repeat) |
| name | Variable name; becomes the column header in exported data |
| label | Human-readable question text shown to enumerators |
| relevant | Expression controlling whether the field is shown (skip logic) |
| constraint | Validation expression; blocks form submission if false |
| calculate | Expression for calculate-type fields (see below) |
| repeat_count | For repeat groups: how many times to repeat (can reference a prior field) |
In the choices tab, each row defines one choice option for a select_one or select_multiple field. Key columns:
| Column | Purpose |
|---|---|
| list_name | Links to the select_one listname / select_multiple listname type value |
| name | The coded value stored in exported data (e.g., 1, 2, yes) |
| label | Human-readable text shown to the enumerator |
Important: exported data contains the choice name (coded value), not the label.
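Because exports carry coded values, labels have to be looked up from the choices tab. The following is a minimal Python sketch of that lookup. The `CHOICES` rows, the `yesno` list name, and both helper names are hypothetical and only illustrate the idea.

```python
# Hypothetical rows as they might be read from a choices tab.
CHOICES = [
    {"list_name": "yesno", "name": "1", "label": "Yes"},
    {"list_name": "yesno", "name": "2", "label": "No"},
]


def build_label_map(choices_rows):
    """Return {list_name: {name: label}} built from choices-tab rows."""
    label_map = {}
    for row in choices_rows:
        names = label_map.setdefault(row["list_name"], {})
        names[str(row["name"])] = row["label"]
    return label_map


def decode(value, list_name, label_map):
    """Translate an exported coded value into its label; blanks stay blank."""
    if value == "":
        return ""
    # Unknown codes are returned unchanged rather than guessed at.
    return label_map[list_name].get(value, value)


label_map = build_label_map(CHOICES)
decoded = [decode(v, "yesno", label_map) for v in ["1", "2", ""]]
```

Keeping blanks as blanks is deliberate: a blank may be structurally missing, so it should not be mapped to any label.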
select_one listname: Single-choice field (radio buttons by default). Exported as a single column containing the chosen option's name value.
select_multiple listname: Multi-choice field (checkboxes by default). Exports produce two things:

- A single column holding the selected name values as a space-separated list (e.g., "1 3 5")
- One dummy column per choice, named fieldname_choicename (e.g., crops_maize, crops_beans, crops_wheat)

When working with a select_multiple variable, use the dummy columns for analysis; the space-separated column is mainly useful for filtering/checking.
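To make the relationship between the two export forms concrete, here is a small Python sketch that rebuilds the per-choice indicators from the space-separated value, for example to cross-check the dummy columns. The function name and the `crops` example values are assumptions made for illustration.

```python
def dummies_from_raw(raw_value, fieldname, choice_names):
    """Rebuild per-choice 0/1 indicators from a space-separated value.

    Returns None for a blank answer so that skipped questions are not
    silently turned into rows of zeros (a choice made for this sketch).
    """
    if raw_value.strip() == "":
        return None
    selected = set(raw_value.split())
    return {f"{fieldname}_{c}": int(c in selected) for c in choice_names}


# Hypothetical field "crops" with choices maize, beans, wheat.
example = dummies_from_raw("maize wheat", "crops", ["maize", "beans", "wheat"])
```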
calculate: Automatically computed field that is never shown in the UI. The value is derived from an expression referencing other fields (e.g., ${field_a} + ${field_b}). Appears as a regular column in exported data. Used for derived variables, running totals, formatted strings, etc.
note: Display-only text shown to the enumerator; no data collected. Does not appear in exported data.
geopoint: GPS coordinate capture. Exports as four columns: fieldname-Latitude, fieldname-Longitude, fieldname-Altitude, fieldname-Accuracy.
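As a sketch of how the four geopoint columns might be gathered back into one record, the Python helper below reads them from an exported row. The helper name, the `gps` field name, and the coordinate values are all hypothetical.

```python
def read_geopoint(row, fieldname):
    """Collect the four geopoint export columns into one dict of floats.

    Blank or absent components become None rather than 0.0.
    """
    parts = {}
    for component in ("Latitude", "Longitude", "Altitude", "Accuracy"):
        raw = row.get(f"{fieldname}-{component}", "")
        parts[component.lower()] = float(raw) if raw != "" else None
    return parts


# Hypothetical exported row for a geopoint field named "gps".
row = {
    "gps-Latitude": "-1.2921",
    "gps-Longitude": "36.8219",
    "gps-Altitude": "1795.0",
    "gps-Accuracy": "4.5",
}
location = read_geopoint(row, "gps")
```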
Repeat groups allow a block of questions to be asked multiple times (e.g., once per household member, once per business, once per crop). Defined in the survey tab with:

    begin repeat group_name "Group label"
    ... fields ...
    end repeat

The number of repetitions is either open-ended (enumerator decides) or fixed via repeat_count
referencing a prior integer field (e.g., ${num_businesses}).
In the wide-format export, each instance gets a numeric suffix appended to every field name: age_1, age_2, age_3 … for a field named age in a repeat group.

In the long-format export, each repeat group exports as a separate file/sheet, with one row per instance. A KEY column uniquely identifies each instance; a PARENT_KEY column links back to the parent submission. This format is useful for reshaping or when instances are numerous.
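When only suffixed columns are available, they can be regrouped into one record per instance. The Python sketch below does this for a single submission row. The function name, the `name` field, and the `uuid:abc` key value are assumptions, and it presumes that no group-name prefixes are present.

```python
import re

SUFFIXED = re.compile(r"^(?P<field>.+)_(?P<idx>\d+)$")


def wide_to_long(row, field_names):
    """Regroup suffixed columns (age_1, age_2, ...) into one dict per instance.

    Instances whose fields are all blank are dropped. That treats them as
    nonexistent, which may hide a real instance with no answers at all.
    """
    instances = {}
    for column, value in row.items():
        match = SUFFIXED.match(column)
        if match and match.group("field") in field_names:
            index = int(match.group("idx"))
            instances.setdefault(index, {})[match.group("field")] = value
    return [
        {"instance": index, **values}
        for index, values in sorted(instances.items())
        if any(v != "" for v in values.values())
    ]


# Hypothetical submission with three suffixed slots, the third one empty.
submission = {
    "KEY": "uuid:abc",
    "age_1": "34", "name_1": "Ann",
    "age_2": "7", "name_2": "Ben",
    "age_3": "", "name_3": "",
}
members = wide_to_long(submission, {"age", "name"})
```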
The relevant column holds a boolean expression. A field is only shown when the expression
evaluates to true; otherwise it is hidden and skipped.

    # Examples
    ${consent} = 1
    ${num_businesses} > 0 and ${country} = 'kenya'
    selected(${activities}, 'farming')   # for select_multiple: checks if 'farming' was chosen

Critical for data work: when a field is not relevant (i.e., was skipped), it still appears in the exported dataset — as a blank/empty value. Structurally missing data (due to skip logic) is therefore indistinguishable from item non-response in the raw export. To correctly interpret missingness, you must trace the relevance condition in the survey tab.
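One way to apply this in analysis code is to translate the relevance condition by hand and use it to label each blank. The Python sketch below does that for the `${consent} = 1` example. The `age` field, the function names, and the status labels are hypothetical, and the sketch ignores other causes of blanks such as form version changes.

```python
def classify_value(row, field, is_relevant):
    """Label a field as answered, skipped by design, or left unanswered."""
    if row[field] != "":
        return "answered"
    if is_relevant(row):
        # The question was shown but has no value.
        return "item non-response"
    return "structurally missing"


def consent_given(row):
    """Hand translation of ${consent} = 1 (exported values are strings)."""
    return row["consent"] == "1"


rows = [
    {"consent": "1", "age": "34"},
    {"consent": "1", "age": ""},
    {"consent": "2", "age": ""},
]
statuses = [classify_value(r, "age", consent_given) for r in rows]
```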
When group names are enabled (the default in SurveyCTO Desktop), column names in the export
are prefixed with the names of all enclosing groups, separated by hyphens or underscores. For
example, a field named profit inside a group named business inside a group named module2
might export as module2-business-profit or module2_business_profit.
Repeat group suffixes (_1, _2, …) are appended after the full prefixed field name.
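The prefixing and suffixing rules above can be sketched as a small Python parser for the hyphen-separated form. It assumes that group and field names contain no hyphens and that a trailing _N is an instance suffix. Neither can be confirmed from the column name alone, so check the survey tab. The function name is hypothetical.

```python
import re


def parse_export_column(column, sep="-"):
    """Split an export column name into enclosing groups, field, and instance."""
    *groups, last = column.split(sep)
    match = re.match(r"^(.*)_(\d+)$", last)
    if match:
        field, instance = match.group(1), int(match.group(2))
    else:
        field, instance = last, None
    return {"groups": groups, "field": field, "instance": instance}


parsed = parse_export_column("module2-business-profit_3")
```

With underscore separators the split is ambiguous, because field names often contain underscores themselves; in that case the group and field names have to be taken from the survey tab rather than inferred.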
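Practical implication: a variable named group_subgroup_fieldname_3 in the data means:

- Enclosing groups: group (and subgroup)
- Field name: fieldname
- Instance: 3

Blank values in the export can arise from several causes:

| Cause | Appearance in export |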
|---|---|
| Field not relevant (skip logic) | Blank / empty string |
| Item non-response | Blank / empty string |
| Field added in a later form version | Blank for earlier submissions |
| Repeat group instance doesn't exist | Blank in wide-format suffix columns |
All four look identical in the raw CSV. Use the survey tab relevance conditions and form version history to distinguish them.
Some surveys capture multiple instances of a concept (e.g. up to 3 businesses, up to 3 jobs)
using fixed groups rather than a repeat group. Typically (although this is not an inherent feature of SurveyCTO), each instance gets its own begin group /
end group block in the survey tab, and every field inside is given a hard-coded numeric
suffix at design time (profit_1, profit_2, profit_3).
Key differences from repeat groups:

- There is no PARENT_KEY linking and no long-format export option for these fields
- Whether a later instance is shown is typically controlled by an ordinary relevant expression (e.g., string-length(${description_2}) != 0)

When you see variables with numeric suffixes in exported data, check the survey tab to determine whether they come from a repeat group or from fixed groups; the analytical implications differ.
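Checking the survey tab for the origin of a suffix can be partly automated. The Python sketch below walks the survey rows, records which repeat group (if any) encloses each field, and then explains a suffixed export column. The `SURVEY` rows and both function names are hypothetical, and the sketch assumes export columns carry no group-name prefix.

```python
# Hypothetical survey-tab rows: one repeat group and one fixed group.
SURVEY = [
    {"type": "begin repeat", "name": "members"},
    {"type": "integer", "name": "age"},
    {"type": "end repeat", "name": ""},
    {"type": "begin group", "name": "business_1"},
    {"type": "integer", "name": "profit_1"},
    {"type": "end group", "name": ""},
]


def fields_in_repeats(survey_rows):
    """Map each field name to its innermost enclosing repeat group, or None."""
    stack = []  # (kind, name) for every group or repeat currently open
    repeat_of = {}
    for row in survey_rows:
        field_type = row["type"].strip()
        if field_type in ("begin repeat", "begin group"):
            stack.append((field_type.split()[1], row["name"]))
        elif field_type in ("end repeat", "end group"):
            stack.pop()
        else:
            repeats = [name for kind, name in stack if kind == "repeat"]
            repeat_of[row["name"]] = repeats[-1] if repeats else None
    return repeat_of


def explain_suffix(column, repeat_of):
    """Say where a suffixed export column most likely comes from."""
    if column in repeat_of and repeat_of[column] is None:
        return "fixed group field (suffix hard-coded at design time)"
    base = column.rsplit("_", 1)[0]
    if repeat_of.get(base) is not None:
        return f"repeat group '{repeat_of[base]}' instance"
    return "unknown"


repeat_of = fields_in_repeats(SURVEY)
```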
A common pattern is a yes/no "entry screen" question that gates an entire section of follow-up questions. For example: "Do you own a business?" → if yes, a full block of business-detail questions is shown; if no, the entire block is skipped and all variables in it are blank in the export.
Critical analytical implication: when the entry screen is negative, all variables in the gated section are structurally missing — their true value is determined by the survey design, not by item non-response. Conversely, if the entry screen is positive but a required variable within the section is still blank, that is likely genuine missing data and should be treated differently from the structural case.
Do not assume what value to assign to structurally missing observations. The right value depends on the research context (e.g. zero, NA, or something else). Always ask the user what imputed value to use before writing any code that fills in blanks caused by a negative entry screen.
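Once the user has specified a value, the fill itself can be kept narrow. The Python sketch below only touches blanks in rows where the entry screen is negative, and it has no default fill value, so the caller must state one explicitly. The `owns_business` and `profit` names, the code "2" for "no", and the "0" fill are placeholders for whatever the form and the user actually specify.

```python
def fill_structural_missing(rows, gate_field, negative_value, gated_fields, fill_value):
    """Return copies of rows with gated blanks filled where the gate is negative.

    Blanks under a positive entry screen are left alone, since they are
    likely genuine missing data rather than structural missingness.
    """
    filled = []
    for row in rows:
        new_row = dict(row)
        if row[gate_field] == negative_value:
            for field in gated_fields:
                if new_row[field] == "":
                    new_row[field] = fill_value
        filled.append(new_row)
    return filled


# Hypothetical data: "2" means "no" on the entry screen question.
data = [
    {"owns_business": "2", "profit": ""},
    {"owns_business": "1", "profit": ""},
    {"owns_business": "1", "profit": "500"},
]
# "0" stands in for the value the user chose; it is not a recommendation.
result = fill_structural_missing(data, "owns_business", "2", ["profit"], "0")
```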
When reviewing a survey instrument, the two key files are:
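Together these allow you to:

- Decode coded values into their labels (e.g., 1 → "Yes", 2 → "No")
- Work out why a field is blank (by checking its relevant condition)
- Determine what an _N suffix represents

After completing a task, if you encountered any SurveyCTO behavior, quirk, or convention not covered above, ask the user: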
"I noticed [X] — should I add this to the SurveyCTO skills file?"
If yes, propose a specific addition and offer to edit .claude/skills/surveycto/SKILL.md directly.
Before adding, check if the learning point is already covered or implied by an existing entry. Prefer refining existing entries over adding new ones.