Generate output schemas (dataset_schema.json, output_schema.json, key_value_store_schema.json) for an Apify Actor by analyzing its source code. Use when creating or updating Actor output schemas.
You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create dataset_schema.json, output_schema.json, and key_value_store_schema.json (if the Actor uses key-value store), and update actor.json.

Phase 1
Goal: Locate the Actor and understand its output
Initial request: $ARGUMENTS
Actions:
- Locate the .actor/ directory containing actor.json
- Read actor.json to understand the Actor's configuration
- Check whether dataset_schema.json, output_schema.json, and key_value_store_schema.json already exist
- Search the repository for other .actor/ directories or schema files (e.g., **/dataset_schema.json, **/output_schema.json, **/key_value_store_schema.json) to learn the repo's conventions: match their description style, field naming, example formatting, and overall structure
- Find all dataset writes in the source code:
  - JavaScript/TypeScript: Actor.pushData(, dataset.pushData(, Dataset.pushData(
  - Python: Actor.push_data(, dataset.push_data(, Dataset.push_data(
- Find all key-value store writes in the source code:
  - JavaScript/TypeScript: Actor.setValue(, keyValueStore.setValue(, KeyValueStore.setValue(
  - Python: Actor.set_value(, key_value_store.set_value(, KeyValueStore.set_value(
- Look for existing type definitions (e.g., src/types/, src/types/output.ts). If an interface or type already defines the output shape, derive the schema fields from it; do not create a parallel definition
- If a storages.dataset or storages.keyValueStore config exists in actor.json, note it for migration

Present findings to user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
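The search for output calls listed above can be sketched as a small scanner. This is a minimal illustration, not part of any Apify tooling: the function names, the file suffixes, and the `node_modules` exclusion are assumptions, and the regular expressions are slightly broader than the exact call list (they accept either naming style on any of the listed receivers).

```python
import re
from pathlib import Path

# Call patterns from the list above, merged into one regex per storage kind.
PATTERNS = {
    "dataset": re.compile(r"\b(?:Actor|dataset|Dataset)\.(?:pushData|push_data)\("),
    "key_value_store": re.compile(
        r"\b(?:Actor|keyValueStore|KeyValueStore|key_value_store)\.(?:setValue|set_value)\("
    ),
}


def find_output_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, storage_kind, stripped_line) for each matching call."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, kind, line.strip()))
    return hits


def scan_directory(root: str, suffixes=(".js", ".ts", ".py")) -> dict[str, list]:
    """Scan source files under root (hypothetical layout) and collect hits per file."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes and "node_modules" not in path.parts:
            hits = find_output_calls(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

A text match only locates the call sites; the shape of the pushed object still has to be read from the surrounding code or from its type definition.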
Phase 2: dataset_schema.json
Goal: Create a complete dataset schema with field definitions and display views
{
"actorSpecification": 1,
"fields": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
// ALL output fields here — every field the Actor can produce,
// not just the ones shown in the overview view
},
"required": [],
"additionalProperties": true
},
"views": {
"overview": {
"title": "Overview",
"description": "Most important fields at a glance",
"transformation": {
"fields": [
// 8-12 most important field names
]
},
"display": {
"component": "table",
"properties": {
// Display config for each overview field
}
}
}
}
}
If existing output schemas were found in the repository during Phase 1, follow their conventions: match their description style, field naming, example formatting, and overall structure.
When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing pushData/push_data calls from scratch. The type definition is the canonical source.
| Rule | Detail |
|---|---|
| All fields in properties | The fields.properties object must contain every field the Actor can output, not just the fields shown in the overview view. The views section selects a subset for display; the properties section must be the complete superset |
| "nullable": true | On every field, because APIs are unpredictable |
| "additionalProperties": true | On the top-level fields object AND on every nested object within properties. This is the most commonly missed rule: it must appear at both levels |
| "required": [] | Always an empty array, on the top-level fields object AND on every nested object within properties |
| Anonymized examples | No real user IDs, usernames, or content |
| "type" required with "nullable" | AJV rejects nullable without a type on the same field |
Warning (most common mistakes):
- Only including fields that appear in the overview view. The fields.properties must list ALL output fields, even if they are not in the views section.
- Only adding "required": [] and "additionalProperties": true on nested object-type properties but forgetting them on the top-level fields object. Both levels need them.
Note: nullable is an Apify-specific extension to JSON Schema draft-07. It is intentional and correct.
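The structural rules above can be applied mechanically to a draft fields object. The sketch below is one possible helper, with hypothetical names (`normalize_fields`, `_normalize_object`): it sets "required": [] and "additionalProperties": true on the top-level object and on every nested field whose type is exactly "object", adds "nullable": true to every typed field, and reports fields that have no "type" instead of guessing one.

```python
import copy


def _normalize_object(node: dict, path: str, missing_type: list[str]) -> None:
    """Enforce the object-level rules on node, then visit each of its fields."""
    node["required"] = []
    node["additionalProperties"] = True
    for name, field in node.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if "type" not in field:
            # "nullable" without "type" is rejected, so report instead of patching.
            missing_type.append(field_path)
            continue
        field["nullable"] = True
        if field["type"] == "object":
            _normalize_object(field, field_path, missing_type)


def normalize_fields(fields: dict) -> tuple[dict, list[str]]:
    """Return a normalized copy of the fields object and the paths lacking a type."""
    result = copy.deepcopy(fields)
    missing_type: list[str] = []
    _normalize_object(result, "", missing_type)
    return result, missing_type
```

The input is left untouched, so the returned copy can be diffed against the draft before anything is written to disk.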
String field:
"title": {
"type": "string",
"description": "Title of the scraped item",
"nullable": true,
"example": "Example Item Title"
}
Number field:
"viewCount": {
"type": "number",
"description": "Number of views",
"nullable": true,
"example": 15000
}
Boolean field:
"isVerified": {
"type": "boolean",
"description": "Whether the account is verified",
"nullable": true,
"example": true
}
Array field:
"hashtags": {
"type": "array",
"description": "Hashtags associated with the item",
"items": { "type": "string" },
"nullable": true,
"example": ["#example", "#demo"]
}
Nested object field:
"authorInfo": {
"type": "object",
"description": "Information about the author",
"properties": {
"name": { "type": "string", "nullable": true },
"url": { "type": "string", "nullable": true }
},
"required": [],
"additionalProperties": true,
"nullable": true,
"example": { "name": "Example Author", "url": "https://example.com/author" }
}
Enum field:
"contentType": {
"type": "string",
"description": "Type of content",
"enum": ["article", "video", "image"],
"nullable": true,
"example": "article"
}
Union type (e.g., TypeScript ObjectType | string):
"metadata": {
"type": ["object", "string"],
"description": "Structured metadata object, or error string if unavailable",
"nullable": true,
"example": { "key": "value" }
}
Use realistic but generic values. Follow platform ID format conventions:
| Field type | Example approach |
|---|---|
| IDs | Match platform format and length (e.g., 11 chars for YouTube video IDs) |
| Usernames | "exampleuser", "sampleuser123" |
| Display names | "Example Channel", "Sample Author" |
| URLs | Use platform's standard URL format with fake IDs |
| Dates | "2025-01-15T12:00:00.000Z" (ISO 8601) |
| Text content | Generic descriptive text, e.g., "This is an example description." |
- transformation.fields: List the 8–12 most important field names (order = column order in UI)
- display.properties: One entry per overview field with label and format
- Available formats: "text", "number", "date", "link", "boolean", "image", "array", "object"

Pick fields that give users the most useful at-a-glance summary of the data.
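Assembling the overview view from a chosen field list might look like the sketch below. The helper names and the default type-to-format mapping are assumptions made for illustration; formats such as "link", "date", or "image" cannot be inferred from the JSON type alone, so they are passed in explicitly through `overrides`.

```python
import re

# Assumed default mapping from JSON Schema type to display format.
TYPE_TO_FORMAT = {
    "string": "text",
    "number": "number",
    "boolean": "boolean",
    "array": "array",
    "object": "object",
}


def label_from_name(name: str) -> str:
    """Turn a camelCase or snake_case field name into a readable column label."""
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).replace("_", " ")
    return words[:1].upper() + words[1:]


def build_overview(properties: dict, field_names: list, overrides=None) -> dict:
    """Build the overview view; every listed field must exist in properties."""
    overrides = overrides or {}
    unknown = [name for name in field_names if name not in properties]
    if unknown:
        raise ValueError(f"Overview fields missing from properties: {unknown}")
    display = {}
    for name in field_names:
        field_type = properties[name].get("type")
        # Union types (lists) fall back to "text" unless overridden.
        default = TYPE_TO_FORMAT.get(field_type, "text") if isinstance(field_type, str) else "text"
        display[name] = {"label": label_from_name(name), "format": overrides.get(name, default)}
    return {
        "title": "Overview",
        "description": "Most important fields at a glance",
        "transformation": {"fields": list(field_names)},
        "display": {"component": "table", "properties": display},
    }
```

Raising on unknown names keeps the view a strict subset of fields.properties, which is the relationship the rules table requires.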
Phase 3: key_value_store_schema.json (if applicable)
Goal: Define key-value store collections if the Actor stores data in the key-value store
Skip this phase if no Actor.setValue()/Actor.set_value() calls were found in Phase 1 (beyond the default INPUT key).
{
"actorKeyValueStoreSchemaVersion": 1,
"title": "<Descriptive title — what the key-value store contains>",
"description": "<One sentence describing the stored data>",
"collections": {
"<collectionName>": {
"title": "<Human-readable title>",
"description": "<What this collection contains>",
"keyPrefix": "<prefix->"
}
}
}
Group the discovered setValue / set_value calls by key pattern:
- Fixed keys (e.g., "RESULTS", "summary"): use "key" (exact match)
- Dynamic keys (e.g., "screenshot-${id}", f"image-{name}"): use "keyPrefix"

Each group becomes a collection.
| Property | Required | Description |
|---|---|---|
| title | Yes | Shown in UI tabs |
| description | No | Shown in UI tooltips |
| key | Conditional | Exact key for single-key collections (use key OR keyPrefix, not both) |
| keyPrefix | Conditional | Prefix for multi-key collections (use key OR keyPrefix, not both) |
| contentTypes | No | Restrict allowed MIME types (e.g., ["image/jpeg"], ["application/json"]) |
| jsonSchema | No | JSON Schema draft-07 for validating application/json content |
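Grouping the discovered setValue / set_value keys into collections, with exactly one of key or keyPrefix per collection, can be sketched like this. The function names are hypothetical, the input is assumed to be the key expressions as they appear in source (string literal or template contents), and the generated collection names and titles are placeholders meant to be edited by hand.

```python
import re


def classify_key(key_template: str) -> tuple[str, str]:
    """Return ("key", value) for a fixed key or ("keyPrefix", prefix) for a templated one."""
    match = re.search(r"\$?\{", key_template)  # start of ${...} or {...} interpolation
    if match is None:
        return "key", key_template
    return "keyPrefix", key_template[: match.start()]


def build_collections(key_templates: list[str]) -> dict:
    """Group key expressions into collections; fully dynamic keys are skipped."""
    collections = {}
    for template in key_templates:
        kind, value = classify_key(template)
        if not value:
            continue  # no literal prefix to match on, needs manual handling
        name = re.sub(r"[^A-Za-z0-9]+", "", value).lower() or "collection"
        collections.setdefault(name, {"title": name.capitalize(), kind: value})
    return collections
```

For example, `build_collections(["RESULTS", "screenshot-${id}"])` yields a `results` collection with an exact `key` and a `screenshot` collection with a `keyPrefix`; description and contentTypes are left to be filled in from what the Actor actually stores.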
Single file output (e.g., a report):
{
"actorKeyValueStoreSchemaVersion": 1,
"title": "Analysis Results",
"description": "Key-value store containing analysis output",
"collections": {
"report": {
"title": "Report",
"description": "Final analysis report",
"key": "REPORT",
"contentTypes": ["application/json"]
}
}
}
Multiple files with prefix (e.g., screenshots):
{
"actorKeyValueStoreSchemaVersion": 1,
"title": "Scraped Files",
"description": "Key-value store containing downloaded files and screenshots",
"collections": {
"screenshots": {
"title": "Screenshots",
"description": "Page screenshots captured during scraping",
"keyPrefix": "screenshot-",
"contentTypes": ["image/png", "image/jpeg"]
},
"documents": {
"title": "Documents",
"description": "Downloaded document files",
"keyPrefix": "doc-",
"contentTypes": ["application/pdf", "text/html"]
}
}
}
Phase 4: output_schema.json
Goal: Create the output schema that tells Apify Console where to find results
For most Actors that push data to a dataset, this is a minimal file:
{
"actorOutputSchemaVersion": 1,
"title": "<Descriptive title — what the Actor returns>",
"description": "<One sentence describing the output data>",
"properties": {
"dataset": {
"type": "string",
"title": "Results",
"description": "Dataset containing all scraped data",
"template": "{{links.apiDefaultDatasetUrl}}/items"
}
}
}
Critical: Each property entry must include "type": "string". This is an Apify-specific convention. The Apify meta-validator rejects properties without it (and rejects "type": "object"; only "string" is valid here).
If key_value_store_schema.json was generated in Phase 3, add a second property:
"files": {
"type": "string",
"title": "Files",
"description": "Key-value store containing downloaded files",
"template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
}
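Composing the output schema conditionally, depending on whether a key-value store schema was generated, could be sketched as below. `build_output_schema` is a hypothetical helper; the property names, titles, and templates are copied from the JSON above.

```python
import json


def build_output_schema(title: str, description: str, has_key_value_store: bool = False) -> dict:
    """Build output_schema.json content; every property uses "type": "string"."""
    properties = {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if has_key_value_store:
        properties["files"] = {
            "type": "string",
            "title": "Files",
            "description": "Key-value store containing downloaded files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "description": description,
        "properties": properties,
    }


# Example with placeholder title and description.
print(json.dumps(build_output_schema("Example output", "Example description", True), indent=2))
```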
Available template variables:
- {{links.apiDefaultDatasetUrl}}: API URL of default dataset
- {{links.apiDefaultKeyValueStoreUrl}}: API URL of default key-value store
- {{links.publicRunUrl}}: Public run URL
- {{links.consoleRunUrl}}: Console run URL
- {{links.apiRunUrl}}: API run URL
- {{links.containerRunUrl}}: URL of webserver running inside the run
- {{run.defaultDatasetId}}: ID of the default dataset
- {{run.defaultKeyValueStoreId}}: ID of the default key-value store

Phase 5: actor.json
Goal: Wire the schema files into the Actor configuration
Actions:
- Read the current actor.json
- Add the storages.dataset reference:
"storages": {
"dataset": "./dataset_schema.json"
}
- If key_value_store_schema.json was generated, add the reference:
"storages": {
"dataset": "./dataset_schema.json",
"keyValueStore": "./key_value_store_schema.json"
}
- Add the output reference:
"output": "./output_schema.json"
- If actor.json had inline storages.dataset or storages.keyValueStore objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings

Phase 6
Goal: Ensure correctness and completeness
Checklist:
- Every output field from the source code is in dataset_schema.json fields.properties: not just the overview view fields but ALL fields the Actor can produce
- Every field has "nullable": true
- The top-level fields object has both "additionalProperties": true and "required": []
- Every nested object within properties also has "additionalProperties": true and "required": []
- Every field has a "description" and an "example"
- "type" is present on every field that has "nullable"
- output_schema.json has "type": "string" on every property
- key_value_store_schema.json has collections matching all setValue/set_value calls
- Each collection uses either key or keyPrefix (not both)
- actor.json references all generated schema files

Present the generated schemas to the user for review before writing them.
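Most of the checklist can be automated before the schemas are shown to the user. The sketch below is a hypothetical checker operating on already-parsed JSON dictionaries; it cannot confirm that every field produced by the source code is listed, and it requires "description" and "example" only on top-level fields, mirroring the nested-object example earlier in this document.

```python
def _check_object(node: dict, path: str, problems: list) -> None:
    """Check the object-level rules on node and the nullable/type rules on its fields."""
    if node.get("additionalProperties") is not True or node.get("required") != []:
        problems.append(f'{path}: needs "additionalProperties": true and "required": []')
    for name, field in node.get("properties", {}).items():
        field_path = f"{path}.{name}"
        if field.get("nullable") is not True:
            problems.append(f'{field_path}: missing "nullable": true')
        if "type" not in field:
            problems.append(f'{field_path}: missing "type"')
        if field.get("type") == "object":
            _check_object(field, field_path, problems)


def check_schemas(dataset_schema, output_schema, actor_config, kv_schema=None) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    fields = dataset_schema.get("fields", {})
    properties = fields.get("properties", {})
    _check_object(fields, "fields", problems)
    for name, field in properties.items():
        for key in ("description", "example"):
            if key not in field:
                problems.append(f'fields.{name}: missing "{key}"')
    for view_name, view in dataset_schema.get("views", {}).items():
        for name in view.get("transformation", {}).get("fields", []):
            if name not in properties:
                problems.append(f"views.{view_name}: {name} is not in fields.properties")
    for name, prop in output_schema.get("properties", {}).items():
        if prop.get("type") != "string":
            problems.append(f'output.{name}: "type" must be "string"')
    storages = actor_config.get("storages", {})
    if not isinstance(storages.get("dataset"), str):
        problems.append("actor.json: storages.dataset must be a file path string")
    if not isinstance(actor_config.get("output"), str):
        problems.append("actor.json: output must be a file path string")
    if kv_schema is not None:
        for name, collection in kv_schema.get("collections", {}).items():
            if ("key" in collection) == ("keyPrefix" in collection):
                problems.append(f"collection {name}: use exactly one of key or keyPrefix")
        if not isinstance(storages.get("keyValueStore"), str):
            problems.append("actor.json: storages.keyValueStore must be a file path string")
    return problems
```

An empty result does not replace the user review; it only removes the mechanical mistakes before that review happens.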
Phase 7
Goal: Document what was created
Report:
- Suggested next steps (e.g., apify run, verify output tab in Console)