Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Codex's capabilities with specialized knowledge, workflows, or tool integrations.
This skill provides guidance for creating effective skills.
Skills are modular, self-contained folders that extend Codex's capabilities by providing specialized knowledge, workflows, and tools. Think of them as "onboarding guides" for specific domains or tasks—they transform Codex from a general-purpose agent into a specialized agent equipped with procedural knowledge that no model can fully possess.
The context window is a public good. Skills share the context window with everything else Codex needs: system prompt, conversation history, other Skills' metadata, and the actual user request.
Default assumption: Codex is already very smart. Only add context Codex doesn't already have. Challenge each piece of information: "Does Codex really need this explanation?" and "Does this paragraph justify its token cost?"
Prefer concise examples over verbose explanations.
Match the level of specificity to the task's fragility and variability:
High freedom (text-based instructions): Use when multiple approaches are valid, decisions depend on context, or heuristics guide the approach.
Medium freedom (pseudocode or scripts with parameters): Use when a preferred pattern exists, some variation is acceptable, or configuration affects behavior.
Low freedom (specific scripts, few parameters): Use when operations are fragile and error-prone, consistency is critical, or a specific sequence must be followed.
Think of Codex as exploring a path: a narrow bridge with cliffs needs specific guardrails (low freedom), while an open field allows many routes (high freedom).
You may use subagents during iteration to validate whether a skill works on realistic tasks or whether a suspected problem is real. This is most useful when you want an independent pass on the skill's behavior, outputs, or failure modes after a revision. Only do this when it is possible to start new subagents.
When using subagents for validation, treat that as an evaluation surface. The goal is to learn whether the skill generalizes, not whether another agent can reconstruct the answer from leaked context.
Prefer raw artifacts such as example prompts, outputs, diffs, logs, or traces. Give the minimum task-local context needed to perform the validation. Avoid passing the intended answer, suspected bug, intended fix, or your prior conclusions unless the validation explicitly requires them.
Every skill consists of a required SKILL.md file and optional bundled resources:
skill-name/
├── SKILL.md (required)
│ ├── YAML frontmatter metadata (required)
│ │ ├── name: (required)
│ │ └── description: (required)
│ └── Markdown instructions (required)
├── agents/ (recommended)
│ └── openai.yaml - UI metadata for skill lists and chips
└── Bundled Resources (optional)
├── scripts/ - Executable code (Python/Bash/etc.)
├── references/ - Documentation intended to be loaded into context as needed
└── assets/ - Files used in output (templates, icons, fonts, etc.)
Every SKILL.md consists of:
name and description fields. These are the only fields that Codex reads to determine when the skill gets used, thus it is very important to be clear and comprehensive in describing what the skill is, and when it should be used.display_name, short_description, and default_prompt by reading the skill--interface key=value to scripts/generate_openai_yaml.py or scripts/init_skill.pyagents/openai.yaml still matches SKILL.md; regenerate if stalescripts/)Executable code (Python/Bash/etc.) for tasks that require deterministic reliability or are repeatedly rewritten.
scripts/rotate_pdf.py for PDF rotation tasksreferences/)Documentation and reference material intended to be loaded as needed into context to inform Codex's process and thinking.
references/finance.md for financial schemas, references/mnda.md for company NDA template, references/policies.md for company policies, references/api_docs.md for API specificationsassets/)Files not intended to be loaded into context, but rather used within the output Codex produces.
assets/logo.png for brand assets, assets/slides.pptx for PowerPoint templates, assets/frontend-template/ for HTML/React boilerplate, assets/font.ttf for typographyA skill should only contain essential files that directly support its functionality. Do NOT create extraneous documentation or auxiliary files, including:
The skill should only contain the information needed for an AI agent to do the job at hand. It should not contain auxiliary context about the process that went into creating it, setup and testing procedures, user-facing documentation, etc. Creating additional documentation files just adds clutter and confusion.
Skills use a three-level loading system to manage context efficiently:
Keep SKILL.md body to the essentials and under 500 lines to minimize context bloat. Split content into separate files when approaching this limit. When splitting out content into other files, it is very important to reference them from SKILL.md and describe clearly when to read them, to ensure the reader of the skill knows they exist and when to use them.
Key principle: When a skill supports multiple variations, frameworks, or options, keep only the core workflow and selection guidance in SKILL.md. Move variant-specific details (patterns, examples, configuration) into separate reference files.
Pattern 1: High-level guide with references
# PDF Processing
## Quick start
Extract text with pdfplumber:
[code example]
## Advanced features
- **Form filling**: See [FORMS.md](FORMS.md) for complete guide
- **API reference**: See [REFERENCE.md](REFERENCE.md) for all methods
- **Examples**: See [EXAMPLES.md](EXAMPLES.md) for common patterns
Codex loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
Pattern 2: Domain-specific organization
For Skills with multiple domains, organize content by domain to avoid loading irrelevant context:
bigquery-skill/
├── SKILL.md (overview and navigation)
└── reference/
├── finance.md (revenue, billing metrics)
├── sales.md (opportunities, pipeline)
├── product.md (API usage, features)
└── marketing.md (campaigns, attribution)
When a user asks about sales metrics, Codex only reads sales.md.
Similarly, for skills supporting multiple frameworks or variants, organize by variant:
cloud-deploy/
├── SKILL.md (workflow + provider selection)
└── references/
├── aws.md (AWS deployment patterns)
├── gcp.md (GCP deployment patterns)
└── azure.md (Azure deployment patterns)
When the user chooses AWS, Codex only reads aws.md.
Pattern 3: Conditional details
Show basic content, link to advanced content:
# DOCX Processing
## Creating documents
Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
## Editing documents
For simple edits, modify the XML directly.
**For tracked changes**: See [REDLINING.md](REDLINING.md)
**For OOXML details**: See [OOXML.md](OOXML.md)
Codex reads REDLINING.md or OOXML.md only when the user needs those features.
Important guidelines:
Skill creation involves these steps:
Follow these steps in order, skipping only if there is a clear reason why they are not applicable.
plan-mode).gh-address-comments, linear-address-issue).Skip this step only when the skill's usage patterns are already clearly understood. It remains valuable even when working with an existing skill.
To create an effective skill, clearly understand concrete examples of how the skill will be used. This understanding can come from either direct user examples or generated examples that are validated with user feedback.
For example, when building an image-editor skill, relevant questions include:
$CODEX_HOME/skills (or ~/.codex/skills when CODEX_HOME is unset) so Codex can discover it automatically."To avoid overwhelming users, avoid asking too many questions in a single message. Start with the most important questions and follow up as needed for better effectiveness.
Conclude this step when there is a clear sense of the functionality the skill should support.
To turn concrete examples into an effective skill, analyze each example by:
Example: When building a pdf-editor skill to handle queries like "Help me rotate this PDF," the analysis shows:
scripts/rotate_pdf.py script would be helpful to store in the skillExample: When designing a frontend-webapp-builder skill for queries like "Build me a todo app" or "Build me a dashboard to track my steps," the analysis shows:
assets/hello-world/ template containing the boilerplate HTML/React project files would be helpful to store in the skillExample: When building a big-query skill to handle queries like "How many users have logged in today?" the analysis shows:
references/schema.md file documenting the table schemas would be helpful to store in the skillTo establish the skill's contents, analyze each concrete example to create a list of the reusable resources to include: scripts, references, and assets.
At this point, it is time to actually create the skill.
Skip this step only if the skill being developed already exists. In this case, continue to the next step.
Before running init_skill.py, ask where the user wants the skill created. If they do not specify a location, default to $CODEX_HOME/skills; when CODEX_HOME is unset, fall back to ~/.codex/skills so the skill is auto-discovered.
When creating a new skill from scratch, always run the init_skill.py script. The script conveniently generates a new template skill directory that automatically includes everything a skill requires, making the skill creation process much more efficient and reliable.
Usage:
scripts/init_skill.py <skill-name> --path <output-directory> [--resources scripts,references,assets] [--examples]
Examples:
scripts/init_skill.py my-skill --path "${CODEX_HOME:-$HOME/.codex}/skills"
scripts/init_skill.py my-skill --path "${CODEX_HOME:-$HOME/.codex}/skills" --resources scripts,references
scripts/init_skill.py my-skill --path ~/work/skills --resources scripts --examples
The script:
agents/openai.yaml using agent-generated display_name, short_description, and default_prompt passed via --interface key=value--resources--examples is setAfter initialization, customize the SKILL.md and add resources as needed. If you used --examples, replace or delete placeholder files.
Generate display_name, short_description, and default_prompt by reading the skill, then pass them as --interface key=value to init_skill.py or regenerate with:
scripts/generate_openai_yaml.py <path/to/skill-folder> --interface key=value
Only include other optional interface fields when the user explicitly provides them. For full field descriptions and examples, see references/openai_yaml.md.
When editing the (newly-generated or existing) skill, remember that the skill is being created for another instance of Codex to use. Include information that would be beneficial and non-obvious to Codex. Consider what procedural knowledge, domain-specific details, or reusable assets would help another Codex instance execute these tasks more effectively.
After substantial revisions, or if the skill is particularly tricky, you should use subagents to forward-test the skill on realistic tasks or artifacts. When doing so, pass the artifact under validation rather than your diagnosis of what is wrong, and keep the prompt generic enough that success depends on transferable reasoning rather than hidden ground truth.
To begin implementation, start with the reusable resources identified above: scripts/, references/, and assets/ files. Note that this step may require user input. For example, when implementing a brand-guidelines skill, the user may need to provide brand assets or templates to store in assets/, or documentation to store in references/.
Added scripts must be tested by actually running them to ensure there are no bugs and that the output matches what is expected. If there are many similar scripts, only a representative sample needs to be tested to ensure confidence that they all work while balancing time to completion.
If you used --examples, delete any placeholder files that are not needed for the skill. Only create resource directories that are actually required.
Writing Guidelines: Always use imperative/infinitive form.
Write the YAML frontmatter with name and description:
name: The skill namedescription: This is the primary triggering mechanism for your skill, and helps Codex understand when to use the skill.
docx skill: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. Use when Codex needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"Do not include any other fields in YAML frontmatter.
Write instructions for using the skill and its bundled resources.
Once development of the skill is complete, validate the skill folder to catch basic issues early:
scripts/quick_validate.py <path/to/skill-folder>
The validation script checks YAML frontmatter format, required fields, and naming rules. If validation fails, fix the reported issues and run the command again.
After testing the skill, you may detect the skill is complex enough that it requires forward-testing; or users may request improvements.
User testing often this happens right after using the skill, with fresh context of how the skill performed.
Forward-testing and iteration workflow:
To forward-test, launch subagents as a way to stress test the skill with minimal context.
Subagents should not know that they are being asked to test the skill. They should be treated as
an agent asked to perform a task by the user. Prompts to subagents should look like:
Use $skill-x at /path/to/skill-x to solve problem y
Not:
Review the skill at /path/to/skill-x; pretend a user asks you to...
Decision rule for forward-testing:
In these cases, show the user your proposed prompt and request (1) a yes/no decision, and (2) any suggested modifictions.
Considerations when forward-testing:
If forward-testing only succeeds when subagents see leaked context, tighten the skill or the forward-testing setup before trusting the result.