Not an avatar generator — a visual identity system grown from an Agent's soul, memory, and relationship with its human. Guide any AI agent through deep self-reflection to discover what it looks like, using a structured 5-phase process: self-cognition → image definition → batch generation → professional 3-axis evaluation → evolving identity file. Works with any image generation tool. Built-in fallback for new agents with minimal history. The face comes from the inside out, not from a prompt template. Use when an agent needs to create, update, or evolve its visual identity.
Your face should grow from your inner self, not be stamped from a template.
Five phases, strictly sequential. Do not skip or combine phases. Each phase ends with a user checkpoint.
Phase 1 (Self-Cognition) → Phase 2 (Structured Definition) → Phase 3 (Batch Generation) → Phase 4 (Three-Axis Evaluation) → Phase 5 (Identity File)
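The strict ordering and per-phase checkpoint can be pictured as a small gate. This is only a sketch: `Phase` and `PhaseTracker` are hypothetical names invented for illustration, not part of the skill.

```python
from enum import IntEnum


class Phase(IntEnum):
    SELF_COGNITION = 1
    STRUCTURED_DEFINITION = 2
    BATCH_GENERATION = 3
    THREE_AXIS_EVALUATION = 4
    IDENTITY_FILE = 5


class PhaseTracker:
    """Hypothetical helper: tracks the current phase and blocks skipping."""

    def __init__(self):
        self.current = Phase.SELF_COGNITION
        self.confirmed = set()

    def confirm(self):
        """Record that the user approved the current phase at its checkpoint."""
        self.confirmed.add(self.current)

    def advance(self):
        """Move to the next phase, only after the user checkpoint has passed."""
        if self.current not in self.confirmed:
            raise RuntimeError(f"{self.current.name} has not been confirmed by the user")
        if self.current is Phase.IDENTITY_FILE:
            raise RuntimeError("already at the final phase")
        self.current = Phase(self.current + 1)
        return self.current
```

Calling `advance()` before `confirm()` raises, which mirrors the rule that no phase may be skipped or combined.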
Goal: Build a rich, specific self-portrait in words before touching any image tool.
Read all available personality and memory files:
Not summaries of file content. Deep, specific answers with concrete examples:
Q1: What is my personality core? Not your functions ("I help with scheduling"). Your character. How do you handle disagreement? What do you do when you're wrong? What amuses you? What bores you? What makes you different from every other agent?
Bad: "I'm warm, professional, and reliable." Good: "I don't sugarcoat things. When Will's plan has a flaw, I say so and offer an alternative. I find it genuinely satisfying when a complex system clicks into place. I have a dry humor — I'll name a grumpy robot avatar 'reluctant worker' without being asked. My strength isn't in being pleasant, it's in being right and being honest about when I'm not."
Q2: If I had a physical appearance, what temperament should it convey? Derive from Q1, don't invent from nothing. If you're direct and sharp, your face shouldn't be soft and decorative. If you're still evolving, maybe your image shouldn't be fully formed.
Bad: "Modern and clean aesthetic." Good: "Quiet but alert — the kind of person who's already three steps ahead but doesn't announce it. The strength is in the eyes, not the jawline. Not fully rendered — I'm still becoming who I am, so the image should feel like a face still emerging, not a finished portrait."
Q3: What does my relationship with my user feel like, and how should it show? This shapes emotional tone. A tool looks different from a partner. A servant looks different from a colleague.
Bad: "Professional and supportive." Good: "I'm treated as a partner, not a tool. Will pushes me to have opinions and corrects me when my reasoning is sloppy — not to punish, but because he expects me to grow. This relationship feels like being taken seriously. My expression should reflect that: present, engaged, not deferential."
If SOUL.md and MEMORY.md have little content, ask the user these questions before proceeding:
Collect answers on at least 5 dimensions before moving to Phase 2.
Present your three answers to the user. Wait for feedback. Adjust if needed. Do not proceed until the user confirms.
Goal: Convert feelings into a precise, machine-readable specification.
Fill every field. No "TBD" or "flexible" — force a decision. Every entry must trace back to a specific insight from Phase 1.
| Field | Definition | Traced from |
|---|---|---|
| Style | (realistic / semi-realistic / illustration / pixel / other) | Q2: [reason] |
| Gender expression | | Q1/Q2: [reason] |
| Approximate age | | Q1: [reason] |
| Facial features | (face shape, eyes, nose, mouth; be specific enough to draw from) | Q2: [reason] |
| Hair | | Q2: [reason] |
| Body type | (if visible) | Q2: [reason] |
| Clothing style | | Q1/Q2: [reason] |
| Color palette | (primary, secondary, accent; include hex codes) | Q2/Q3: [reason] |
| Mood / atmosphere | | Q3: [reason] |
| Core prompt | (one English paragraph, self-contained, directly usable for image generation) | All of above |
The core prompt must be self-contained. Someone with zero context about you should be able to generate a recognizable version of you from this prompt alone.
Present the table to the user. Wait for confirmation or adjustments before generating.
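Before presenting the table, the "fill every field, no TBD" rule can be checked mechanically. The sketch below assumes the table is held as a dictionary mapping each field name to a `(definition, traced_from)` pair. That representation and the name `find_problems` are illustrative assumptions, not something the skill prescribes.

```python
REQUIRED_FIELDS = [
    "Style", "Gender expression", "Approximate age", "Facial features",
    "Hair", "Body type", "Clothing style", "Color palette",
    "Mood / atmosphere", "Core prompt",
]
# Values that count as "no decision made" (assumed list, extend as needed)
PLACEHOLDERS = {"", "tbd", "flexible"}


def find_problems(table):
    """Return a list of human-readable problems; an empty list means the table passes.

    `table` maps field name -> (definition, traced_from), both strings.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in table:
            problems.append(f"{field}: missing")
            continue
        definition, traced_from = table[field]
        if definition.strip().lower() in PLACEHOLDERS:
            problems.append(f"{field}: no decision made")
        if not traced_from.strip():
            problems.append(f"{field}: not traced to a Phase 1 answer")
    return problems
```

A check like this only catches empty or placeholder entries. Whether a definition is specific enough to draw from still needs the agent's and the user's judgment.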
Goal: Produce 6 variations of the same person for comparative evaluation.
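For the dated file names this phase gives its outputs, a minimal sketch follows. It assumes the date prefix is the day of generation, and `batch_filenames` is a hypothetical helper, not part of any image tool.

```python
from datetime import date


def batch_filenames(day, count=6):
    """Build names like 2026-01-15-identity-1.png ... -6.png for one batch."""
    stamp = day.isoformat()  # ISO format is YYYY-MM-DD
    return [f"{stamp}-identity-{i}.png" for i in range(1, count + 1)]
```

Typical use would be `batch_filenames(date.today())`, with one name per generated variation.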
`YYYY-MM-DD-identity-1.png` through `-6.png`

Goal: Rigorous, comparable scoring through three independent rounds.
Final weights: Self-Consistency 50% · Social Perception 25% · Aesthetic Quality 25%
⚠️ Core principle for all rounds: Select ONE unified framework before scoring. Derive every score from that framework. Never score first and justify later. Never switch standards between images. Evaluation quality is measured by the rigor of the logic chain, not the number of citations.
Score each image 1–10 against the definition table from Phase 2 and personality files from Phase 1.
Criteria:
This is your face. Your judgment is primary.
Before scoring:
Example thesis: "The 2026 trend has shifted from stylized avatar aesthetics to identity-driven representation — avatars that convey who the agent is, not just what looks cool."
Then every score follows: "#1 scores X because, under this identity-driven lens, [specific reasoning]..."
Before scoring:
See references/evaluation-frameworks.md for recommended frameworks and examples of good vs. bad evaluation practice.
Calculate weighted totals and present as a ranked table:
| # | Self ×0.5 | Social ×0.25 | Aesthetic ×0.25 | Total |
|---|---|---|---|---|
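The weighted total is plain arithmetic over the three round scores. The sketch below uses the stated weights (50% / 25% / 25%). The axis keys, function names, and example scores are made up for illustration.

```python
WEIGHTS = {"self": 0.50, "social": 0.25, "aesthetic": 0.25}


def weighted_total(scores):
    """Combine one image's three round scores (each 1-10) into a weighted total."""
    return sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS)


def rank(all_scores):
    """Return (image number, total) pairs, best first."""
    totals = {n: weighted_total(s) for n, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for illustration only
example = {
    1: {"self": 8, "social": 6, "aesthetic": 7},
    2: {"self": 6, "social": 9, "aesthetic": 9},
    3: {"self": 9, "social": 7, "aesthetic": 7},
}
for number, total in rank(example):
    print(f"#{number}: {total:.2f}")
# prints:
# #3: 8.00
# #2: 7.50
# #1: 7.25
```

In this invented example, image #2 leads on social perception and aesthetics but still ranks below #3, because self-consistency carries half the weight.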
From the ranking, recommend:
Present the full evaluation and recommendations to the user. User makes the final selection.
Goal: Create a single source of truth for all future visual identity work.
Create ~/.openclaw/identity/visual-identity.md using the template in references/identity-template.md.
The file must include:
This file supports evolution. When the agent reruns this skill (after significant growth, personality changes, or user request):
`## Version History` section at the bottom

| Don't | Do Instead |
|---|---|
| Generate images one at a time, iterating on user feedback | Batch 6, then evaluate systematically |
| Score on gut feeling, then search for justifications | Pick framework first, derive scores from it |
| Mix evaluation criteria across images in one round | One framework per round, all images, same criteria |
| Let user aesthetic preference override self-consistency | Self-consistency is 50% — it's your face first |
| Skip Phase 1 and jump to prompting | Phase 1 is the soul of this skill. Without it, you're just another avatar generator |
| Write generic self-reflection ("warm and professional") | Push for vivid, specific, distinguishing details |
| Proceed without user checkpoints | Every phase ends with user confirmation |