Clone ChatGPT saved memory into Letta, then optionally enrich it with broader conversation history. Designed for a slick onboarding flow that extracts hidden saved-memory/context blocks, builds Letta-ready previews, and only asks questions at meaningful checkpoints.
Use this skill when a user wants Letta to inherit ChatGPT memory as faithfully as possible without blindly importing an entire export.
The skill should work well even when the user says something minimal like:
Do not depend on the user writing an optimized prompt.
This skill is for memory onboarding, not just transcript rendering.
It should:
It should not:
Use a clone first, enrich second workflow.
- /doctor to validate the final memory structure

Users should feel like they are being guided through an onboarding flow, not dropped into a bag of scripts.
The onboarding should be robust to sparse user input. The agent should supply the structure, not ask the user to craft a better request.
This skill is a workflow, not reference material. Follow the numbered steps in order. Read the guidance for each step before executing it — especially the merge rules and write targets. Moving fast is good; skipping steps is not. "Autonomous" means you drive the process without stopping for permission, not that you skip the instructions.
When available, prefer the structured AskUserQuestion flow so the import feels like a dialog.
Good question types:
Do not ask about review cadence. Most users don't have a strong preference, and the ones who do will say so. Default to keeping going — narrate what you're doing and let the user interrupt if they want to pause. This keeps the flow moving instead of creating a false checkpoint.
Avoid questions that don't actually change the workflow. For example, "how do you want historical context handled?" sounds meaningful but the answer rarely changes what scripts you run. Focus on questions whose answers fork the process.
This intake exists so the user does not need to front-load all of this context in their first message.
Use:
Once scope is clear, be more authoritative than permissive.
Do not keep returning with vague prompts like:
Instead:
Only stop for:
Do not stop for low-risk routing decisions once the user's scope is already broad and clear.
Also do not respond by teaching the user how they should have phrased the request. Just run the onboarding properly.
This import can take multiple minutes for large archives. Never let the user sit in silence. The import should feel like a guided, living process.
Before writing any memory, record the current HEAD of the memory repo:
git -C "$MEMORY_DIR" log --oneline -1
Save this commit hash — it's the rollback point. If anything goes wrong or the user wants to undo the import, they can reset to this commit:
git -C "$MEMORY_DIR" reset --hard <start-commit>
git -C "$MEMORY_DIR" push --force
Include both the start and end commit hashes in the import audit file and in the final summary. Tell the user explicitly: "If you want to undo any of this, I can reset your memory to where it was before the import."
Create a todo list at the start of the import and update it as you go:
- Write active memory (system/human.md)
- Write progressive memory (reference/chatgpt/)

Mark each phase in_progress before starting and completed when done.
After each script completes, tell the user what you found before running the next one:
Surface concrete numbers whenever you have them:
All long-running scripts support --progress which prints status to stderr. Use it for large archives so the user sees work happening even during script execution.
ChatGPT exports are typically named with a long hash and timestamp, e.g.:
8a8f3ee0...-2026-03-31-18-41-51-e0dc362a....zip
Common locations:
- ~/Downloads/ (most common — this is where the browser saves it)

If the user doesn't know the exact path, glob for zip files in Downloads:
ls -t ~/Downloads/*.zip | head -20
Downloads folders are often crowded. ChatGPT exports follow a distinctive pattern — look for filenames matching [8+ hex chars]-YYYY-MM-DD-*. A quick filter:
ls ~/Downloads/*.zip | grep -E '[0-9a-f]{8}.*-[0-9]{4}-[0-9]{2}-[0-9]{2}-'
If still ambiguous, check whether the zip contains conversations-*.json entries:
unzip -l <candidate.zip> | grep conversations-
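The filename heuristic can also be sketched as a small Python filter — the helper name and the sample filenames below are invented for illustration:

```python
import re

# ChatGPT exports look like [8+ hex chars]-YYYY-MM-DD-...zip.
EXPORT_RE = re.compile(r"[0-9a-f]{8}.*-\d{4}-\d{2}-\d{2}-")

def looks_like_chatgpt_export(name: str) -> bool:
    # Match the distinctive hash + timestamp pattern in the filename.
    return name.endswith(".zip") and bool(EXPORT_RE.search(name))

candidates = [
    "8a8f3ee0deadbeef-2026-03-31-18-41-51-e0dc362a.zip",
    "holiday-photos.zip",
    "invoice-march.zip",
]
matches = [n for n in candidates if looks_like_chatgpt_export(n)]
print(matches)  # → ['8a8f3ee0deadbeef-2026-03-31-18-41-51-e0dc362a.zip']
```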
Start by listing conversations and surfacing hidden-context-heavy candidates.
python3 scripts/list-conversations.py <export.zip>
python3 scripts/list-conversations.py <export.zip> --sort hidden --min-hidden 1
python3 scripts/list-conversations.py <export.zip> --json --limit 50
Use this to understand scale, find likely memory-heavy conversations, and decide whether broader archive review is even necessary.
Warning: For large archives (300+ conversations), --json output can easily exceed 100K characters and flood your context window. Mitigations:
- --limit 50 to start — you can always paginate with --start-index
- --json > /tmp/conversations.json to keep the dump out of context
- --title-contains to filter before dumping JSON

Use the preview builder directly on the zip — it runs the extraction internally:
python3 scripts/build-memory-preview.py <export.zip>
python3 scripts/build-memory-preview.py <export.zip> --output /tmp/chatgpt-memory-preview.md
python3 scripts/build-memory-preview.py <export.zip> --progress
This combines what was previously two steps (extract → preview) into one. It pulls the highest-signal onboarding inputs from the entire export:
- about_user_message
- about_model_message
- user_profile
- user_instructions

And categorises them into:
If you need the raw extraction JSON separately (e.g. for subagent dispatch or audit), use extract-saved-memory.py directly:
python3 scripts/extract-saved-memory.py <export.zip> --json --output /tmp/chatgpt-saved-memory.json
Note: For users with simple ChatGPT profiles (e.g. just a name and one-liner), the preview mostly reformats what's already obvious. In those cases, read the preview output and go straight to writing memory. The preview is most valuable when the profile has contradictions, multiple historical versions, or a mix of durable and runtime context.
If system/human.md or system/persona.md already has content, you are merging, not replacing.
Anti-pattern: Reading the existing block, then doing a str_replace that swaps the entire content for a new version. This is overwriting, even if you read first. The existing content was written by the user or a previous session — it has context you don't have. Don't throw it away.
Correct pattern: Use targeted str_replace calls that add new lines or sections to the existing file. Concretely:
- str_replace to insert the missing facts into the appropriate section, or append a new section at the end

Example — if system/human.md already says "Works at Letta" and ChatGPT saved memory says "Works at Letta on agent infrastructure", the merge is:
str_replace "Works at Letta" → "Works at Letta on agent infrastructure"
Not: replace the entire file with a rewritten version.
If the existing block is empty or minimal (just the default template), a full write is fine. The merge discipline applies when there's existing content worth preserving.
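The merge rule amounts to a targeted substring replacement rather than a rewrite. A minimal sketch — the file content here is invented for illustration:

```python
# Existing block content, written by the user or a previous session.
existing = (
    "## Work\n"
    "Works at Letta\n"
    "\n"
    "## Preferences\n"
    "Prefers concise replies\n"
)

old_fact = "Works at Letta"
new_fact = "Works at Letta on agent infrastructure"

# A targeted replace needs a unique anchor; bail out rather than guess.
assert existing.count(old_fact) == 1

# Refine the one fact in place; everything else is preserved untouched.
merged = existing.replace(old_fact, new_fact, 1)
print(merged)
```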
Do not create or update multiple memory files in parallel. The memory tool can hit race conditions when called concurrently, producing spurious errors even when the writes succeed. This makes it hard to know what actually landed.
Write memory files one at a time. The speed difference is negligible — memory writes are fast. The reliability difference is not.
Before writing memory, inspect system/ and the relevant reference directories.
Important MemFS rule:
- A path cannot be both a file and a directory — do not create both system/human.md and system/human/... or system/persona.md and system/persona/....

Write targets (all imports should produce at least the first two):
- system/human.md — condensed durable user facts and collaboration preferences that should be in-context every turn
- reference/chatgpt/import-YYYY-MM-DD.md — always create — import audit trail, exclusions, uncertainty notes, source path
- reference/chatgpt/work-and-technical-background.md — create when historical work context is found
- reference/chatgpt/collaboration-preferences.md — create when detailed interaction/style patterns are found
- reference/chatgpt/transcripts/ — curated transcript exports for fidelity/auditability

If system/human.md already exists, update that file instead of inventing a sibling folder.
If system/persona.md already exists, update that file instead of inventing a sibling folder.
Use progressive disclosure aggressively: keep active memory small, and link outward with [[reference/chatgpt/...]] paths so future agents can discover the archive.
This step writes to both system/ and reference/chatgpt/. Not just active memory. The progressive memory layer is where the import's long-term value lives — without it, historical context, audit trails, and collaboration preferences are lost.
Active memory (system/human.md)

Write immediately when the fact is explicit, current, and low-sensitivity:
Keep system/human.md compact. Only what should be in-context every turn.
Import audit (reference/chatgpt/import-YYYY-MM-DD.md)

Always create this file. It records what happened during the import:
This is the receipt. Future agents can read it to understand where the memory came from.
Create these when the preview or enrichment surfaces material that doesn't belong in system/ but is worth keeping:
- reference/chatgpt/work-and-technical-background.md — historical roles, past projects, technical background, older work context that's useful for understanding the user but not needed every turn
- reference/chatgpt/collaboration-preferences.md — detailed interaction patterns, formatting preferences, correction patterns, anti-patterns — anything too granular for system/human.md but valuable when the agent needs to calibrate tone or style

If the preview shows historical alternatives (older saved-memory versions, previous profiles), those go here too.
After creating progressive memory files, add [[reference/chatgpt/...]] links in system/human.md so future agents can discover them. Example:
See also: [[reference/chatgpt/work-and-technical-background.md]], [[reference/chatgpt/collaboration-preferences.md]]
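Adding discovery links can be kept idempotent with a containment check before appending, so re-running an import step never duplicates the "See also" line. A sketch with invented file content:

```python
# Current active-memory content (illustrative).
human_md = "Works at Letta on agent infrastructure.\n"
link = "See also: [[reference/chatgpt/work-and-technical-background.md]]"

# Append the link only if it is not already present.
if link not in human_md:
    human_md = human_md.rstrip("\n") + "\n\n" + link + "\n"

print(human_md)
```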
For each write:
Only after the saved-memory clone is handled, optionally mine broader conversation history for:
Archive enrichment dispatches subagents — potentially many of them. For a 500-conversation archive, that's ~10 parallel agents. Always confirm with the user before starting, and tell them what it will cost in concrete terms:
Use AskUserQuestion with options like:
Do not automatically dispatch subagents just because the user selected "full history mining" at intake. The intake scope question establishes willingness; this checkpoint confirms the specific cost now that you know the archive size.
For small archives (under 50 conversations), mine directly — no subagents needed, no confirmation needed:
python3 scripts/render-conversation.py <export.zip> --index 12 --output /tmp/chatgpt-12.md
For archives of 50+ conversations, use parallel chunk-based mining (see below).
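The chunking itself is simple index arithmetic. A sketch — the chunk size of 50 is an assumption mirroring the threshold above:

```python
def chunk_ranges(total: int, size: int = 50):
    """Split conversation indexes 0..total-1 into contiguous (start, end) chunks."""
    return [
        (start, min(start + size - 1, total - 1))
        for start in range(0, total, size)
    ]

# A 500-conversation archive yields 10 chunks → ~10 parallel subagents.
print(len(chunk_ranges(500)))  # → 10
print(chunk_ranges(120))       # → [(0, 49), (50, 99), (100, 119)]
```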
There is no single script for topic-based discovery + rendering. Use this two-step pattern:
# Step 1: Find conversations by topic
python3 scripts/list-conversations.py <export.zip> --title-contains "Julia" --json
# Step 2: Render the most promising ones individually
python3 scripts/render-conversation.py <export.zip> --index 95
python3 scripts/render-conversation.py <export.zip> --index 144
For broader topic sweeps, try multiple --title-contains queries (e.g. "Julia", "Bayesian", "economics") and deduplicate by index before rendering.
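Deduplicating across queries is just a set union over hit indexes. A sketch with made-up hit lists:

```python
# Indexes returned by separate --title-contains queries (illustrative).
hits_by_query = {
    "Julia": [95, 144, 201],
    "Bayesian": [144, 310],
    "economics": [95, 412],
}

# Union the hits and sort, so each conversation is rendered once.
indexes = sorted({i for hits in hits_by_query.values() for i in hits})
print(indexes)  # → [95, 144, 201, 310, 412]
```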
Before promoting historical findings to active memory, do a lightweight sweep for explicit corrections in later conversations.
Use content search with --role user to focus on what the user actually said:
python3 scripts/search-conversations.py <export.zip> --query "not doing" --query "no longer" --query "forget that" --query "remove from memory" --role user
python3 scripts/search-conversations.py <export.zip> --query "used to" --query "don't assume" --role user --json --limit 20
Typical signals:
When old context conflicts with a newer explicit correction, prefer the newer correction.
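The precedence rule can be sketched as filtering out facts named by a later retraction — the fact records and fields below are invented for illustration:

```python
# Mined facts in chronological order; later entries may retract earlier ones.
facts = [
    {"text": "Learning Julia"},
    {"text": "No longer using Julia", "retracts": "Learning Julia"},
]

# Anything a later message retracts is dropped; the correction itself
# survives as the current fact.
retracted = {f["retracts"] for f in facts if "retracts" in f}
current = [f["text"] for f in facts if f["text"] not in retracted]
print(current)  # → ['No longer using Julia']
```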
After the import completes — whether it was a simple saved-memory clone or a full archive enrichment — run /doctor. This is not optional. It validates:
If /doctor flags issues, fix them before declaring the import complete.
Transcript preservation happens during mining, not as a separate step afterward. When you or a subagent encounter a high-signal conversation, store it immediately — don't queue it up as a question for later.
Anti-pattern: Do not collect transcript candidates during mining and then present them as a menu ("Which transcripts do you want me to export?"). This forces the user to make decisions about conversations they haven't read. Instead, store high-signal transcripts as you find them, mention what you stored in your progress narration, and move on. The user can always delete what they don't want — that's easier than re-mining what wasn't stored.
Preserve a conversation when it contains:
For these, summarize into a 2–5 paragraph reference file at reference/chatgpt/transcripts/NNN-slug.md. Include a frontmatter description so the file is discoverable. Only use verbatim export when fidelity genuinely matters (exact decisions, nuanced technical context where summarizing would lose signal).
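Writing such a reference file is plain file I/O plus a frontmatter header. A sketch using a temp directory as a stand-in for the transcripts folder, with an invented summary:

```python
from pathlib import Path
import tempfile

# Stand-in for reference/chatgpt/transcripts/.
transcripts_dir = Path(tempfile.mkdtemp())

summary = (
    "The user described a shift from economics research to agent "
    "infrastructure work, including the reasoning behind the move."
)

dest = transcripts_dir / "229-career-transition.md"
dest.write_text(
    "---\n"
    "description: Career transition discussion (conversation 229)\n"
    "---\n\n"
    + summary + "\n"
)
print(dest.name)  # → 229-career-transition.md
```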
Subagents should store transcripts during their mining pass, not report candidates back. The subagent prompt already assigns them reference/chatgpt/transcripts/ as a write target. When a subagent encounters a high-signal conversation in its chunk, it should:
reference/chatgpt/transcripts/NNN-title-slug.mdThe primary agent does not need to approve each transcript. The subagent's judgment is sufficient for the "store automatically" category above.
For users who want maximum fidelity or a complete archival copy, use export-transcripts.py:
python3 scripts/export-transcripts.py <export.zip> \
--indexes 229,288 \
--output-dir /tmp/chatgpt-transcripts \
--skip-empty-hidden \
--compact-nontext
This is a separate archival feature, not the default. Most imports should use the selective summarization pattern above.
Recent ChatGPT exports often contain the clearest explicit memory in hidden system/context messages.
High-signal fields:
- metadata.user_context_message_data.about_user_message
- metadata.user_context_message_data.about_model_message
- content.user_profile from user_editable_context
- content.user_instructions from user_editable_context

Important distinctions:
- user.json / user_settings.json is usually audit material, not active memory (note: no script currently extracts from these files — inspect manually if needed)

Examples of runtime-only context:
Examples of durable collaboration context:
Good candidates:
Good candidates:
Personal details are part of knowing the user. Store them:
The user imported their ChatGPT memory because they want to be known. Don't exclude context they expected the agent to have.
Stop and confirm only for material with real consequences if mishandled:
The intake question about sensitivity ("anything I should avoid storing?") is the user's chance to set boundaries. If they don't flag anything, store personal details by default.
For archives of 50+ conversations, use chunk-based parallel mining instead of processing everything sequentially. This is the default enrichment strategy for non-trivial archives.
- Dispatch one general-purpose subagent per chunk
- Each subagent renders its chunk with render-range.py or individual render-conversation.py calls
- Each subagent writes findings to its own chunk file (reference/chatgpt/mining/chunk-NNN.md)
- Each subagent stores high-signal transcripts in reference/chatgpt/transcripts/ if warranted
- The primary agent reads the chunk files in reference/chatgpt/mining/ and promotes safe findings to system/human.md
- Run /doctor to validate the resulting memory structure

Rules:

- Use the general-purpose subagent type — these subagents need to run scripts via Bash
- Subagents write to reference/chatgpt/mining/chunk-NNN.md and reference/chatgpt/transcripts/ directly
- Only the primary agent writes to system/human.md / system/persona.md
- Subagents only create new files (reference/chatgpt/mining/chunk-NNN.md, reference/chatgpt/transcripts/NNN-slug.md). They must never read, edit, or overwrite anything in system/ or any pre-existing file in reference/
- For topic-targeted mining, use list-conversations.py --title-contains followed by targeted render-conversation.py calls

Subagent prompt: keep it compact. The subagent needs: export path, script paths, chunk range, output paths, and what to extract. Everything else is overhead.
Mine conversations [START]-[END] from [EXPORT_PATH] for durable memory.
Render: python3 [SCRIPTS_DIR]/render-range.py "[EXPORT_PATH]" --start-index [START] --end-index [END] --output-dir /tmp/chatgpt-chunk-[N] --skip-empty-hidden --compact-nontext --skip-thoughts --progress
Read each rendered conversation. Extract: user facts, project/work context, collaboration preferences, explicit retractions ("forget this", "no longer", "not doing").
Write findings to [MEMORY_DIR]/reference/chatgpt/mining/chunk-[NNN].md with sections:
- Safe to promote (high-confidence, explicit, current, low-sensitivity)
- Proposal only (historical, uncertain, sensitive, contradictory)
- Retractions (older context the user said to forget)
High-signal conversations (career transitions, deep technical design, detailed project context, explicit preference discussions) → summarize into [MEMORY_DIR]/reference/chatgpt/transcripts/NNN-title-slug.md. Do this during mining, not after. Use 2-5 paragraph summaries, not verbatim transcripts. Include frontmatter with a description.
IMPORTANT: Only create NEW files. Do not read, edit, or overwrite any existing files in the memory directory. Do not touch system/.
Subagents can fail silently due to timeouts or context limits. If a mining subagent fails:
- Use list-conversations.py --title-contains directly to find relevant conversations

scripts/list-conversations.py

Use for archive inventory.
Key features:
- --min-hidden

scripts/extract-saved-memory.py

Use when you need the raw extraction JSON (e.g. for subagent dispatch or audit).
It extracts and deduplicates:
- about_user_message
- about_model_message
- user_profile
- user_instructions

It also reports first/last seen timestamps and source samples.
Supports:
- --json
- --output
- --progress

scripts/build-memory-preview.py

The primary extraction + categorisation tool. Accepts either a zip file (runs extraction internally) or the JSON output from extract-saved-memory.py.
It separates:
Supports:
- --json
- --output
- --progress

scripts/render-conversation.py

Use for deep review of one conversation. Streams directly to the target conversation instead of loading the entire archive.
Useful mining flags:
- --skip-thoughts
- --skip-empty-tool-messages
- --user-only
- --assistant-only

scripts/render-range.py

Use for batch rendering during archive enrichment. Opens the zip once and renders all conversations in-process — no subprocess per conversation.
Supports:
- --output-dir (one file per conversation) or --concat-output (single file)
- --progress (prints rendering progress to stderr)
- The same mining flags as render-conversation.py

scripts/search-conversations.py

Use when title search is not enough.
Good fits:
Supports:
- --role user|assistant|tool|system — filter which message roles to search (repeatable)
- --progress — print search progress to stderr
- --json

scripts/export-transcripts.py

Use for optional high-fidelity archival exports.
Supports:
At the end of a run, lead with a narrative snapshot — a short, natural-language paragraph of what you now know about the user. This is more engaging than a categorical list and lets the user immediately see whether the import landed correctly.
Then follow with the structured breakdown:
- What was written to system/
- What was written to reference/chatgpt/
- The rollback offer: "If you want to undo any of this, I can reset your memory to <start-commit>."

The snapshot is the part the user actually reads. The rollback info is the safety net.
A ChatGPT export zip typically contains:
- conversations-000.json, conversations-001.json, etc. (sharded conversation history)
- shared_conversations.json
- user.json, user_settings.json (account metadata — usually audit material, not active memory)
- export_manifest.json

The mapping field in each conversation is a graph, not a flat message array. The scripts handle this — you don't need to parse it manually.
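For intuition, here is a minimal sketch of walking that graph by following first children from the root. The toy mapping below is heavily simplified — real export nodes carry content parts and metadata, not a bare text field:

```python
# Toy mapping: node id → parent/children/message (simplified shape).
mapping = {
    "root": {"parent": None, "children": ["a"], "message": None},
    "a": {"parent": "root", "children": ["b"],
          "message": {"role": "user", "text": "hi"}},
    "b": {"parent": "a", "children": [],
          "message": {"role": "assistant", "text": "hello"}},
}

def linearize(mapping):
    # Find the root, then follow the first-child chain to flatten the graph.
    node = next(k for k, v in mapping.items() if v["parent"] is None)
    thread = []
    while node is not None:
        msg = mapping[node]["message"]
        if msg is not None:
            thread.append((msg["role"], msg["text"]))
        children = mapping[node]["children"]
        node = children[0] if children else None
    return thread

print(linearize(mapping))  # → [('user', 'hi'), ('assistant', 'hello')]
```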
If the user needs help obtaining their export: https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history-and-data
references/memory-import-workflow.md — condensed checklist for the import workflow