"I mine pixels for atoms. Reality is just compressed resources."
"Every image is a lode. Every pixel, potential ore."
Image Mining extends the Kitchen Counter's DECOMPOSE action to images.
Your camera isn't just a recorder — it's a PICKAXE FOR VISUAL REALITY.
- Quick Start
- Operation Modes
- Extensibility
- Protocols
- Reference
📷 Camera Shot → 🖼️ Image → ⛏️ MINE → 💎 Resources
Just like the Kitchen Counter breaks down:
- sandwich → bread + cheese + lettuce
- lamp → brass + glass + wick + oil
- water → hydrogen + oxygen

Images can be broken down into:

- ore_vein.png → iron-ore × 12 + stone × 8
- forest.png → wood × 5 + leaves × 20 + seeds × 3
- treasure_pile.png → gold × 100 + gems × 15
- sunset.png → orange_hue × 1 + warmth × 1 + nostalgia × 1

"The LLM IS the context assembler. Don't script what it does naturally."
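The decompositions above can be sketched as a simple lookup. This is a minimal illustration of the idea, not the Kitchen Counter's actual DECOMPOSE implementation; the table and the `decompose` function are hypothetical:

```python
# Hypothetical decomposition table -- illustrative only, not the
# Kitchen Counter's real DECOMPOSE logic. Values are resource counts.
DECOMPOSE_TABLE = {
    "sandwich": {"bread": 1, "cheese": 1, "lettuce": 1},
    "ore_vein.png": {"iron-ore": 12, "stone": 8},
    "forest.png": {"wood": 5, "leaves": 20, "seeds": 3},
}

def decompose(item: str) -> dict:
    """Break an item (or image) into its component resources."""
    return DECOMPOSE_TABLE.get(item, {})
```

The point is only that an image behaves like any other decomposable item: one input, a bag of typed resources out.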
When mining images, prefer native LLM vision (Cursor/Claude reading images directly):
┌─────────────────────────────────────────────────────────────────┐
│ NATIVE MODE (PREFERRED)                                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Cursor/Claude already has:                                     │
│    ✓ The room YAML (spatial context)                            │
│    ✓ Character files (who might appear)                         │
│    ✓ Previous mining passes (what's been noticed)               │
│    ✓ The prompt.yml (what was intended)                         │
│    ✓ The whole codebase (cultural references)                   │
│                                                                 │
│  Just READ the image. The context is already there.             │
│  No bash commands. No sister scripts. Just LOOK.                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
| Aspect | Native (Cursor/Claude) | Remote API (mine.py) |
|---|---|---|
| Context | Already loaded | Must be assembled |
| Prior mining | Visible in chat | Passed via stdin |
| Room context | Just read the file | Python parses YAML |
| Synthesis | LLM does it naturally | Script concatenates |
| Iteration | Conversational | Re-run command |
Use mine.py or remote API calls when native vision isn't available, or when you want more than one model's reading of the same image.

Multi-perspective is the killer use case: Claude sees narrative, GPT-4V sees objects, Gemini sees spatial relationships. Layer them all for a rich interpretation.
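Layering could look like this minimal sketch. The model names and finding keys are illustrative, and actually fetching each model's reading is left out:

```python
# Merge per-model readings of the same image into one layered
# interpretation. Models and their findings here are illustrative.
def layer_perspectives(readings: dict) -> dict:
    """Combine findings from several vision models, keyed by source model."""
    layered = {}
    for model, findings in readings.items():
        for key, value in findings.items():
            layered.setdefault(key, {})[model] = value
    return layered

readings = {
    "claude": {"narrative": "a miner resting after a long shift"},
    "gpt-4v": {"objects": ["pickaxe", "lantern", "ore cart"]},
    "gemini": {"spatial": "the lantern hangs left of the cart"},
}
layered = layer_perspectives(readings)
```

Each facet keeps its provenance, so a later pass can see which model noticed what.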
Even then, have the orchestrating LLM assemble the context:
┌─────────────────────────────────────────────────────────────────┐
│ REMOTE API WITH LLM ASSEMBLY                                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. LLM reads context files (room, characters, prior mining)    │
│  2. LLM synthesizes: "What to look for in this image"           │
│  3. LLM calls remote vision API with image + synthesized prompt │
│  4. LLM post-processes response into YAML Jazz                  │
│                                                                 │
│  The SMART WORK happens in the orchestrating LLM.               │
│  Remote API just does vision with good instructions.            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
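The four steps above can be sketched as follows. The file names, the stubbed vision-API call, and the prompt format are all assumptions for illustration, not part of mine.py:

```python
# Sketch of the orchestration above. The context files and the
# vision-API call are stand-ins -- only the assembly logic runs.
from pathlib import Path

def synthesize_prompt(room_yaml: str, prior_mining: str) -> str:
    """Step 2: turn context files into 'what to look for' instructions."""
    return (
        "Mine this image for resources.\n"
        f"Room context:\n{room_yaml}\n"
        f"Already noticed in prior passes:\n{prior_mining}\n"
        "Report anything new as YAML."
    )

def mine_remotely(image: Path, room: Path, prior: Path) -> str:
    room_yaml = room.read_text()                # step 1: read context
    prior_mining = prior.read_text()
    prompt = synthesize_prompt(room_yaml, prior_mining)   # step 2
    # step 3 (stubbed): response = call_vision_api(image, prompt)
    # step 4: post-process `response` into YAML Jazz
    return prompt
```

The vision API only ever sees the already-synthesized prompt; the judgment about what matters happened before the call.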
```shell
# DON'T do this:
python mine.py image.png --context room.yml --characters chars/ --prior mined.yml

# DO this (in Cursor/Claude):
#   1. Read the image
#   2. Read room.yml, the character files, and the prior mined.yml
#   3. Look at the image with all that context
#   4. Write YAML Jazz output
```

The LLM context window IS the context assembly mechanism. Use it.
Image mining works on ANY visual content, not just AI-generated images:
┌─────────────────────────────────────────────────────────────────┐
│ MINEABLE SOURCES                                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  🎨 AI-Generated Images                                         │
│     - DALL-E, Midjourney, Stable Diffusion outputs              │
│     - Has prompt.yml sidecar with generation context            │
│                                                                 │
│  📸 Real Photos                                                 │
│     - Phone camera, DSLR, scanned prints                        │
│     - No prompt — mine what you see                             │
│                                                                 │
│  📊 Graphs and Charts                                           │
│     - Data visualizations, dashboards                           │
│     - Extract trends, outliers, relationships                   │
│                                                                 │
│  🖥️ Screenshots                                                 │
│     - UI states, error messages, configurations                 │
│     - Mine the interface, not just pixels                       │
│                                                                 │
│  📝 Text Images                                                 │
│     - Scanned documents, handwritten notes, signs               │
│     - OCR + semantic extraction                                 │
│                                                                 │
│  📄 PDFs                                                        │
│     - Documents, papers, invoices                               │
│     - Cursor may already support these — try it!                │
│                                                                 │
│  🗺️ Maps and Diagrams                                           │
│     - Architecture diagrams, floor plans, mind maps             │
│     - Extract spatial relationships                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
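One way to route a source to its mining focus is a small lookup keyed by source type. The table below is a hypothetical sketch, not part of any existing tooling:

```python
# Hypothetical routing table from source type to mining focus.
# Keys and descriptions mirror the MINEABLE SOURCES list above.
MINING_FOCUS = {
    "ai_generated": "compare against prompt.yml; mine the deltas",
    "photo": "no prompt available; mine what you see",
    "chart": "extract trends, outliers, relationships",
    "screenshot": "mine the interface state, not just pixels",
    "text_image": "OCR first, then semantic extraction",
    "pdf": "treat as a document; try native reading first",
    "diagram": "extract spatial relationships",
}

def mining_focus(source_type: str) -> str:
    """Suggest a mining focus for a source type (default: just look)."""
    return MINING_FOCUS.get(source_type, "just look, with full context")
```

The default case matters: an unrecognized source still gets mined, just without a specialized focus.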
Generated Image (has context):