Learn a new app's UI — detect all components, identify, filter, save to visual memory. Run before operating any app not yet in memory.
Learning detects and saves all UI components for future template matching. Run this whenever:
- `memory/apps/<appname>/` does not exist yet → full learn
- A known app shows a new page → `learn --page <pagename>`

Command: `python3 scripts/agent.py learn --app AppName`
1. Activate app, ensure window ≥ 800x600
2. agent.py runs learn:
   a. Takes full-screen screenshot, crops window region
   b. Runs Salesforce/GPA-GUI-Detector → icons, buttons, UI elements
   c. Merges results with IoU dedup
   d. Crops each element → saves to memory/apps/<appname>/components/
   e. Reports unlabeled icons
3. YOU identify all components:
   a. Use `image` tool to view each cropped image (**one at a time** for accuracy)
   b. For each: read text, describe icon, determine actual name
   c. Only label GENERIC UI components (buttons, icons, tabs, nav)
   d. DELETE temporary/dynamic content (to prevent storage bloat)
   e. Verify _find_nearest_text names (often wrong in dense UIs)
   f. Rename: app_memory.py rename --old X --new Y
4. After identification + task complete:
   a. Run: agent.py cleanup --app AppName
   b. Remove dynamic content (timestamps, message previews)
   c. Keep all stable UI elements (no privacy filtering needed; data stays local)
5. Result: ~20-30 named, fixed UI components per page
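
Step 2c merges overlapping detections with IoU (intersection over union) dedup. The sketch below shows the general technique only; the `Box` layout, the `dedup` helper, and the 0.5 threshold are assumptions for illustration, not the values agent.py actually uses.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedup(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Keep a box only if it does not overlap an already-kept box too much."""
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, k) < threshold for k in kept):
            kept.append(box)
    return kept


# Two near-identical detections of one button plus one separate icon.
detections = [(10, 10, 50, 50), (12, 11, 51, 49), (200, 10, 240, 50)]
unique = dedup(detections)
```

Here the second box overlaps the first almost entirely, so only the first and third survive.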
Important: Components are cropped from full-screen screenshots so they match
perfectly when doing full-screen template matching later. This is why capture_window
uses full-screen screenshot + crop (via gui_action.py screenshot).
_find_nearest_text is a hint, not truth — always verify by viewing the cropped image.
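
To illustrate why cropping from the full-screen image matters, here is a toy sketch in pure Python. A nested list stands in for the screenshot, and `crop` and `find_exact` are hypothetical helpers (the real pipeline presumably uses an image library and tolerant matching). A component cropped at full-screen coordinates is found again at exactly those coordinates.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # stand-in for a grayscale screenshot


def crop(img: Grid, x: int, y: int, w: int, h: int) -> Grid:
    """Return the w x h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def find_exact(img: Grid, template: Grid) -> Optional[Tuple[int, int]]:
    """Return the first (x, y) where template matches img exactly, else None."""
    th, tw = len(template), len(template[0])
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            if crop(img, x, y, tw, th) == template:
                return (x, y)
    return None


# Fake 8x6 "full screen" in which every pixel value is unique.
screen = [[y * 8 + x for x in range(8)] for y in range(6)]
win_x, win_y = 2, 1    # window origin on the screen
elem_x, elem_y = 3, 2  # element position inside the window

# Crop the component at full-screen coordinates (window offset + element offset).
component = crop(screen, win_x + elem_x, win_y + elem_y, 2, 2)
location = find_exact(screen, component)
```

Because the crop and the later search use the same coordinate space, the match location needs no further offset correction.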
Only save stable UI elements — things that look the same next session:
SAVE (stable):
SKIP (dynamic):
Naming:
- Labeled components, e.g. `Search.png`, `Settings.png`
- Unlabeled components: `unlabeled_<region>_<x>_<y>.png`

Golden rule: only save things that look the same next time you open the app.
KEEP: sidebar nav icons, toolbar buttons, input controls, window controls, tab headers, fixed logos
REMOVE: chat messages, timestamps, user avatars in lists, notification badges, contact names, web content, text >15 chars in content area, profile pictures
Quick test: "Same place, same appearance tomorrow?" → KEEP. Otherwise → REMOVE.
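
The KEEP/REMOVE rules above could be expressed as a small filter. This is a minimal sketch: the `Component` fields and the kind labels are hypothetical, chosen to mirror the lists above, and do not reflect how agent.py cleanup is implemented.

```python
from dataclasses import dataclass

# Hypothetical kind labels mirroring the KEEP and REMOVE lists.
STABLE_KINDS = {"nav_icon", "toolbar_button", "input_control",
                "window_control", "tab_header", "logo"}
DYNAMIC_KINDS = {"chat_message", "timestamp", "avatar", "notification_badge",
                 "contact_name", "web_content", "profile_picture"}
MAX_CONTENT_TEXT = 15  # text longer than this in the content area is removed


@dataclass
class Component:
    kind: str
    text: str = ""
    in_content_area: bool = False


def should_keep(c: Component) -> bool:
    """Apply the quick test: same place, same appearance tomorrow?"""
    if c.kind in DYNAMIC_KINDS:
        return False
    if c.in_content_area and len(c.text) > MAX_CONTENT_TEXT:
        return False
    return c.kind in STABLE_KINDS


send_button = Component(kind="toolbar_button", text="Send")
clock = Component(kind="timestamp", text="10:42")
article_heading = Component(kind="tab_header",
                            text="A very long heading in the page",
                            in_content_area=True)
```

With these inputs, `should_keep(send_button)` is true, while the timestamp and the long content-area text are rejected. Unknown kinds are rejected by default, which errs on the side of a smaller memory.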
unlabeled_ files remain

Task arrives → ensure_app_ready(app, workflow)
  │
  ├── Never learned? → full learn
  ├── Known app, new page? → learn --page <name>
  └── Known app, known page → template match:
      ├── ≥ 80% → proceed
      └── < 80% → incremental learn
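
The decision tree above can be sketched as a single function. The name `choose_action` and its parameters are hypothetical, and the sketch assumes the 80% figure is the fraction of saved components that still match; the actual logic lives in ensure_app_ready.

```python
def choose_action(app_known: bool, page_known: bool,
                  match_ratio: float = 0.0, threshold: float = 0.80) -> str:
    """Return the learning action for an incoming task."""
    if not app_known:
        return "full learn"
    if not page_known:
        return "learn --page"
    # Known app and known page: decide from the template match ratio.
    if match_ratio >= threshold:
        return "proceed"
    return "incremental learn"


first_run = choose_action(app_known=False, page_known=False)
new_page = choose_action(app_known=True, page_known=False)
good_match = choose_action(app_known=True, page_known=True, match_ratio=0.92)
stale_match = choose_action(app_known=True, page_known=True, match_ratio=0.55)
```

The checks run from least to most learned, so the match ratio is only consulted once both the app and the page are already in memory.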
For memory rules, naming, dedup, privacy, and browser per-site memory → see skills/gui-memory/SKILL.md.