Collect structured visual feedback from humans using AnyCap's annotation tool, or create and iterate on diagrams using the interactive whiteboard (Excalidraw). Covers image annotation, URL/web page review with screen recording, video review, audio feedback, and collaborative diagramming with Mermaid input. Use when you need a human to point at things, mark regions, draw on screenshots, review a web page or UI, narrate feedback over a recording, provide any spatially-grounded visual input, or create or iterate on architecture diagrams, flowcharts, or wireframes. Also use when you need to present work-in-progress to a human for approval or revision. Trigger on: get feedback, show to user, review UI, annotate, mark up, visual feedback, screen recording, user review, human-in-the-loop, approval flow, interactive review, whiteboard, diagram, draw, flowchart, wireframe, or architecture chart.
Read this entire file before starting. Skipping sections leads to incorrect workflows -- each media type has different capabilities and constraints.
Workflow guide for collecting structured visual feedback from humans using AnyCap's annotation tool. This skill teaches you when and how to involve humans in your workflow through visual annotation and screen recording.
For CLI command reference, read the anycap-cli skill. For media generation workflows, read the anycap-media-production skill.
AnyCap CLI must be installed and authenticated. Read the anycap-cli skill if setup is needed.
AnyCap annotation excels at two distinct review workflows. Choose the one that fits your situation:
| Scenario | Best for | Primary artifact | Highlight |
|---|---|---|---|
| URL / Web Page Review | Web pages, local dev servers, live UIs | Screen recording with narration | Browse, annotate, and narrate -- the recording captures everything |
| Image Collaborative Review | Generated images, screenshots, designs | Annotated image with merged feedback | Multiple reviewers annotate simultaneously in real-time |
Both scenarios support all annotation tools (Rect, Arrow, Point, Freehand) and text labels. The difference is in what you get back and how the review is conducted.
```shell
# Blocking -- opens browser, waits for Done click, outputs result
anycap annotate <target> [-o output.png]

# Non-blocking -- starts background server, returns session info
anycap annotate <target> --no-wait [-o output.png]

# Poll for result after human confirms done
anycap annotate poll --session <session_id>

# Stop background server
anycap annotate stop --session <session_id>

# List all active sessions (useful for recovery after context loss)
anycap annotate list
```
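A non-blocking review can be driven with a small poll loop. The sketch below is a hypothetical helper, not part of the CLI: it assumes `anycap annotate poll` exits zero once the human has clicked Done and non-zero while the session is still open -- verify that behavior, and the exact session-id format in the `--no-wait` output, against your install.

```shell
#!/bin/sh
# Hypothetical helper: block until a non-blocking annotation session finishes.
# Assumes `anycap annotate poll` exits 0 when the result is ready and
# non-zero while the human is still reviewing.
wait_for_annotation() {
  session_id="$1"
  interval="${2:-10}"   # seconds between polls

  while ! anycap annotate poll --session "$session_id"; do
    sleep "$interval"
  done
}

# Usage sketch (the session id comes from `anycap annotate <target> --no-wait`):
#   wait_for_annotation "$session_id" 5
```

Keep the interval modest (5-10 seconds): the human is the slow part of this loop, and tighter polling only burns cycles.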
<target> is auto-detected by content: image file, URL (http:// or https://), video file, or audio file.
| Flag | Description |
|---|---|
| `--no-wait` | Non-blocking mode (recommended for agents) |
| `-o, --output` | Save annotated image to this path |
| `--port <port>` | Bind to a fixed port (default: random) |
| `--bind <addr>` | Bind address (default: `127.0.0.1`) |
Tip: Use --port with a consistent value (e.g., --port 8888) across sessions. The browser stores each user's display name in localStorage, which is scoped by origin (host + port). A fixed port means returning collaborators are recognized automatically without re-entering their name.
Both blocking and non-blocking modes automatically attempt to open the annotation URL in the default browser. In headless environments (SSH, container), the CLI prints the URL to stderr instead. No error is raised.
When presenting an annotation session to the human, adapt your message based on context. Key points to communicate:

- If the server is bound with `--bind 0.0.0.0`, share the URL with the actual host IP. If the human connects over SSH, suggest port forwarding.

Do NOT use canned messages. Compose naturally based on the situation (what you just generated or modified, whether the environment is desktop or headless, single or multi-reviewer).
When running in a headless environment (SSH, container, cloud VM), the human cannot access 127.0.0.1 directly. Use --bind and --port to make the annotation server accessible:
```shell
# Bind to all interfaces on a fixed port
anycap annotate screenshot.png --no-wait --bind 0.0.0.0 --port 8888
```
The human can then access the annotation UI via:
- `http://<server-ip>:8888` (if the port is exposed)
- SSH tunnel: `ssh -L 8888:localhost:8888 user@host`, then open `http://localhost:8888`
- Docker port mapping: `docker run -p 8888:8888 ...`, then open `http://localhost:8888`

Always use `--port` with a fixed number in headless environments so the URL is predictable and forwardable.
The annotation UI works behind reverse proxies with arbitrary path prefixes. All asset, API, and WebSocket URLs are resolved relative to the page URL, so setups like the following work out of the box:
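For example, a setup along these lines should work (a hypothetical nginx fragment -- the `/anycap/` prefix and upstream port 8888 are placeholders for your own deployment, not values the CLI requires):

```nginx
# Hypothetical reverse-proxy fragment: serve the annotation UI under a path prefix.
location /anycap/ {
    proxy_pass http://127.0.0.1:8888/;   # trailing slash strips the /anycap/ prefix

    # WebSocket upgrade headers -- the UI uses a WebSocket for live collaboration
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}
```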