Automate creation of step-by-step documentation from web or desktop workflows. Use when: (1) "document this workflow", (2) "create a walkthrough for [URL]", (3) "generate documentation", (4) "record this process", (5) "make a guide for", (6) "document this desktop app", (7) "capture macOS/Windows workflow". Produces professional markdown with annotated screenshots, contextual explanations, prerequisites, and expected results. Supports both browser-based (Playwright/Chrome DevTools MCP) and native desktop application recording (accessibility APIs + vision).
DocuGen transforms web-based and desktop workflows into professional-quality documentation with annotated screenshots and contextual explanations.
Example:
Document this workflow: Create a new repository on GitHub
Starting URL: https://github.com/new
Example:
Document this desktop workflow: Change display resolution in System Settings
Application: System Settings
DocuGen automatically determines whether to use web or desktop recording based on keywords in the user's request.
Match desktop-related phrases (e.g., "document this desktop app", "capture macOS/Windows workflow") to activate desktop mode. Match web-related phrases, or the presence of a URL, to activate web mode.
If the request matches neither, ask the user:

AskUserQuestion:
question: "Is this a web-based or desktop application workflow?"
header: "Mode"
options:
- label: "Web (browser)"
description: "Recording in a web browser using Playwright/Chrome DevTools"
- label: "Desktop (native app)"
description: "Recording a native desktop application with screenshot capture"
When the user provides a URL and workflow description, use web mode (the Playwright recording flow is detailed later in this document).
When the user describes a desktop application workflow:

1. Call get_capture_capabilities() to detect platform support
2. Initialize StepDetector with desktop thresholds
3. For each user action:
   a. Call StepDetector.capture_before() to take a baseline screenshot, StepDetector.capture_after(description) to capture the result, then get_element_metadata(x, y, screenshot_path)
   b. Accessibility backend provides element name, type, role
   c. If no accessibility data, fall back to Claude Vision analysis
   d. Record step with element metadata and source attribution

Example:

User: "I'm going to click the Save button"
1. StepDetector.capture_before()
→ Takes screenshot, stores as baseline
2. [User performs action on desktop]
3. User: "Done" (or press Enter)
4. StepDetector.capture_after("Click Save button")
→ Takes screenshot
→ Compares SSIM (e.g., 0.74 < 0.87 threshold)
→ Returns StepRecord with before/after paths
5. get_element_metadata(click_x, click_y, after_screenshot_path)
→ Tries accessibility: {name: "Save", type: "button", source: "accessibility"}
→ Or vision fallback: {name: "Save", type: "button", source: "visual", confidence: 0.9}
6. Record step with full metadata
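A sketch of one pass through this loop, using the docugen.desktop API documented later in this file. The StepRecord attribute names (click_x, click_y, after_path) and the assumption that capture_after returns None when no significant change is detected are illustrative, not confirmed API:

```python
# Sketch of the per-action desktop recording loop.
from docugen.desktop import (DetectorConfig, StepDetector,
                             get_capture_capabilities, get_element_metadata)

caps = get_capture_capabilities()          # e.g. {"accessibility": True, "os": "macos", ...}
detector = StepDetector(config=DetectorConfig(mode="desktop"),
                        output_dir="./output/images")

detector.capture_before()                  # 1. Baseline screenshot
# ... user performs the action, then confirms with "Done" ...
record = detector.capture_after("Click Save button")   # 4. SSIM comparison
if record is not None:                     # Change exceeded the threshold: real step
    element = get_element_metadata(record.click_x, record.click_y,
                                   record.after_path)   # 5. Resolve element
    # 6. Persist the step with element metadata and source attribution
```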
After recording completes:
1. Run detect_step.py to finalize step boundaries
2. Run annotate_screenshot.py to add visual annotations
3. Run process_images.py to optimize file sizes

Before annotating screenshots, prompt the user to review detected sensitive regions.
Use AskUserQuestion to give users control over what gets redacted.
Smart annotation auto-detects common sensitive fields, such as password inputs (input[type="password"]) and email fields.

After auto-detection, present the findings to the user:
AskUserQuestion:
question: "I detected 3 potentially sensitive fields to blur. Review?"
header: "Redaction"
options:
- label: "Show me what you found"
description: "Review auto-detected fields before blurring"
- label: "Blur all detected fields"
description: "Trust auto-detection and blur everything"
- label: "No redaction needed"
description: "Skip all blurring for this workflow"
If the user chooses to review, present each detected field individually:
AskUserQuestion:
question: "Field: 'Password' input at coordinates (120, 340). Blur this?"
header: "Blur field?"
options:
- label: "Yes, blur it"
description: "Add blur to hide this content"
- label: "No, keep visible"
description: "This content is safe to show"
- label: "Blur all remaining"
description: "Skip review, blur everything else"
After auto-detection review, offer to add custom regions:
AskUserQuestion:
question: "Add any custom regions to blur?"
header: "Custom blur"
options:
- label: "Yes, let me specify"
description: "I'll describe regions that need blurring"
- label: "No, we're done"
description: "Proceed with current selections"
If the user wants custom regions, prompt them to describe each region to blur.
Track redaction decisions in session:
{
"redactionReview": {
"autoDetected": [
{"field": "password", "coords": [120, 340, 200, 30], "approved": true},
{"field": "email", "coords": [120, 400, 200, 30], "approved": false}
],
"customRegions": [
{"description": "company logo", "coords": [10, 10, 100, 50]}
],
"userChoice": "reviewed"
}
}
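For illustration, a Pillow sketch of applying the approved decisions; the function name and the [x, y, width, height] unpacking are assumptions based on the JSON above:

```python
# Sketch: blur approved auto-detected fields plus all custom regions.
from PIL import Image, ImageFilter

def apply_redactions(image_path: str, review: dict) -> None:
    img = Image.open(image_path)
    regions = [r["coords"] for r in review["autoDetected"] if r["approved"]]
    regions += [r["coords"] for r in review["customRegions"]]  # Always blurred
    for x, y, w, h in regions:
        box = (x, y, x + w, y + h)
        img.paste(img.crop(box).filter(ImageFilter.GaussianBlur(radius=12)), box)
    img.save(image_path)
```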
1. Load the writing_style_guide.md reference for quality standards
2. Run generate_markdown.py with the captured data

Load references as needed:

- references/writing_style_guide.md - During Phase 4 generation
- references/annotation_conventions.md - During Phase 3 processing
- references/troubleshooting_patterns.md - For the troubleshooting section

During Phase 4, analyze the captured workflow data to generate contextual descriptions that explain why each step matters, not just what to do.
For each recorded step, analyze the captured element metadata and resulting page changes, then apply references/writing_style_guide.md and generate:
- Purpose Statement: Why this action is necessary
- Action Description: What to do (imperative, ≤25 words)
- Expected Result: What the user should observe
Generate enriched step data as JSON:
{
"step": 1,
"action": "click",
"elementText": "New Project",
"context": "Creates a new project workspace to organize your team's work",
"description": "Click **New Project** in the top navigation bar to start creating your project.",
"expected": "The New Project dialog appears with fields for project name and description.",
"prerequisites_detected": ["Logged into account", "On dashboard page"]
}
Analyze the workflow to automatically detect prerequisites:
| Pattern | Detected Prerequisite |
|---|---|
| Login page in first steps | "Active user account" |
| Navigation from dashboard | "Logged into the application" |
| Specific URL patterns | "Access to [feature name]" |
| Role-specific UI elements | "Appropriate permissions" |
| Form pre-filled data | "Required information prepared" |
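A hedged sketch of checking these patterns against recorded steps; the helper name and matching heuristics are illustrative, not DocuGen's actual rules:

```python
# Sketch of table-driven prerequisite detection, assuming steps shaped
# like the web session JSON shown later in this document.
def detect_prerequisites(steps: list[dict], start_url: str) -> list[str]:
    prerequisites = []
    early = " ".join(s.get("selector", "") for s in steps[:3])
    if "/login" in start_url or "password" in early:
        prerequisites.append("Active user account")          # Login page in first steps
    if "/dashboard" in start_url:
        prerequisites.append("Logged into the application")  # Navigation from dashboard
    if any(":disabled" in s.get("selector", "") for s in steps):
        prerequisites.append("Appropriate permissions")      # Role-specific UI elements
    return prerequisites
```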
Map captured element metadata to generated context:

| Element Metadata | Generated Context |
|---|---|
| `button#submit-form`, "Submit" | "Submits the completed form for processing" |
| `a[href="/settings"]`, "Settings" | "Opens application settings to customize your experience" |
| `input[type="search"]`, placeholder="Search..." | "Filters results to find specific items quickly" |
| `button.delete`, "Delete" | "Permanently removes the selected item" |
| `nav > a`, "Dashboard" | "Returns to the main overview of your workspace" |
audience: beginner | intermediate | expert

Generated documentation follows this structure:
# [Workflow Title]
## Overview
[Brief description of what this guide covers]
## Prerequisites
- [Auto-detected requirements]
- [User-provided prerequisites]
## Steps
### Step 1: [Action Title]
[Contextual description explaining WHY this step matters]

**Expected result:** [What user should see after this step]
[... additional steps ...]
## Troubleshooting
**[Common Issue]**
[Description and resolution]
Audience levels:

- beginner: Full context, warnings, detailed explanations
- intermediate: Standard detail, key warnings only
- expert: Concise, minimal explanation

Output options:

- imageFormat: png (default) or jpg
- embedImages: false (file references) or true (base64)
- includeTableOfContents: true for 5+ steps
- generatePdf: true (default) - Generate PDF alongside markdown
- interactiveRedaction: true (default) - Prompt user to review detected fields
- autoBlurSensitive: true (default) - Auto-detect and blur sensitive fields
- redactionReviewMode: "summary" | "each" | "none"
  - summary: Show count of detected fields, ask to review
  - each: Prompt for each detected field individually
  - none: Auto-blur without prompting (use with caution)

Adjust documentation detail and tone based on the target audience level.
For audience=beginner, generate documentation with full context, warnings, and detailed explanations.
Example step (beginner):
### Step 3: Save your project
Before you can share your project with others, you need to save it first.
This ensures all your changes are stored and creates a link you can share.
> **Note:** Saving cannot be undone. Make sure you're happy with your
> project name before proceeding.
Click the **Save** button (the blue button with a disk icon) in the top-right
corner of the screen.
**Expected result:** A green "Saved successfully!" message appears briefly,
and the URL in your browser's address bar changes to include your project ID.
For audience=intermediate, generate documentation with standard detail and key warnings only.
Example step (intermediate):
### Step 3: Save your project
Click **Save** in the top-right corner to store your changes.
> **Warning:** This action cannot be undone.
**Expected result:** "Saved successfully!" confirmation appears.
For audience=expert, generate concise documentation with minimal explanation.
Example step (expert):
### Step 3: Save your project
Click **Save**. A confirmation message appears.
| Aspect | Beginner | Intermediate | Expert |
|---|---|---|---|
| Step length | 50-100 words | 25-50 words | 10-25 words |
| Context | Full explanation | Brief purpose | Omit |
| Warnings | All applicable | Critical only | Rare |
| Screenshots | Every step | Key steps | Minimal |
| Navigation hints | Detailed | Brief | Omit |
| Terminology | Defined | Assumed known | Assumed known |
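The same table expressed as data, as a generator might consume it; the dictionary shape is an illustrative assumption:

```python
# Sketch: audience-driven style parameters distilled from the table above.
AUDIENCE_PROFILES = {
    "beginner":     {"step_words": (50, 100), "context": "full",
                     "warnings": "all",      "screenshots": "every step"},
    "intermediate": {"step_words": (25, 50),  "context": "brief",
                     "warnings": "critical", "screenshots": "key steps"},
    "expert":       {"step_words": (10, 25),  "context": "omit",
                     "warnings": "rare",     "screenshots": "minimal"},
}
```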
Automatically generate a troubleshooting section based on workflow analysis.
Reference references/troubleshooting_patterns.md for templates.
Analyze the recorded workflow to identify type and generate relevant issues:
| Workflow Pattern | Troubleshooting Focus |
|---|---|
| Login/authentication | Session expired, invalid credentials, 2FA issues |
| Form submission | Validation errors, required fields, format issues |
| File upload | Size limits, format restrictions, upload failures |
| Search/filter | No results, too many results, filter confusion |
| Settings/configuration | Permission denied, changes not saving |
| Data creation | Naming conflicts, duplicate entries |
| Navigation | Page not found, access denied |
Based on captured UI elements, automatically detect applicable issues:
{
"detected_patterns": [
{"pattern": "input[type='password']", "issue": "authentication"},
{"pattern": "input[type='file']", "issue": "file_upload"},
{"pattern": "form[action]", "issue": "form_submission"},
{"pattern": ".search-input", "issue": "search_results"},
{"pattern": "button:disabled", "issue": "permissions"}
]
}
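A sketch of selecting issues from these detections; the de-duplication and limit are assumptions consistent with the 2-3 issue guideline below:

```python
# Sketch: pick the most relevant troubleshooting topics for the guide.
def select_issues(detected_patterns: list[dict], limit: int = 3) -> list[str]:
    seen: list[str] = []
    for entry in detected_patterns:        # Already in detection order
        issue = entry["issue"]
        if issue not in seen:
            seen.append(issue)
    return seen[:limit]                    # Cap at 2-3 issues per workflow
```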
Generate 2-3 relevant issues per workflow:
## Troubleshooting
**Form submission fails without error message**
1. Check all required fields are completed (marked with *)
2. Verify email and phone fields are in correct format
3. Scroll up to check for error messages at the top of the form
**"Invalid input" error on project name**
Project names can only contain letters, numbers, and hyphens. Remove any
special characters or spaces and try again.
**Save button is disabled**
This indicates you may not have edit permissions for this project. Contact
your organization administrator to request Editor access.
Select troubleshooting issues in priority order, starting with those most relevant to the recorded workflow.
Use Playwright MCP to control the browser. The following tools are essential:
# Launch browser with appropriate viewport
browser_launch: { headless: false, viewport: { width: 1280, height: 720 } }
# Navigate to starting URL
browser_navigate: { url: "https://example.com" }
# Take screenshot at CSS resolution (ALWAYS use scale: "css"!)
browser_screenshot: { path: "step-01.png", scale: "css", fullPage: false }
Before each action, capture element metadata for documentation:
// Get element metadata
{
selector: "button#submit", // CSS selector used
text: "Submit", // Visible text content
ariaLabel: "Submit form", // ARIA label if present
role: "button", // ARIA role
boundingBox: { x, y, width, height } // Position for annotation
}
For dynamic content, track DOM mutations:
// MutationObserver events to track:
- childList: Elements added/removed
- attributes: Class changes, visibility changes
- characterData: Text content changes
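A sketch of wiring this up with Playwright for Python; whether the Playwright MCP server exposes mutation tracking directly is not assumed here:

```python
# Sketch: install a MutationObserver from Python and collect its reports.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    page = p.chromium.launch(headless=False).new_page(
        viewport={"width": 1280, "height": 720})
    changes: list[str] = []
    page.expose_function("reportMutation", lambda summary: changes.append(summary))
    page.goto("https://example.com")
    page.evaluate("""() => {
        new MutationObserver(muts => {
            for (const m of muts) window.reportMutation(`${m.type}: ${m.target.nodeName}`);
        }).observe(document.body,
                   {childList: true, attributes: true, characterData: true, subtree: true});
    }""")
    # ... perform actions; `changes` accumulates "modal opened"-style evidence ...
```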
Store recording session as JSON for processing:
{
"sessionId": "uuid",
"mode": "web",
"startUrl": "https://example.com",
"workflowDescription": "Create a new project",
"startTime": "2026-01-20T10:00:00Z",
"steps": [
{
"step": 1,
"action": "click",
"selector": "button#new-project",
"elementText": "New Project",
"ariaLabel": "Create new project",
"boundingBox": { "x": 100, "y": 200, "width": 120, "height": 40 },
"screenshotBefore": "step-01-before.png",
"screenshotAfter": "step-01-after.png",
"ssimScore": 0.72,
"timestamp": "2026-01-20T10:00:05Z",
"domChanges": ["modal opened", "form displayed"]
}
],
"endTime": "2026-01-20T10:05:00Z"
}
{
"sessionId": "uuid",
"mode": "desktop",
"app_name": "System Settings",
"workflowDescription": "Change display resolution",
"platform": {
"os": "macos",
"dpi_scale": 2.0,
"has_accessibility": true,
"has_window_enumeration": true
},
"startTime": "2026-01-23T14:00:00Z",
"steps": [
{
"step": 1,
"action": "click",
"description": "Click Displays in the sidebar",
"mode": "desktop",
"app_name": "System Settings",
"window_title": "System Settings",
"element": {
"name": "Displays",
"type": "button",
"bounds": { "x": 85, "y": 320, "width": 180, "height": 32 },
"source": "accessibility"
},
"screenshotBefore": "step-01-before.png",
"screenshotAfter": "step-01-after.png",
"ssimScore": 0.68,
"timestamp": "2026-01-23T14:00:12Z"
},
{
"step": 2,
"action": "click",
"description": "Select Scaled resolution option",
"mode": "desktop",
"app_name": "System Settings",
"window_title": "Displays",
"element": {
"name": "Scaled",
"type": "radio",
"bounds": { "x": 420, "y": 285, "width": 120, "height": 24 },
"source": "visual",
"confidence": 0.88
},
"screenshotBefore": "step-02-before.png",
"screenshotAfter": "step-02-after.png",
"ssimScore": 0.75,
"timestamp": "2026-01-23T14:00:30Z"
}
],
"endTime": "2026-01-23T14:02:00Z"
}
Screenshot naming: step-{nn}-{action}.png (e.g., step-01-click-submit.png)

The #1 cause of misaligned annotations is capturing coordinates and screenshots at different times.
Bounding boxes MUST be captured at the EXACT same page state as the screenshot. Any scroll, reflow, dynamic content change, or delay between getting coordinates and taking the screenshot will cause misalignment.
The Golden Rule: Capture coordinates immediately before/after screenshot, same page state.
// CORRECT: Atomic capture sequence
await page.waitForLoadState('networkidle'); // Wait for page to stabilize
const bbox = await element.boundingBox(); // Get coordinates NOW
await page.screenshot({ path: 'step.png', scale: 'css' }); // Screenshot NOW
// bbox coordinates match the screenshot exactly
// WRONG: Non-atomic capture (causes misalignment)
const bbox = await element.boundingBox(); // Get coordinates
await someOtherAction(); // Page might change!
await page.screenshot({ path: 'step.png' }); // Screenshot - bbox is now stale
ALWAYS use scale: "css" for screenshots. This ensures coordinates from boundingBox()
match the screenshot pixels directly, eliminating DPI/devicePixelRatio issues entirely.
browser_screenshot: { path: "step-01.png", scale: "css" }
With scale: "css", boundingBox() returns CSS pixels that map 1:1 onto the screenshot.

Do NOT rely on auto-scale detection - it's a fallback, not a solution. Get it right the first time.
Lock viewport size at the start of recording and don't change it:
await page.setViewportSize({ width: 1280, height: 720 });
Store viewport in session metadata for validation:
{
"viewport": { "width": 1280, "height": 720 },
"elements": [
{
"isTarget": true,
"boundingBox": { "x": 100, "y": 200, "width": 120, "height": 40 }
}
]
}
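A small validation sketch against this metadata; the function name is illustrative:

```python
# Sketch: flag bounding boxes that fall outside the recorded viewport,
# a likely sign they were captured at a different page state.
def find_stale_boxes(session: dict) -> list[dict]:
    vw = session["viewport"]["width"]
    vh = session["viewport"]["height"]
    return [el for el in session["elements"]
            if el["boundingBox"]["x"] + el["boundingBox"]["width"] > vw
            or el["boundingBox"]["y"] + el["boundingBox"]["height"] > vh]
```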
For each user-directed action:

1. Wait for the page to stabilize (networkidle or animations complete)
2. Capture element metadata and a before screenshot with scale: "css"
3. Perform the action:
   - browser_click: Click interactions
   - browser_type: Text input
   - browser_select: Dropdown selection
   - browser_scroll: Scroll actions
4. Wait for the page to stabilize again, then capture an after screenshot with scale: "css"

Example:

User: "Document creating a new GitHub repository"
1. browser_navigate: https://github.com/new
2. Wait for networkidle
3. [User says: "Enter repository name 'my-project'"]
4. ATOMIC: Capture metadata for input#repository-name (boundingBox, text, ARIA)
5. ATOMIC: browser_screenshot: { path: "step-01-before.png", scale: "css" }
↑ Steps 4-5 must happen with NO page changes between them!
6. browser_type: { selector: "input#repository-name", text: "my-project" }
7. Wait for networkidle
8. browser_screenshot: { path: "step-01-after.png", scale: "css" }
9. Run detect_step.py step-01-before.png step-01-after.png
10. If significant change: record as Step 1
11. Continue with next action...
User: "Document changing display resolution in System Settings"
1. Initialize StepDetector(mode="desktop", output_dir="./output/images")
2. Get platform capabilities: accessibility=True, os=macos
3. [Ask user: "What action will you perform next?"]
4. User: "I'll click Displays in the sidebar"
5. detector.capture_before()
→ Takes baseline screenshot
6. [User clicks Displays in System Settings]
7. User: "Done"
8. detector.capture_after("Click Displays in sidebar")
→ Takes screenshot, SSIM = 0.68 (< 0.87 threshold)
→ Step detected! Saves before/after images
9. get_element_metadata(x=85, y=320, screenshot_path="step-01-after.png")
→ Accessibility: {name: "Displays", type: "button", source: "accessibility"}
10. Record Step 1 with element metadata
11. [Ask user: "What action will you perform next?"]
12. Continue or finish...
At the start of a desktop recording session, detect platform capabilities:
from docugen.desktop import get_capture_capabilities, StepDetector, DetectorConfig
caps = get_capture_capabilities()
# {"screenshots": True, "window_enumeration": True, "accessibility": True,
# "os": "macos", "dpi_scale": 2.0, "notes": []}
config = DetectorConfig(mode="desktop") # Uses 0.87 threshold
detector = StepDetector(config=config, output_dir="./output/images")
After capturing a step, resolve the element the user interacted with:
from docugen.desktop import get_element_metadata
# Tries accessibility first, falls back to Claude Vision
element = get_element_metadata(x=420, y=285, screenshot_path="step-02-after.png")
# Returns: {"name": "Scaled", "type": "radio", "source": "accessibility"}
# Or: {"name": "Scaled", "type": "radio", "source": "visual", "confidence": 0.88}
Desktop annotations adapt styling based on element source:
| Source | Color | Border Width | Meaning |
|---|---|---|---|
| accessibility | Red-orange (255,87,51) | 3px | High-confidence, API-verified |
| visual (≥0.8) | Orange (255,165,0) | 3px | Vision-identified, confident |
| visual (<0.8) | Orange (255,165,0) | 2px | Vision-identified, uncertain |
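A sketch of these styling rules as code; the function shape is an illustrative assumption:

```python
# Sketch: source-based annotation styling from the table above.
def annotation_style(element: dict) -> tuple[tuple[int, int, int], int]:
    """Return (RGB color, border width in px) for a desktop annotation."""
    if element.get("source") == "accessibility":
        return (255, 87, 51), 3            # Red-orange: API-verified
    width = 3 if element.get("confidence", 0.0) >= 0.8 else 2
    return (255, 165, 0), width            # Orange: vision-identified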
Desktop mode steps include additional metadata in the generated markdown:
### Step 2: Select Scaled resolution option
**Application:** System Settings - Displays
Click **Scaled** (radio, identified via visual analysis, 88% confidence)

**Expected result:** Resolution options grid appears below the Scaled radio button.
Use AskUserQuestion to guide the user through each desktop action:
AskUserQuestion:
question: "What action will you perform next on the desktop?"
header: "Next action"
options:
- label: "Click an element"
description: "I'll click a button, menu item, or other UI element"
- label: "Type text"
description: "I'll type into a text field or search box"
- label: "Keyboard shortcut"
description: "I'll use a keyboard shortcut (e.g., Cmd+S)"
- label: "Done recording"
description: "I've completed all the steps"
After the user performs their action:
AskUserQuestion:
question: "Where did you click? Describe the element or approximate screen position."
header: "Element"
options:
- label: "I'll describe it"
description: "Let me tell you what I clicked on"
- label: "Auto-detect"
description: "Use accessibility/vision to identify the element"
| Script | Purpose |
|---|---|
| `detect_step.py` | SSIM-based step boundary detection |
| `annotate_screenshot.py` | Add highlights, arrows, callouts (web + desktop) |
| `generate_markdown.py` | Template-based markdown assembly (web + desktop) |
| `process_images.py` | Optimization and compression |
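For orientation, an illustrative sketch of the SSIM check detect_step.py performs; the real script may differ in detail (assumes scikit-image and Pillow are installed):

```python
# Sketch only: SSIM-based step boundary detection.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim

def is_step_boundary(before_path: str, after_path: str,
                     threshold: float = 0.87) -> bool:
    """True when two screenshots differ enough to count as a new step."""
    before = np.asarray(Image.open(before_path).convert("L"))
    after = np.asarray(Image.open(after_path).convert("L"))
    if before.shape != after.shape:
        return True                           # Resolution changed: always a boundary
    return ssim(before, after) < threshold    # e.g., 0.68 < 0.87 -> new step
```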
| Module | Purpose |
|---|---|
| `desktop/capture.py` | Cross-platform screenshot capture (mss) |
| `desktop/step_detector.py` | SSIM-based step detection with debounce |
| `desktop/platform_router.py` | Accessibility backend routing + visual fallback |
| `desktop/visual_analyzer.py` | Claude Vision API for element identification |
| `desktop/platform_utils.py` | OS detection and capability reporting |
| Template | Use Case |
|---|---|
| `walkthrough.md` | Default step-by-step guide |
| `quick_reference.md` | Condensed expert guide |
| `tutorial.md` | Learning-focused with exercises |