Name: Desktop
Author: Aster110

Desktop

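Desktop control. Lets CC see the screen, move the mouse, click buttons, and type text. Built on tactile feedback from macOS native OCR: it does not send full-screen screenshots to the AI, which keeps token usage to a minimum. Trigger phrases: "/desktop", "帮我操作桌面" (operate the desktop for me), "点一下那个按钮" (click that button), "看看屏幕上有什么" (see what is on the screen).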

Aster110 · 166 stars · Mar 4, 2026

Occupation: Data Entry Keyers
Categories: System Administration

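Desktop: CC desktop control

Core idea: tactile feedback

Instead of sending full-screen screenshots to the AI (slow and expensive), the skill uses macOS native OCR as a sense of "touch". Wherever the mouse moves, OCR runs on that spot, so the agent knows where its hand is and what it is touching.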
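The move-then-read loop described above can be sketched in shell. This is a minimal sketch, not the skill's own implementation: the function name `touch_at` and the 0.2 s pause are hypothetical, and it assumes `cliclick` is installed and that `OCR` holds the OCR command line as defined in the CLI toolchain section.

```shell
# Hypothetical helper: move the pointer, then read the text around it.
# Assumes cliclick is on PATH and OCR is set as in the CLI toolchain section,
# e.g. OCR="$HOME/tools/ocr-env/bin/python3 $SKILL_DIR/ocr.py".
touch_at() {
  local x="$1" y="$2"
  cliclick "m:${x},${y}" || return 1   # move the hand
  sleep 0.2                            # assumed pause so hover states can render
  # Intentionally unquoted: OCR holds an interpreter path plus a script path.
  $OCR --cursor                        # feel what is under the hand
}
```

Calling `touch_at 500 300` moves the pointer to (500, 300) and prints whatever text the OCR tool finds in the default region around it.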

CLI toolchain

All operations are performed by calling CLI tools from Bash:

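# ===== Eyes (perception) =====

# OCR tool path (ocr.py lives in this skill's directory; the venv is at ~/tools/ocr-env/)
# $OCR is expanded unquoted below so Bash splits it into interpreter and script path;
# zsh does not split by default, so use ${=OCR} there.
SKILL_DIR=".claude/skills/desktop"
OCR="$HOME/tools/ocr-env/bin/python3 $SKILL_DIR/ocr.py"

# Full-screen OCR (fast mode, ~240 ms; note: Chinese text requires accurate mode)
$OCR --screen --fast

# Full-screen OCR + JSON bounding boxes (accurate mode, ~870 ms)
$OCR --screen --bbox

# OCR near the cursor (the "touch", ~250 ms, default region 300x200)
$OCR --cursor

# OCR near the cursor + JSON (with clickable coordinates)
$OCR --cursor --bbox

# OCR near the cursor (custom region size)
$OCR --cursor --size 200x100

# OCR on a given image
$OCR /path/to/image.png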

# ===== Hands (control) =====

# Get the mouse position
cliclick p:.

# Move the mouse
cliclick m:500,300

# Click
cliclick c:500,300

# Double-click
cliclick dc:500,300

# Right-click
cliclick rc:500,300

# Type text
cliclick t:"Hello World"

# Key presses (Return, Tab, etc.)
# Note: cliclick kp:return has no effect in some apps such as WeChat; prefer osascript
osascript -e 'tell application "System Events" to key code 36'   # Return (recommended)
cliclick kp:return                                                # Return (fallback)
cliclick kp:tab
cliclick kp:esc                                                   # cliclick names the Escape key "esc"

# Keyboard shortcuts
osascript -e 'tell application "System Events" to keystroke "c" using command down'   # Copy
osascript -e 'tell application "System Events" to keystroke "v" using command down'   # Paste
osascript -e 'tell application "System Events" to keystroke "s" using command down'   # Save
osascript -e 'tell application "System Events" to keystroke "a" using command down'   # Select all

# ===== Perception (window/application state) =====

# List the currently visible applications
osascript -e 'tell application "System Events" to get name of every process whose visible is true'

# Get info on the windows of the frontmost application
osascript -e 'tell application "System Events" to get {name, position, size} of every window of (first process whose frontmost is true)'

# Switch to a given application
osascript -e 'tell application "APP_NAME" to activate'

# Click a menu item
osascript -e 'tell application "System Events" to click menu item "ITEM" of menu "MENU" of menu bar 1 of process "APP"'

# Full-screen screenshot (used as a fallback when handing off to Claude's vision)
screencapture -x /tmp/desktop-screenshot.png

# Region screenshot
screencapture -x -R "x,y,w,h" /tmp/desktop-region.png
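The `x,y,w,h` rectangle for a region screenshot can be derived from a pointer position. The sketch below is illustrative only: `region_around` is a hypothetical helper, it assumes the position arrives as a bare `x,y` string (which is what `cliclick p:.` is assumed to print), and the 300x200 default simply mirrors the `--cursor` default mentioned above.

```shell
# Hypothetical helper: build a screencapture -R rectangle centred on a point.
# $1 is "x,y"; $2 and $3 are width and height (default 300x200).
region_around() {
  local pos="$1"
  local w="${2:-300}"
  local h="${3:-200}"
  local x="${pos%%,*}"          # text before the comma
  local y="${pos##*,}"          # text after the comma
  local left=$(( x - w / 2 ))
  local top=$(( y - h / 2 ))
  [ "$left" -lt 0 ] && left=0   # clamp at the top-left corner of the screen
  [ "$top" -lt 0 ] && top=0
  echo "${left},${top},${w},${h}"
}
```

Under those assumptions, `screencapture -x -R "$(region_around "$(cliclick p:.)")" /tmp/desktop-region.png` would capture the area around the pointer. The clamp only handles the top-left edge; the right and bottom edges would need the screen size, which this sketch does not query.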

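Safety rules

Level	Operations	Policy
Green	Screenshots, OCR, reading the clipboard, listing windows, moving the mouse	Run automatically
Yellow	Clicking, typing text, switching windows	Run and inform the user
Red	Anything involving passwords, payments, deletion, or sending messages	Ask the user first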
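One way to apply the table is a small gate that maps an action to its level before anything runs. This is a sketch under assumptions rather than part of the skill: the function name `risk_level` and the action names are hypothetical, and treating unknown actions as red is a fail-safe choice made here, not something the source specifies.

```shell
# Hypothetical gate: map an action name to a level from the table above.
risk_level() {
  case "$1" in
    screenshot|ocr|read-clipboard|list-windows|move-mouse)
      echo green ;;    # run automatically
    click|type-text|switch-window)
      echo yellow ;;   # run and inform the user
    *)
      echo red ;;      # passwords, payments, deletion, messages, or unknown: ask first
  esac
}
```

A caller could branch on the result, for example running the command directly for `green`, printing a notice for `yellow`, and stopping to ask the user for `red`.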
