This skill converts WeChat Official Account (微信公众号) article pages into clean, high-quality Markdown. It should be used when the user provides a WeChat article URL (mp.weixin.qq.com) and wants to convert, extract, save, or archive the article content as Markdown. Trigger phrases include "convert WeChat article", "微信文章转Markdown", "save this WeChat article", "extract article content", "抓取微信文章", "文章转MD", or when an mp.weixin.qq.com URL is provided.
Fixes:
- fetch_with_playwright now uses mobile Chromium (is_mobile=True + iPhone UA + 393×852 viewport), so temporary share links (tempkey) render correctly
- Lazy-loaded data-src images now load
Comparison (v1.0 → v1.1):
| Item | Old version | New version |
|---|---|---|
| User Agent | Desktop Chrome | iPhone Safari |
| Viewport | 1280×900 | 393×852 |
| Temporary links | ❌ Cannot render | ✅ Works |
| Lazy-loaded images | ❌ | ✅ Triggered by scrolling |
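The mobile emulation summarized in the table can be sketched with Playwright's sync API. This is a minimal illustration under stated assumptions, not the script's actual fetch_with_playwright: the helper names, the exact user-agent string, the has_touch flag, and the scroll loop parameters are hypothetical. Only is_mobile=True, the iPhone UA idea, and the 393×852 viewport come from the notes above.

```python
# Hypothetical sketch; not the actual fetch_with_playwright implementation.

# Assumed iPhone Safari user-agent string (any recent one should behave similarly).
IPHONE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.0 Mobile/15E148 Safari/604.1"
)


def mobile_context_options() -> dict:
    """Keyword arguments for Playwright's browser.new_context()."""
    return {
        "user_agent": IPHONE_UA,
        "viewport": {"width": 393, "height": 852},
        "is_mobile": True,
        "has_touch": True,  # assumption: not mentioned in the notes above
    }


def fetch_rendered_html(url: str, max_scrolls: int = 30) -> str:
    """Render the page in mobile Chromium and scroll to trigger lazy images."""
    # Imported lazily so mobile_context_options() works without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**mobile_context_options())
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        for _ in range(max_scrolls):
            # Scroll one viewport at a time so data-src images get a chance to load.
            page.evaluate("window.scrollBy(0, window.innerHeight)")
            page.wait_for_timeout(200)
        html = page.content()
        browser.close()
    return html
```

Running fetch_rendered_html requires the playwright package and a Chromium build installed via `playwright install chromium`.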
Convert WeChat Official Account articles (mp.weixin.qq.com) into clean, high-quality Markdown. The skill uses a Python script optimized for WeChat's unique DOM structure, featuring deep noise removal, smart code block detection, rich text preservation, and intelligent paragraph formatting.
User provides WeChat article URL?
├── Yes → Go to Step 1: Install Dependencies & Run Script
├── User wants to convert HTML directly?
│   └── Use Step 2: In-Line Conversion (for fetched HTML)
└── User asks about multiple URLs?
    └── Use batch mode with -f flag
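The routing in the decision tree above could be approximated as follows. This is only a sketch: the function names and the returned labels are hypothetical, and treating any input without an article URL as already-fetched HTML is an assumption.

```python
from urllib.parse import urlparse


def is_wechat_article_url(text: str) -> bool:
    """Return True if text looks like an http(s) URL on mp.weixin.qq.com."""
    parsed = urlparse(text.strip())
    return parsed.scheme in ("http", "https") and parsed.hostname == "mp.weixin.qq.com"


def choose_step(inputs: list[str]) -> str:
    """Map user input to a branch of the decision tree (labels are illustrative)."""
    urls = [item for item in inputs if is_wechat_article_url(item)]
    if len(urls) > 1:
        return "batch"   # several article URLs: batch mode
    if len(urls) == 1:
        return "step-1"  # single URL: install dependencies and run the script
    return "step-2"      # assumption: no URL means already-fetched HTML
```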
Ensure Python dependencies are available. Install if missing:
pip install requests beautifulsoup4 markdownify
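One way to check which of these packages are missing before installing is sketched below. The helper name is hypothetical; the only non-obvious detail is that the beautifulsoup4 package is imported as bs4.

```python
from importlib.util import find_spec

# pip package name -> import name (beautifulsoup4 is imported as bs4)
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "markdownify": "markdownify",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print the install command instead of running it, so the user stays in control.
        print("pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```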
Run the conversion script:
python scripts/wechat_to_md.py "<WECHAT_URL>" -o "<OUTPUT_DIR>"
Options:
- --no-images: Skip image downloading, keep remote URLs
- --no-frontmatter: Omit YAML frontmatter
Multiple URLs can be passed in one invocation:
python scripts/wechat_to_md.py url1 url2 url3
The output structure:
<OUTPUT_DIR>/
└── <Article_Title>/
    ├── <Article_Title>.md
    └── images/
        ├── img_000.png
        └── img_001.jpg
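The layout above can be reproduced with a few lines of pathlib. This is a sketch, not the script's own logic: safe_title and planned_paths are hypothetical helpers, and the title sanitisation rule (replacing whitespace and characters that are invalid in file names with underscores) is an assumption.

```python
import re
from pathlib import Path


def safe_title(title: str) -> str:
    """Replace whitespace and characters invalid in file names (assumed rule)."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or "untitled"


def planned_paths(output_dir: str, title: str, image_exts: list[str]) -> dict:
    """Compute the Markdown path and the numbered image paths for one article."""
    name = safe_title(title)
    article_dir = Path(output_dir) / name
    images = [
        article_dir / "images" / f"img_{index:03d}.{ext}"
        for index, ext in enumerate(image_exts)
    ]
    return {"markdown": article_dir / f"{name}.md", "images": images}
```

The function only computes paths and does not touch the file system, so it is safe to call before deciding whether to download images.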
If the HTML has already been fetched (e.g., via web_fetch), use the script's convert_simple() function programmatically:
import sys
sys.path.insert(0, "<SKILL_DIR>/scripts")
from wechat_to_md import convert_simple
# Basic usage: convert only, without downloading images
result = convert_simple("https://mp.weixin.qq.com/s/xxxxx")
markdown = result["markdown"] # Full Markdown string
metadata = result["metadata"] # {title, author, date, url, ...}
code_blocks = result["code_blocks"] # [{lang, code}, ...]
image_urls = result["image_urls"] # List of original image URLs
# Advanced usage: also download images to local disk
result = convert_simple(
    "https://mp.weixin.qq.com/s/xxxxx",
    download_imgs=True, # Enable image downloading
    output_dir="./my_article" # Output directory (optional)
)
markdown = result["markdown"] # Image links already replaced with local paths
image_mapping = result["image_mapping"] # URL -> local path mapping
output_dir = result["output_dir"] # Actual output directory
Return the Markdown content directly to the user or write it to a file.
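Writing the result to a file and producing a short summary might look like the sketch below. It relies only on the result keys shown in the example above (markdown, metadata, code_blocks, image_urls); save_result and summarize are hypothetical helpers, and the file-name sanitisation is an assumption.

```python
from pathlib import Path


def save_result(result: dict, out_dir: str) -> Path:
    """Write result["markdown"] to <out_dir>/<title>.md and return the path."""
    title = result.get("metadata", {}).get("title") or "untitled"
    # Assumed sanitisation: replace path separators so the title is a safe file name.
    file_name = title.replace("/", "_").replace("\\", "_") + ".md"
    target = Path(out_dir) / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result["markdown"], encoding="utf-8")
    return target


def summarize(result: dict) -> str:
    """Build a one-line summary from the documented result keys."""
    meta = result.get("metadata", {})
    return (
        f"{meta.get('title', 'untitled')} by {meta.get('author', 'unknown')}: "
        f"{len(result.get('code_blocks', []))} code blocks, "
        f"{len(result.get('image_urls', []))} images"
    )
```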
.md file and present a summary.
The script removes 30+ WeChat-specific noise elements including:
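- Profile cards and ads (.mp_profile_iframe, #ad_content)
- Reward and QR code areas (.reward_area, .qr_code_pc)
- Comment sections (#comment_container, #js_cmt_area)
- Audio and video embeds (mpvoice, mpvideo)
- Related articles (#relation_article)
- Hidden elements (display:none, visibility:hidden)
- <span> placeholders
Handles all 3 WeChat code block formats: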
- pre.code-snippet with data-lang attribute
- .code-snippet__fix container with nested pre[data-lang]
- pre[data-lang]
Features:
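- Language detection from data-lang, CSS class, and code content
- Line-number elements removed (.code-snippet__line-index)
- CSS counter residue removed (counter(line) garbage text)
- Rich text preserved: <b> → <strong>, <i> → <em>, handles inline font-weight: bold
- Manual list markers (•, ·, 1., (1)) converted to proper Markdown lists
- Lazy-loaded images resolved (data-src → src)
- Whitespace normalized (non-breaking space → space, zero-width spaces removed)
Generates YAML frontmatter: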
---