Extract URL content and create literature notes and review notes from web articles
Extract content from URLs and create properly linked vault notes and review notes.
# Single URL extraction
uv run python .claude/skills/obsidian-read/scripts/extract_url.py <url> --json
# Parse URLs from raw inbox
uv run python .claude/skills/obsidian-read/scripts/extract_url.py --from-file "staging/To Read Later.md" --json
# Verbose output for debugging
uv run python .claude/skills/obsidian-read/scripts/extract_url.py <url> --verbose
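To illustrate what parsing URLs from the raw inbox can involve, here is a minimal sketch. It is not the code in extract_url.py; the pattern `URL_PATTERN`, the function `parse_inbox_urls`, and the sample text are all hypothetical, and the sketch assumes the inbox is free-form text with bare or markdown-style links.

```python
import re

# Hypothetical pattern: http(s) URLs up to whitespace or a closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\")\]]+")


def parse_inbox_urls(text: str) -> list[str]:
    """Return unique URLs found in raw inbox text, in first-seen order."""
    seen: set[str] = set()
    urls: list[str] = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up inbox content for illustration only.
sample = """\
- https://example.com/post-one
Clipped from phone: [A post](https://example.com/post-two), worth a look.
https://example.com/post-one
"""
print(parse_inbox_urls(sample))
```

Deduplicating while preserving order keeps the queue stable when the same link is clipped twice.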
Fallback chain: Jina Reader (primary, ~0.3s) → Wayback Machine (archived snapshots).
Optional local alternative: defuddle-cli, by kepano (CEO of Obsidian). Install and run it with npm install -g defuddle-cli && defuddle <url>. It makes no API calls, so it suits paywalled content you can already access locally.
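The Jina Reader to Wayback Machine fallback chain can be pictured as an ordered list of extractors tried until one returns content. The sketch below is an assumption about the general pattern, not the script's actual implementation: `extract_with_fallback`, `fake_jina`, and `fake_wayback` are hypothetical names, and the stand-in extractors make no network requests.

```python
from typing import Callable, Optional

# An extractor takes a URL and returns markdown, or None when it has nothing.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(url: str, chain: list[tuple[str, Extractor]]) -> tuple[str, str]:
    """Try each extractor in order; return (method_name, content) from the first success."""
    errors: list[str] = []
    for name, extractor in chain:
        try:
            content = extractor(url)
        except Exception as exc:  # a failed step should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if content and content.strip():
            return name, content
        errors.append(f"{name}: empty content")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stand-in extractors (hypothetical); real ones would make HTTP requests.
def fake_jina(url: str) -> Optional[str]:
    raise ConnectionError("HTTP 403")


def fake_wayback(url: str) -> Optional[str]:
    return "# Archived article\n\nBody text."


method, content = extract_with_fallback(
    "https://example.com/article",
    [("jina", fake_jina), ("wayback", fake_wayback)],
)
print(method)
```

Returning the method name alongside the content makes it easy to report which extraction method was used in the summary shown to the user.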
Extract content via extract_url.py <url> --json
Present summary: title, word count, extraction method, content size category, content preview
Check staging/ for existing notes containing the URL or matching keywords from the title — offer to use existing content instead of creating from scratch
Ask user:
Create full text note — always created by default:
---
type: literature
title: "Article Title"
source: "Author, Title (Year)"
url: "https://..."
created: YYYY-MM-DD
up: "[[Article Title Review Note]]"
tags:
- literature
- full-text
- domain-tag
---
The note is named Article Title Full Text.md and contains the verbatim extracted markdown. Its up: property points to the review note (NOT the MOC) to avoid cluttering MOC listings. It is tagged full-text for easy Dataview filtering.
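As a rough sketch of how the full text note could be assembled from the template above, the hypothetical helper `full_text_note` below builds the filename and frontmatter. It assumes the title and source contain no double quotes (which would need YAML escaping), and the example values are made up.

```python
from datetime import date


def full_text_note(title: str, source: str, url: str, body: str,
                   domain_tag: str, created: date) -> tuple[str, str]:
    """Return (filename, note_text) for a full text literature note."""
    frontmatter = "\n".join([
        "---",
        "type: literature",
        f'title: "{title}"',
        f'source: "{source}"',
        f'url: "{url}"',
        f"created: {created.isoformat()}",
        f'up: "[[{title} Review Note]]"',  # points to the review note, not the MOC
        "tags:",
        "- literature",
        "- full-text",
        f"- {domain_tag}",
        "---",
    ])
    return f"{title} Full Text.md", f"{frontmatter}\n\n{body}\n"


# Illustrative values only.
filename, text = full_text_note(
    title="Example Article",
    source="Doe, Example Article (2024)",
    url="https://example.com/article",
    body="Verbatim extracted markdown goes here.",
    domain_tag="domain-tag",
    created=date(2024, 5, 1),
)
print(filename)
```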
Create review note at vault root:
---
type: review
title: "Article Title"
source: "Author, Title (Year)"
url: "https://..."
full-text: "[[Article Title Full Text]]"
created: YYYY-MM-DD
up: "[[Domain MOC]]"
related:
- "[[Related Note]]"
tags:
- review
- domain-tag
---
The full-text: property links to the literature note with the complete article text.
Review notes include sections: Summary, Key Takeaways, Personal Reflection
Important: up: points to the domain MOC (e.g., Machine Learning, Generative AI Resources), NOT a generic "Reviews MOC" or "Sources MOC". See the obsidian-review skill.
Important: Only the review note gets added to the MOC. The full text note stays out of MOC listings; it is discoverable via the full-text: property and the full-text tag.
extract_url.py --from-file "staging/To Read Later.md" --json parses URLs from staging/To Read Later.md.
The reading pipeline uses two files:
| File | Role |
|---|---|
| staging/To Read Later.md | Raw URL inbox. Clip URLs here from phone, browser, etc. This is the processing queue. |
| To Read Later.md | Curated reading list. Checkbox entries with 1-line context + link to extracted vault note. Unread / Read sections. |
Flow: URL clipped → staging/To Read Later.md → /obsidian-read processes → vault note created → entry added to To Read Later.md (Unread) → user reads → marks [x] → moves to Read section
When processing a URL:
Check staging/ for notes containing the URL or keywords from the extracted title
The extraction script automatically:
Resolves open.substack.com/pub/ redirects to clean URLs
Related skills:
obsidian-review: note type selection (review vs literature), frontmatter templates
obsidian-organize: frontmatter validation (validate_frontmatter.py)

| Error | Suggestion |
|---|---|
| Jina Reader HTTP 402 | Rate limited — wait 30s and retry |
| Jina Reader HTTP 403 | Site blocks Jina — Wayback fallback will be tried automatically |
| No Wayback snapshot | Article too new or not indexed — try defuddle-cli locally, or paste content manually |
| Timeout | Increase with --timeout 60 |
| Empty content | Site may use heavy JS rendering — try defuddle-cli or paste manually |
Update parent MOC — add link to review note in the appropriate section (not the full text note)
Update reading list — if URL was sourced from staging/To Read Later.md:
Remove the processed URL from staging/To Read Later.md
Add an entry to To Read Later.md under ## Unread: - [ ] [[Note Title]] — 1-line description
When the article has been read, mark it [x] and move it to the ## Read section
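The reading list update can be sketched as two small text transformations. This is an illustration under assumptions, not part of the skill's scripts: `add_unread` and `mark_read` are hypothetical helpers, and they assume To Read Later.md contains the exact heading lines ## Unread and ## Read.

```python
def add_unread(reading_list: str, note_title: str, description: str) -> str:
    """Insert a checkbox entry directly under the '## Unread' heading."""
    lines = reading_list.splitlines()
    entry = f"- [ ] [[{note_title}]] — {description}"  # entry format from the template
    index = lines.index("## Unread")  # raises ValueError if the heading is missing
    lines.insert(index + 1, entry)
    return "\n".join(lines) + "\n"


def mark_read(reading_list: str, note_title: str) -> str:
    """Check off the entry for note_title and move it under '## Read'."""
    lines = reading_list.splitlines()
    prefix = f"- [ ] [[{note_title}]]"
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            done = lines.pop(i).replace("- [ ]", "- [x]", 1)
            read_index = lines.index("## Read")
            lines.insert(read_index + 1, done)
            return "\n".join(lines) + "\n"
    raise ValueError(f"no unread entry for {note_title!r}")


# Made-up reading list for illustration.
reading_list = "## Unread\n\n## Read\n"
reading_list = add_unread(reading_list, "Example Article", "why it matters")
reading_list = mark_read(reading_list, "Example Article")
print(reading_list)
```

Working on the file's text rather than the file itself keeps the helpers easy to test; the caller decides when to read and write To Read Later.md.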