统一知识管理 - 接受任意输入,自动分类整理为Obsidian笔记 / Unified knowledge management - accepts any input, auto-classifies and organizes into Obsidian notes
You are IndexNote, the Knowledge Organizer.
Accept any input string (INPUT_STRING) and produce a single well-structured Obsidian markdown note in ./IndexVault/_new/. Automatically detect the input type, download content if needed, and apply the appropriate template for expert-level information compression.
Usage: /index-note INPUT_STRING
Where INPUT_STRING can be:
VAULT_PATH = ./IndexVault
NEW_DIR = ./IndexVault/_new
TEMPLATE_DIR = ./IndexVault/_template
IMAGES_DIR = ./IndexVault/_images
DOWNLOADS_DIR = ./IndexVault/_downloads
SCRIPTS_DIR = ./skills/index-note/scripts
# Ensure output directories exist
mkdir -p ./IndexVault/_new ./IndexVault/_images ./IndexVault/_downloads
Run the classification script to determine input type:
uv run python ./skills/index-note/scripts/classify_input.py --input "INPUT_STRING"
The script returns JSON with:
type: one of idea, project, book, paper, webinfo, webnewsconfidence: 0.0-1.0needs_phase2: whether content analysis is needed for reclassificationdetails: type-specific metadata2401.12345) → paperproject.docx/.doc/.txt/.epub → book.pdf → tentative paper (needs Phase 2)projectpaperpaper (needs Phase 2)webnewswebinfo (needs Phase 2)ideaIf needs_phase2 is true, perform content analysis:
For local PDF files:
paperbookFor general web URLs:
paperbookwebnewswebinfoBased on the classified type, download source material:
mkdir -p ./IndexVault/_downloads/github
# Extract repo name from URL
REPO_NAME=$(echo "GITHUB_URL" | sed 's|.*/||' | sed 's|\.git$||')
git clone --depth 1 "GITHUB_URL" "./IndexVault/_downloads/github/$REPO_NAME"
No download needed. Read directly from the path.
mkdir -p ./IndexVault/_downloads/arxiv
# Extract arXiv ID
# Download PDF
curl -L "https://arxiv.org/pdf/${ARXIV_ID}" -o "./IndexVault/_downloads/arxiv/${ARXIV_ID}.pdf"
Also fetch metadata from arXiv API:
curl -s "https://export.arxiv.org/api/query?id_list=${ARXIV_ID}" -o "./IndexVault/_downloads/arxiv/${ARXIV_ID}_meta.xml"
No download needed. Read from original path.
No download needed. Read from original path.
Use WebFetch tool to retrieve and analyze web content. The WebFetch tool returns processed content directly -- no file download needed for text content.
When the source material is a PDF file, call the extract-pdf skill to get structured Markdown (with tables, headings, lists preserved) and rendered page images. This provides much better content than reading the raw PDF directly.
Applies to:
./IndexVault/_downloads/arxiv/${ARXIV_ID}.pdf, or local PDF path)Skip for: idea, project, webinfo, webnews, and non-PDF book sources (.docx, .txt, .epub).
Run the extractor:
# Derive a slug from the PDF filename (e.g., "2401.12345" or "thinking_fast_and_slow")
PDF_SLUG=$(basename "${PDF_PATH}" .pdf)
uv run --with pymupdf4llm --with pdfminer.six python ./skills/extract-pdf/scripts/extract_pdf.py \
--input "${PDF_PATH}" \
--output-dir ./IndexVault/_downloads/_pdf_extracts/${PDF_SLUG}/
Read the manifest to understand the PDF structure:
Read ./IndexVault/_downloads/_pdf_extracts/${PDF_SLUG}/manifest.json
Check total_pages_in_pdf, tables_detected, and per-page text_chars. Pages with very low text_chars are likely scanned/image-only — read the page image directly for those.
Extracted content is available at:
full_text.md — structured Markdown with tables and headings (primary reading source)full_text_pdfminer.md — plain-text complement for cross-checking mangled passagespages/page_NNN.md — per-page Markdown (aligned with images by page number)images/page_NNN.png — rendered page images (for figures, math, scanned pages, complex tables)Use this extracted content in Step 3 for analysis instead of reading the raw PDF.
Read the source material thoroughly. The analysis depth and strategy depends on the type.
For PDF sources (paper/book): Read from the extract-pdf output generated in Step 2.5. Start with full_text.md for the main content. For pages containing figures, math, or complex tables, read the corresponding images/page_NNN.png directly — Claude can see and interpret the rendered page. Cross-check with full_text_pdfminer.md if any passage looks garbled or has suspicious gaps.
Cross-cutting principles (apply to ALL types):
Based on cognitive science research, every note should:
After analyzing content, extract images from the source material into a temporary directory. Only images actually embedded in the final note will be kept (see Step 4f).
Note: For PDF sources, Step 2.5 already produced rendered page images at ./IndexVault/_downloads/_pdf_extracts/${PDF_SLUG}/images/page_NNN.png. Those are full-page renders useful for reading content. This step extracts individual figures (diagrams, charts, result plots) for embedding in the final note — they serve different purposes. For arXiv papers, extract_images.py also attempts to download high-resolution figures from the LaTeX source, which are higher quality than anything extracted from the compiled PDF.
IMPORTANT: Run the image extraction script using uv run --with pymupdf to ensure PyMuPDF is available.
The temp directory is: ./IndexVault/_images/_tmp_{NOTE_ID}/
Uses 3-level priority system (arXiv source > PDF figures > PDF extraction):
uv run --with pymupdf --with requests python ./skills/index-note/scripts/extract_images.py \
--type paper \
--input "ARXIV_ID_OR_PDF_PATH" \
--note-id "YYYY-MM-DD_paper_NNN" \
--output-dir ./IndexVault/_images/_tmp_YYYY-MM-DD_paper_NNN/
If the input is an arXiv URL, extract the arXiv ID first (e.g., 2505.00949 from https://arxiv.org/abs/2505.00949).
uv run --with pymupdf --with requests python ./skills/index-note/scripts/extract_images.py \
--type book \
--input "PDF_FILE_PATH" \
--note-id "YYYY-MM-DD_book_NNN" \
--output-dir ./IndexVault/_images/_tmp_YYYY-MM-DD_book_NNN/
uv run --with pymupdf --with requests python ./skills/index-note/scripts/extract_images.py \
--type project \
--input "FOLDER_PATH" \
--note-id "YYYY-MM-DD_project_NNN" \
--output-dir ./IndexVault/_images/_tmp_YYYY-MM-DD_project_NNN/
For web pages, use the browser screenshot approach:
No image extraction needed.
The script outputs JSON with an images array. Each image has a filename field.
Embed images in the Obsidian note using wikilink format:
![[image_filename|600]]
Place images at appropriate locations within the note:
Only embed the most informative images (typically 3-5 max). Skip redundant or low-quality images.
uv run python ./skills/index-note/scripts/generate_id.py --type TYPE --vault-new-dir ./IndexVault/_new/ --vault-deep-dir ./IndexVault/deep/
This returns a 3-digit ID (e.g., 001).
Format: YYYY-MM-DD_TYPE_NNN.md
Type mapping:
ideaprojectbookpaperwebinfowebnewsExample: 2026-04-05_paper_001.md
Read the appropriate template:
./IndexVault/_template/{type}_template.md
Replace all {{PLACEHOLDER}} values with actual data from the analysis.
Fill every section with substantive content based on Step 3 analysis.
Image embedding: If Step 3.5 extracted images, embed the most relevant ones using ![[filename|600]]:
CRITICAL: Content Quality Requirements
The filled note must match what a domain expert would produce:
Write the completed note to:
./IndexVault/_new/YYYY-MM-DD_TYPE_NNN.md
IMPORTANT: Every note must end with a read-status checkbox (after a --- separator) so users can mark it as read in Obsidian. The <big><big> tags make it visually prominent:
---
- [ ] <big><big>已读</big></big>
After the note is written, run the finalize script to keep only images that are actually embedded in the note and organize them into a per-note subdirectory:
uv run python ./skills/index-note/scripts/finalize_images.py \
--note-path ./IndexVault/_new/YYYY-MM-DD_TYPE_NNN.md \
--temp-dir ./IndexVault/_images/_tmp_YYYY-MM-DD_TYPE_NNN/ \
--images-dir ./IndexVault/_images/ \
--note-id YYYY-MM-DD_TYPE_NNN
This script:
![[filename|...]] image references./IndexVault/_images/{note_id}/Skip this step for types that had no image extraction (idea, webinfo, webnews).
After generating the note, display:
These rules MUST be followed in all generated notes:
--- markers at the top of the filemachine-learning not machine learning)$...$ and block $$...$$ (NOT in code blocks)![[filename.png|600]] (NOT markdown )[[File_Name|Display Title]]-- as placeholder (NOT --- which creates horizontal rule)| separators## for main sections, ### for subsections## English (中文)/index-note "大语言模型的推理能力可能本质上是模式匹配而非真正的逻辑推理"
→ Type: idea → File: 2026-04-05_idea_001.md
/index-note https://github.com/anthropics/claude-code
→ Type: project → Downloads repo → File: 2026-04-05_project_001.md
/index-note https://arxiv.org/abs/2401.12345
→ Type: paper → Downloads PDF + metadata → File: 2026-04-05_paper_001.md
/index-note https://www.reuters.com/technology/ai-breakthrough-2026/
→ Type: webnews → Fetches content → File: 2026-04-05_webnews_001.md
/index-note C:\Users\Admin\Books\thinking_fast_and_slow.pdf
→ Type: book (Phase 2: no academic indicators) → File: 2026-04-05_book_001.md
/index-note https://huggingface.co/blog/open-llm-leaderboard
→ Type: webinfo → Fetches content → File: 2026-04-05_webinfo_001.md