Academic knowledge base manager for Obsidian vaults — ingest papers and articles into structured notes, compile topic syntheses, query across the corpus, and lint for health. Use this skill whenever the user mentions /kb, wants to add a paper or article to their vault, asks to summarize or synthesize a research topic, wants to compile or update topic notes, asks cross-paper research questions, or mentions ingesting, compiling, linting, or organizing academic papers. Also trigger when the user says things like 'add this paper', 'add this article', 'update the topic', 'what do my papers say about X', or 'check my vault for broken links'.
Manage an Obsidian-based academic knowledge base with a full pipeline: ingest → compile → query → lint. This skill turns raw PDFs into structured paper notes, captures articles (blogs, tech reports, white papers, social media) as lightweight notes, synthesizes cross-source insights into topic files, answers research questions grounded in the corpus, and keeps the vault healthy.
The vault follows this layout — never deviate from it:
vault-root/
├── papers/
│ ├── [Paper Title]/ # one folder per paper, using the original full title
│ │ ├── [Paper Title].md # structured paper note
│ │ ├── [Paper Title].pdf # original PDF
│ │ ├── annotation.html # optional, from /kb read
│ │ └── sections/ # optional, auto-created for long/complex papers
│ │ ├── 01-Introduction.md # deep-read note per section
│ │ ├── 03-Methodology.md
│ │ └── ...
│ └── Paper Database.md # Dataview dashboard (auto-queries, do not edit manually)
├── articles/
│ ├── [Article Title]/ # one folder per article, using the original title
│ │ ├── [Article Title].md # structured article note (5-section)
│ │ └── snapshot.html # optional, web page snapshot for link rot prevention
│ └── Article Database.md # Dataview dashboard (create if missing)
├── topics/
│ ├── [Topic Name].md # topic synthesis file (aggregates both papers and articles)
│ └── Topics Index.md # central hub with mermaid map
├── queries/ # cross-corpus research Q&A (create if missing)
│ └── [YYYY-MM-DD] [Question Slug].md
└── kb-log.md # append-only audit trail (create at vault root if missing)
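A minimal sketch of a layout check, assuming the vault root is a local directory. The function name `check_vault_layout` and its return shape are hypothetical; only `queries/` and `kb-log.md` are created, because those are the vault-root entries the layout marks as "create if missing".

```python
from pathlib import Path

# Top-level folders the layout expects to exist already.
REQUIRED_DIRS = ("papers", "articles", "topics")


def check_vault_layout(vault_root):
    """Report missing required folders and create the create-if-missing entries."""
    root = Path(vault_root)
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    created = []

    queries = root / "queries"
    if not queries.is_dir():
        queries.mkdir(parents=True)
        created.append("queries/")

    log = root / "kb-log.md"
    if not log.exists():
        log.touch()  # start with an empty append-only log
        created.append("kb-log.md")

    return {"missing": missing, "created": created}
```

Missing required folders are only reported, not created, so the caller can decide how to proceed.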
This skill orchestrates several companion skills — use them for sub-tasks:
| Command | Usage | Description |
|---|---|---|
| ingest | /kb ingest <pdf-path-or-query> | Download (if needed) + parse PDF → structured paper note with full frontmatter, 12-section analysis, and optional deep-read section notes |
| ingest article | /kb ingest article <url-or-text> | Capture a blog/report/social-media post → structured article note with 5-section analysis |
| read | /kb read <pdf-path> | Annotate a PDF → full dual-column HTML with 5-color highlights |
| compile | /kb compile <topic> or /kb compile --all | Synthesize all papers and articles under a topic → update topic file |
| query | /kb query "<question>" | Answer a cross-corpus research question → save to queries/ |
| lint | /kb lint | Health check: orphan papers/articles, dead wikilinks, stale topics, frontmatter gaps |
| log | /kb log | Show recent operations from kb-log.md |
Flags (for read):
- --questions "Q1:... Q2:...": switch to Question mode (Mode A)
- --questions (no arg): interactive, Claude asks you for questions
- --lang en|zh: override annotation language (default: zh)

Flags (for ingest):
- --deep: force section-level deep-read notes (creates sections/ subdirectory)
- --no-deep: skip section-level deep-read even for long papers

Input: A PDF file path. Optional flags: --questions, --lang.
This produces a self-contained dual-column HTML annotation — original text on the left with color highlights, analytical annotations on the right. The output goes to papers/[Paper Title]/annotation.html and can later be consumed by ingest for richer paper notes.
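A minimal sketch of how the read flags might map to annotation modes, using Python's standard `argparse`. The function name `select_mode` and the mode labels `"A-inline"` and `"A-interactive"` are hypothetical; the default of Mode B when `--questions` is absent follows the mode descriptions in this document.

```python
import argparse


def select_mode(argv):
    """Parse /kb read arguments and return (mode, lang)."""
    parser = argparse.ArgumentParser(prog="kb-read")
    parser.add_argument("pdf_path")
    # nargs="?" lets --questions appear with or without a value;
    # const="" marks "flag given, no value" and None marks "flag absent".
    parser.add_argument("--questions", nargs="?", const="", default=None)
    parser.add_argument("--lang", choices=["en", "zh"], default="zh")
    args = parser.parse_args(argv)

    if args.questions is None:
        mode = "B"              # logic analysis (default)
    elif args.questions == "":
        mode = "A-interactive"  # ask the user for questions
    else:
        mode = "A-inline"       # questions supplied on the command line
    return mode, args.lang
```

With this parsing, a bare `--questions` should come after the PDF path; placed before it, `argparse` would read the path as the flag's value.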
- No --questions flag → Mode B (logic analysis, default)
- --questions "Q1: ... Q2: ..." → Mode A (question-driven, inline)
- --questions (no argument) → Mode A (interactive: ask user for up to 6 questions)

Read the PDF at the provided path. Extract: title, authors, year, venue/journal, abstract, full text.
Determine the paper title and output path. Save to papers/[Paper Title]/annotation.html. If the papers folder doesn't exist yet, create it and copy the PDF there.
Generate the full dual-column HTML annotation. Embed references/template.css as a <style> block in <head>. Import Google Fonts: Lora + IBM Plex Sans.
1. TOP NAVBAR
- Paper title
- Context label (or "General Reading" if none)
- Color legend: 5 colored chips with dimension labels
2. STICKY SECTION NAV
- Links: Abstract | Introduction | Related Work | Methods | Results | Discussion | Conclusion
- Highlight active section on scroll
3. DUAL-COLUMN BODY — one <div class="paragraph-group"> per paragraph
Left column (.original-text) — original text (ALWAYS IN ORIGINAL LANGUAGE, never translate):
- Copy verbatim from PDF
- Highlight key phrases: <mark class="thesis">...</mark> etc.
Right column (.annotation-card) — annotation cards (language follows --lang, default zh):
[Colored left border matching paragraph's dominant highlight]
① 段落功能 / Paragraph Function
② 逻辑角色 / Logical Role
③ 论证技巧或潜在漏洞 / Rhetorical Technique or Logical Gap
4. BACK-TO-TOP BUTTON
<button id="back-to-top" title="返回顶部">↑</button>
<script>
const btn = document.getElementById('back-to-top');
window.addEventListener('scroll', () => {
btn.style.display = window.scrollY > 300 ? 'flex' : 'none';
});
btn.addEventListener('click', () => window.scrollTo({ top: 0, behavior: 'smooth' }));
</script>
5. BOTTOM — Argument Structure Overview (language follows --lang)
zh: 问题 / 论点 / 证据 / 反驳处理 / 结论
en: Problem / Argument / Evidence / Concession / Conclusion
- Author's core claim (1 sentence)
- 最强论证 / Strongest argument
- 最弱论证 / Weakest argument
- APA Citation + copy button
- BibTeX block + copy button
<script>
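// Copy text to the clipboard, then briefly swap the button label for a confirmation.
function copyText(btn, text) {
  // Fall back to the current label if data-label is not set on the button.
  const label = btn.dataset.label || btn.textContent;
  // writeText returns a Promise; only confirm once the copy has succeeded.
  navigator.clipboard.writeText(text).then(() => {
    btn.textContent = '✓ 已复制'; btn.classList.add('copied');
    setTimeout(() => {
      btn.textContent = label; btn.classList.remove('copied');
    }, 1500);
  });
}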
</script>
| Color | CSS Class | Represents |
|---|---|---|
| 🟡 Yellow #fef08a | thesis | Core thesis / main claim |
| 🔴 Red #fecaca | concept | Key concepts / terminology |
| 🔵 Blue #bfdbfe | evidence | Empirical evidence / data |
| 🟢 Green #bbf7d0 | concession | Concessions / counterargument handling |
| 🟣 Purple #e9d5ff | methodology | Methodology description |
Same HTML structure as Mode B, with these differences:
- <mark class="q1">, <mark class="q2">, etc.
- 【Q2 核心论点】 / [Q2 Core Argument]:
- ① Paragraph Function ② Argument Logic ③ Which question this paragraph answers

All CSS is in references/template.css. When generating HTML output, embed the file contents as a <style> block in <head>.
Output: file path + brief summary of key arguments found. The HTML file can be opened directly in any browser.
Input: One of the following:
- papers/AlphaEvolve-2025/AlphaEvolve-2025.pdf
- 2502.13131 or https://arxiv.org/abs/2502.13131
- "AlphaEvolve: A coding agent for scientific discovery"
- 10.xxxx/xxxxx

Optionally, an HTML annotation from /kb read may already exist in the same folder.
Resolve the paper identity:
- https://export.arxiv.org/api/query?search_query=ti:"<title>"&max_results=5
- https://api.openalex.org/works?search=<query>&select=title,doi,primary_location&per-page=5

Download the PDF:
- https://arxiv.org/pdf/<arxiv_id>.pdf
- papers/[Paper Title]/[Paper Title].pdf (create the folder if needed)

Continue to Step 1 with the downloaded PDF path.
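The identity-resolution step above can be sketched as a small classifier plus URL builders. This is an illustrative sketch only: the function names, the regular expressions for arXiv IDs and DOIs, and the percent-encoding of the query are assumptions, while the endpoints and parameters are the ones listed above. No network request is made here.

```python
import re
from urllib.parse import urlencode

# Assumed patterns: new-style arXiv IDs (e.g. 2502.13131) and DOIs starting with "10.".
ARXIV = re.compile(r"(?:https?://arxiv\.org/abs/)?(\d{4}\.\d{4,5})(?:v\d+)?")
DOI = re.compile(r"10\.\d{4,9}/\S+")


def classify_input(value):
    """Guess which kind of ingest input was given: path, arXiv ID, DOI, or title."""
    v = value.strip()
    if v.lower().endswith(".pdf") and not v.startswith("http"):
        return ("pdf_path", v)
    m = ARXIV.fullmatch(v)
    if m:
        return ("arxiv_id", m.group(1))
    if DOI.fullmatch(v):
        return ("doi", v)
    return ("title", v.strip('"'))  # anything else is treated as a title query


def arxiv_search_url(title):
    """Build the arXiv title-search URL (query is percent-encoded)."""
    params = {"search_query": f'ti:"{title}"', "max_results": 5}
    return "https://export.arxiv.org/api/query?" + urlencode(params)


def openalex_search_url(query):
    """Build the OpenAlex works-search URL with the fields listed above."""
    params = {"search": query, "select": "title,doi,primary_location", "per-page": 5}
    return "https://api.openalex.org/works?" + urlencode(params)


def arxiv_pdf_url(arxiv_id):
    return f"https://arxiv.org/pdf/{arxiv_id}.pdf"
```

Treating every unrecognized input as a title keeps the classifier total, so a search can always be attempted before asking the user to disambiguate.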
Determine the paper title. Use the original full title of the paper as the folder and file name. Examples: Agentic Reasoning, Memory in the Age of AI Agents. Keep the exact title from the paper — do not abbreviate or reformat.
Create the paper folder at papers/[Paper Title]/. Copy or note the PDF path. If an HTML annotation already exists (from a prior /kb read), check for it at papers/[Paper Title]/annotation.html or in the same directory as the PDF.
Extract paper content. Read the PDF to extract: title, authors, affiliations, year, venue, abstract, full text, and references.
Fill the frontmatter. Use the exact schema from references/paper-frontmatter.md. Every field matters for Dataview queries. Key rules:
- tags: always include paper as the first tag, then domain tags
- topics: plain text array (NOT wikilinks), e.g., [Agent Memory, Agent RL]. Only use topic names that exist in topics/, or propose a new one and flag it to the user
- status: set to unread initially (the user will change it after reading)
- relevance: make your best estimate (1-5) based on the user's research focus, but tell the user your reasoning so they can adjust
- summary, research_question, contributions, findings: fill these from the paper content

Write the 12-section paper note (sections 0-11). Follow the template in references/paper-note-template.md. The sections are:
Each section should be substantive (more than a single sentence). Bilingual format is required: write each section with Chinese first, then English below (separated by a blank line), so both languages appear together within every section. Technical terms and proper nouns stay in English in both versions. If an HTML annotation is available, leverage its highlights and argument analysis to enrich sections 1-11.
5b. Deep-read assessment and section note generation. Evaluate whether this paper warrants section-level deep-read notes. This step produces individual notes per original paper section in papers/[Paper Title]/sections/.
Trigger logic (evaluated automatically unless overridden by --deep / --no-deep):
- survey or the user's relevance estimate is ≥ 4
- --deep flag was explicitly provided
- --no-deep was provided or the paper is very short (≤ 6 pages)

If deep-read is triggered:
a. Create the sections/ directory at papers/[Paper Title]/sections/.
b. Identify sections to deep-read. Map the paper's original section structure (e.g., "1 Introduction", "2 Related Work", "3 Method", "3.1 Architecture", "3.2 Training", "4 Experiments", "5 Discussion"). For papers with deep sub-section hierarchies, group at the top-level section (e.g., "3 Method" covers 3.1–3.N in one note) unless a sub-section is itself very long (≥ 2 pages), in which case give it its own note.
c. For each section, create a deep-read note following the template in references/section-note-template.md. The filename follows NN-Section-Name.md (e.g., 01-Introduction.md, 03-Methodology.md). Each note contains:
- 章节定位 callout: the section's role in the paper's argument chain, prerequisites, and transition to the next section
- 精读摘要: deep summary of the section's core argument, methodology details, or experimental findings (3-6 paragraphs, scaled to section length)
- 关键 Insight: 2-5 transferable insights extracted from this section (using [!tip] callouts)
- 原文关键段落: 2-4 verbatim quotes of the most valuable paragraphs (using [!quote] callouts with section/page references)
- 与其他工作的联系: connections to other papers/topics in the vault via wikilinks
d. Frontmatter for each section note:
---
parent: "[[Paper Title]]"
section_number: 3
section_title: "Methodology"
page_range: "pp. 4-8"
tags: [section-note]
---
e. Skip trivial sections. Do not create section notes for: Abstract (already in frontmatter), Acknowledgements, purely bibliographic Reference lists, or sections shorter than half a page. The Publication Info (section 0) in the main note already covers metadata.
f. After generating all section notes, inform the user:
- How many section notes were created
- Which sections were skipped and why
- Highlight the 2-3 most insight-rich sections
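A minimal sketch of the section-note naming and frontmatter described in step 5b. The helper names are hypothetical, and replacing spaces with hyphens in multi-word titles is an assumption inferred from the NN-Section-Name.md pattern.

```python
import re


def section_note_filename(number, title):
    """Build an NN-Section-Name.md filename, e.g. 03-Methodology.md."""
    slug = re.sub(r"[^\w\s-]", "", title).strip()  # drop punctuation
    slug = re.sub(r"\s+", "-", slug)               # assumed: spaces become hyphens
    return f"{number:02d}-{slug}.md"


def section_frontmatter(parent_title, number, title, page_range):
    """Render the section-note frontmatter with the fields shown in step d."""
    return "\n".join([
        "---",
        f'parent: "[[{parent_title}]]"',
        f"section_number: {number}",
        f'section_title: "{title}"',
        f'page_range: "{page_range}"',
        "tags: [section-note]",
        "---",
    ])
```

For example, `section_note_filename(3, "Methodology")` gives `03-Methodology.md`, matching the example filenames in step c.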
Weave wikilinks throughout the note — this is critical for Obsidian's graph and backlink features. The paper note should contain at least 5-10 wikilinks spread across sections 1-10. Two types:
- topics field should appear as [[Topic Name]] at least once in the body text where that topic is most relevant. For example, if topics: [Agent Memory, Agent RL], then section 2 or 6 might say "This paper's memory mechanism is closely related to recent advances in [[Agent Memory]]".
- [[Paper Title|display text]]. Check papers/ for existing paper folders. For example: "Compared to the Agentic Reasoning framework by [[Agentic Reasoning|Wei et al. 2026]]..."

A paper note with zero topic/paper wikilinks is incomplete: the whole point of the knowledge base is cross-linking. Scan the vault's existing papers for related work and link generously.
Classify into topics. Based on the paper's content, suggest 2-5 topics from the existing topics/ directory. If a paper clearly belongs to a topic that doesn't exist yet, propose creating it (but don't create it automatically — ask the user first).
Update topic files. For each topic in the paper's topics field, open the corresponding topic file and add a wikilink to the new paper in its 相关论文与资源 section. Prefix the entry with 🆕 to mark it as not-yet-compiled. Use the format: - 🆕 📄 [[Paper Title|Short Description]] — one-line summary of relevance. The 🆕 marker signals that this entry has been added since the last /kb compile for this topic and its insights have NOT yet been integrated into the topic's Insight Synthesis section. The marker is removed during compile (see Procedure 2, Step 5b).
Update Topics Index.md. If any paper count changed, update the count in the Topics table.
Append to kb-log.md:
## [YYYY-MM-DD] ingest | [Paper Title] ([venue], [year]) [+N section notes]
The [+N section notes] suffix is only added when deep-read was triggered. Omit it for papers without section notes.
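A minimal sketch of building and appending the ingest log heading, assuming the log is a plain UTF-8 text file. The function names and the `on` parameter (used to pin the date) are hypothetical; the heading format and the optional suffix follow the description above.

```python
from datetime import date


def ingest_log_line(title, venue, year, section_notes=0, on=None):
    """Format the kb-log.md heading for a paper ingest."""
    day = (on or date.today()).isoformat()  # YYYY-MM-DD
    line = f"## [{day}] ingest | {title} ({venue}, {year})"
    if section_notes > 0:
        # Suffix is only added when deep-read produced section notes.
        line += f" [+{section_notes} section notes]"
    return line


def append_log(log_path, line):
    """Append one entry; the log is append-only, so never rewrite it."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```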
Present a summary to the user:
- /kb compile <topic> to integrate the new insights into the synthesis. Example: ⚠️ Auto Researcher 有 3 篇待整合,建议 /kb compile Auto Researcher (in English: Auto Researcher has 3 entries pending integration; suggest /kb compile Auto Researcher)

Input: One of the following:
- https://karpathy.github.io/2025/01/llm-os/
- "Karpathy 的 LLM OS 博客" (Karpathy's LLM OS blog; Claude searches for it)

If URL is provided:
- articles/[Article Title]/snapshot.html for link rot prevention.

If pasted text is provided:
If title/description is provided:
Determine the article title. Use the original title of the article. For social media posts without a clear title, create a descriptive one: [Author] on [Topic] ([Platform], [Date]). Examples: LLM OS, The Bitter Lesson, Karpathy on AutoResearch (Twitter/X, 2026-03).
Create the article folder at articles/[Article Title]/.
Fill the frontmatter. Use the exact schema from references/article-frontmatter.md. Key rules:
- tags: always include article as the first tag
- source_url: critical, always include the original URL if available
- source_type: one of blog, tech-report, white-paper, social-media, newsletter
- platform: specific platform name (e.g., personal blog, 微信公众号, 小红书, Medium)
- topics: same as papers, plain text matching topics/ filenames
- recommended_papers: wikilinks to papers the article mentions. Check papers/ for existing titles.

Write the 5-section article note. Follow the template in references/article-note-template.md:
Scale depth to article length: a 小红书 post might produce a 200-word note, a long-form blog post 600-800 words.
Weave wikilinks throughout the note. Same as paper ingest — at least 3-5 wikilinks:
- [[Topic Name]] for each topic in the frontmatter
- [[Paper Title|display text]] for papers the article references that exist in the vault
- [[Article Title|display text]] for other articles in the vault if related

Classify into topics. Suggest 1-3 topics from existing topics/. Articles may also suggest papers worth ingesting; flag these to the user.
Present a summary to the user:
Input: A topic name (must match a file in topics/), or --all to compile every topic.
Compile reads all papers and articles tagged with that topic and updates the topic synthesis file. The goal is NOT to rewrite from scratch every time — it's to incrementally improve the synthesis as new papers are added.
Load the topic file from topics/[Topic Name].md.
Find all papers and articles for this topic. Scan papers/ and articles/ for all .md files where frontmatter topics contains this topic name (case-insensitive match).
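A minimal sketch of the topic scan described above, written without a YAML library. It assumes `topics` is stored either as an inline array (`topics: [A, B]`) or as a block list; the function names are hypothetical, and a real implementation might prefer a full YAML parser.

```python
from pathlib import Path


def frontmatter_topics(note_text):
    """Return the topics listed in a note's frontmatter (inline or block list)."""
    if not note_text.startswith("---"):
        return []
    parts = note_text.split("---", 2)
    if len(parts) < 3:
        return []
    lines = parts[1].splitlines()
    topics = []
    for i, line in enumerate(lines):
        if not line.startswith("topics:"):
            continue
        inline = line[len("topics:"):].strip()
        if inline.startswith("["):
            topics = [t.strip().strip("\"'") for t in inline.strip("[]").split(",")]
        else:
            # Block list: collect the "- item" lines that follow.
            for item in lines[i + 1:]:
                if not item.lstrip().startswith("- "):
                    break
                topics.append(item.lstrip()[2:].strip().strip("\"'"))
        break
    return [t for t in topics if t]


def notes_for_topic(vault_root, topic):
    """Find notes in papers/ and articles/ tagged with the topic (case-insensitive)."""
    wanted = topic.casefold()
    hits = []
    for folder in ("papers", "articles"):
        base = Path(vault_root) / folder
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*.md")):
            text = path.read_text(encoding="utf-8")
            if wanted in (t.casefold() for t in frontmatter_topics(text)):
                hits.append(path)
    return hits
```

Notes without a `topics` field, such as the Dataview dashboards, simply produce an empty list and are skipped.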
Read each paper and article note. For papers, focus on sections 3 (Problem Statement), 6 (Key Idea), 7 (Approach), 8 (Contributions), 9 (Results), 10 (Discussion). For articles, focus on sections 1 (Core Argument), 2 (Key Insights), 4 (Related Papers). You don't need to re-read the full PDFs or source pages — the notes are the source of truth.
Bonus: section deep-read notes. If a paper has a sections/ subdirectory, scan the section notes for their 关键 Insight callouts — these are pre-extracted, high-quality insights that can significantly enrich the topic synthesis. Particularly valuable for building framework comparison tables and identifying cross-paper patterns.
Surface key insights before writing. First, scan the topic file's 相关论文与资源 section and identify all entries with a 🆕 marker — these are papers/articles ingested since the last compile. Then present to the user:
Ask: "Anything specific to emphasize or de-emphasize?" Wait for response before proceeding. Skip this step only if the user explicitly says to compile autonomously.
Update the topic file. Preserve the existing structure:
---
title: "Topic: [Name]"
tags: [topic, topic-slug]
date: [today]
related_topics: ["[[Other Topic]]", ...]
---
# [Topic Name]
> [!abstract] 主题概述
> [2-4 sentences overview — what this topic is about, why it matters,
> key papers driving it]
---
## 核心问题 (Core Questions)
[Key open questions with paper references]
---
## Insight Synthesis:[Framework/Analysis Title]
[Cross-paper analysis — the heart of the topic file]
[Use tables for framework comparisons]
[Use ASCII diagrams for architecture patterns]
[Include subsections for tensions/debates]
---
## 相关论文与资源
[Wikilinks to papers and articles in this topic, grouped if useful]
[Use 📄 for papers, 📝 for articles]
---
## 与其他主题的关联
[Links to related topics with explanation]
5b. Remove 🆕 markers. After updating the Insight Synthesis, scan the 相关论文与资源 section and remove the 🆕 prefix from all entries in this topic file. The 🆕 marker was added during ingest to flag papers/articles whose insights had not yet been integrated into the synthesis. Once compile completes, all entries in this topic have been synthesized, so all 🆕 markers should be removed. For example, - 🆕 📄 [[Paper|...]] becomes - 📄 [[Paper|...]].
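A minimal sketch of the marker removal in Step 5b, assuming each entry is a `- ` list item with the 🆕 prefix directly after the dash. The function name is hypothetical. It also returns the entries that carried the marker, which is the information the compile log needs.

```python
import re

# Matches the list dash, then the 🆕 marker and any whitespace after it.
NEW_MARKER = re.compile(r"^(\s*-\s*)🆕\s*", re.MULTILINE)


def remove_new_markers(section_text):
    """Strip 🆕 prefixes; return (cleaned_text, entries_that_were_new)."""
    newly_integrated = [
        line.strip() for line in section_text.splitlines() if NEW_MARKER.match(line)
    ]
    cleaned = NEW_MARKER.sub(r"\1", section_text)
    return cleaned, newly_integrated
```

Entries without the marker are left untouched, so running the function twice on the same text is harmless.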
Backlink audit. After updating the topic:
- [[Topic Name]]? If not, add one at the first relevant mention.
- related_topics field and the "与其他主题的关联" section reflect current connections?

Update Topics Index.md. Update the paper count for this topic. If the mermaid diagram's connections changed, update those too.
Append to kb-log.md:
## [YYYY-MM-DD] compile | [Topic Name] ([N] papers, [word_count] words)
- New since last compile: [list of 🆕-marked entries that were just integrated]
Always log which entries were newly integrated (i.e., had the 🆕 marker before this compile). This creates an audit trail of what changed between compiles.
When --all is specified, iterate through every .md file in topics/ (excluding Topics Index.md). Compile each topic in sequence. At the end, do a global update of Topics Index.md with all counts.
A query answers a research question by reading the vault — never from general knowledge. The answer is then filed back so the exploration compounds.
Identify relevant topics. Scan topics/ for topic files whose content relates to the question. Read their 主题概述 and Insight Synthesis sections.
Identify relevant papers and articles. From the topic files and from grepping frontmatter in both papers/ and articles/, find sources that address the question. For papers, read sections 3, 6, 7, 8, 9, 10. For articles, read sections 1, 2, 4.
Leverage section deep-read notes when available. If a relevant paper has a sections/ subdirectory, check the section notes for finer-grained insights:
- [!tip] callouts) across section notes; these are pre-extracted, transferable insights that directly answer many research questions.
- [!quote] callouts) for precise, citable evidence with page references.

Section notes are especially valuable for methodology questions (check NN-Methodology.md) and experimental details (check NN-Experiments.md). When citing insights from section notes, still attribute to the parent paper: [[Paper Title]].
Synthesize the answer. Properties:
- [[Paper Title]] or [[Topic Name]] citation

Match format to question type:
Save the answer to queries/[YYYY-MM-DD] [Question Slug].md with frontmatter:
---
title: "[Question]"
tags: [query, relevant-tags]
date: YYYY-MM-DD
topics: [Topic1, Topic2]
informed_by:
- "[[Paper Title 1]]"
- "[[Paper Title 2]]"
- "[[Article Title 1]]"
---
Consider promoting. If the answer is a durable synthesis (comparison table, trade-off analysis, new concept), suggest to the user that it could become a new topic or be merged into an existing topic file.
Append to kb-log.md:
## [YYYY-MM-DD] query | [Question Slug]
Run a health check across the entire vault. Report issues and propose fixes.
Orphan papers/articles — entries in papers/ or articles/ whose topics field is empty or contains only topics that don't exist in topics/.
Dead wikilinks in topic files — topic files that reference papers or articles that don't exist in papers/ or articles/.
Dead wikilinks in notes — paper/article notes that link to [[Topic Name]] where no such file exists in topics/.
Frontmatter gaps — papers missing critical fields (title, authors, year, topics, status, tags); articles missing critical fields (title, author, source_url, topics, tags).
Stale topic counts — Topics Index.md counts don't match actual Dataview-style query results (count both papers and articles).
Missing backlinks — papers/articles assigned to a topic (via frontmatter topics) but not listed in that topic's 相关论文与资源 section.
Unlinked entries in topics — topic files list papers/articles in 相关论文与资源 that aren't in those entries' topics frontmatter (asymmetric link).
Dead article URLs — articles whose source_url returns 404 or is unreachable (optional, only if user requests deep lint). Suggest saving a snapshot if none exists.
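A minimal sketch of three of the checks above for a single topic: missing backlinks, unlinked entries, and dead wikilinks in the topic file. It assumes the caller has already read the topic's 相关论文与资源 section and built a mapping from note title to frontmatter topics; the function names and the mapping shape are hypothetical.

```python
import re

# Captures the link target, ignoring any #heading and |display text parts.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def wikilink_targets(text):
    """Return the set of note names linked from the text."""
    return {m.group(1).strip() for m in WIKILINK.finditer(text)}


def link_issues(topic_name, topic_section_text, note_topics):
    """Compare a topic's listed entries with the notes' frontmatter topics.

    note_topics maps each existing paper/article title to its topics list.
    """
    listed = wikilink_targets(topic_section_text)
    wanted = topic_name.casefold()
    assigned = {
        title for title, topics in note_topics.items()
        if wanted in (t.casefold() for t in topics)
    }
    existing = set(note_topics)
    return {
        # Tagged with the topic but not listed in the topic file.
        "missing_backlinks": sorted(assigned - listed),
        # Listed in the topic file but not tagged with the topic.
        "unlinked_entries": sorted((listed & existing) - assigned),
        # Listed in the topic file but no such note exists.
        "dead_wikilinks": sorted(listed - existing),
    }
```

Returning sorted lists keeps the report stable between runs, which makes successive lint results easier to compare.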
For each issue, print the problem and a proposed fix. Group by severity:
Ask the user: "Should I auto-fix these issues?" Apply fixes only with permission.
Append to kb-log.md:
## [YYYY-MM-DD] lint | [N] issues found, [M] fixed
Show recent operations from kb-log.md.
grep "^## \[" kb-log.md | tail -20
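A minimal Python equivalent of the grep above, with an optional filter for an operation name or a date prefix. This is a sketch under stated assumptions: the function name is hypothetical, a filter starting with a digit is treated as a date prefix, and any other filter matches operations that start with it (so `ingest` would also match `ingest article`).

```python
import re

# Heading shape used by kb-log.md entries: ## [YYYY-MM-DD] operation | details
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] ([^|]+?) \| (.*)$")


def recent_log_entries(log_text, flt=None, limit=20):
    """Return the last `limit` log headings, optionally filtered."""
    entries = []
    for line in log_text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # skip detail lines under each heading
        day, op, _details = m.groups()
        if not flt:
            keep = True
        elif flt[0].isdigit():
            keep = day.startswith(flt)  # e.g. "2026-04"
        else:
            keep = op.startswith(flt)   # e.g. "ingest", "compile"
        if keep:
            entries.append(line)
    return entries[-limit:]
```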
Support filters:
- /kb log ingest: show only ingest events
- /kb log compile: show only compile events
- /kb log 2026-04: show events from April 2026

The vault uses a bilingual (中英双语) style. Every paper note section is written twice, Chinese first and then English, within the same section. This ensures the vault is useful for both Chinese-native and English-native readers.
- ## 1. Background / Motivation(面向大众读者))
- ## Insight Synthesis, ## 相关论文与资源)
- [[Memory in the Age of AI Agents|Hu 2026]]
- # Knowledge Base Log

Section embed for deep-read papers: If step 5b created section notes, add an Obsidian embed at the end of each corresponding section in the main note:
> [!abstract]- 📖 深度精读笔记
> ![[sections/03-Methodology]]
Use a collapsed callout so the main note stays scannable, but readers can expand to see the full deep-read inline.
Update topic files. For each topic in the article's topics field, add a wikilink in the topic file's 相关论文与资源 section. Prefix the entry with 🆕 to mark it as not-yet-compiled. Use the format: - 🆕 📝 [[Article Title|Short Description]] — one-line summary. The 🆕 marker signals that this entry's insights have NOT yet been integrated into the topic's Insight Synthesis section. The marker is removed during compile (see Procedure 2, Step 5b).
Append to kb-log.md:
## [YYYY-MM-DD] ingest article | [Article Title] ([platform])
Target length: 1500-2500 words for the synthesis body (not counting paper lists and frontmatter). This is the intellectual core of the knowledge base — invest the effort here.
What makes a good synthesis (look at topics/Agent Memory.md as the gold standard):
A compile that merely lists "Paper A says X, Paper B says Y, Paper C says Z" is a failure. The value is in the cross-cutting analysis that no single paper provides. Think of yourself as a researcher writing a mini-survey for a colleague who needs to get up to speed on this topic in 10 minutes.