Build and analyze knowledge graphs from research literature. Automated pipeline: arxiv search → entity extraction → KG construction → vector embeddings → semantic search → skill pattern extraction. Use when the user asks to analyze papers, build research knowledge bases, find related work, or extract reusable patterns from academic literature.
Automated pipeline for building and analyzing knowledge graphs from academic research literature. Integrates arxiv search, entity extraction, vector embeddings, and graph algorithms to discover patterns and extract reusable skill patterns.
Tools:
- exec: Run Python scripts, kg_tool CLI, arxiv API queries
- web_search: Search for related research
- web_fetch: Fetch paper content from arxiv
- read: Read existing skills, KG schema
- write: Store results, update memory
- feishu_bitable_app: Store structured paper metadata (optional)

Define research scope:
Search arxiv:
query = f'cat:{category}+AND+all:{keywords}'
url = f'http://export.arxiv.org/api/query?search_query={query}&max_results=10&sortBy=submittedDate&sortOrder=descending'
Parse results: Extract title, authors, abstract, arxiv_id, category, published_date
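The search-and-parse step above can be sketched with only the standard library. `build_query_url` and `parse_feed` are illustrative helper names (not part of kg_tool); the URL format follows the query shown above, and the parser reads the Atom XML that the arxiv API returns.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arxiv API

def build_query_url(category, keywords, max_results=10):
    """Build the arxiv API query URL from the search step above."""
    query = f"cat:{category}+AND+all:{urllib.parse.quote(keywords)}"
    return (
        "http://export.arxiv.org/api/query?"
        f"search_query={query}&max_results={max_results}"
        "&sortBy=submittedDate&sortOrder=descending"
    )

def parse_feed(atom_xml):
    """Extract title, authors, abstract, arxiv_id, published_date per entry."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            # the <id> element holds a URL like http://arxiv.org/abs/<arxiv_id>
            "arxiv_id": entry.findtext(f"{ATOM}id", "").rsplit("/", 1)[-1],
            # collapse internal whitespace that arxiv wraps across lines
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
            "authors": [a.findtext(f"{ATOM}name", "")
                        for a in entry.findall(f"{ATOM}author")],
            "published_date": entry.findtext(f"{ATOM}published", ""),
        })
    return papers
```

Fetching the URL (e.g. with `urllib.request.urlopen`) and passing the response body to `parse_feed` yields the per-paper dicts consumed by the KG steps below.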
Initialize KG (if needed):
kg_tool init <db_path>
Add entities:
kg_tool add-entity <db_path> paper <arxiv_id>
Store metadata: JSON properties with title, authors, abstract, category
Generate embeddings:
Model: all-MiniLM-L6-v2 (384 dimensions)
Input text: {title}. {abstract} (max 500 chars)

Store vectors:
INSERT INTO kg_vectors (entity_id, vector, dimension, created_at)
VALUES (?, ?, 384, ?)
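The INSERT above stores each embedding as a float32 BLOB. A minimal sketch of the serialization round trip, assuming little-endian float32 packing (the `store_vector`/`load_vector` names are illustrative, not kg_tool APIs):

```python
import sqlite3
import struct
import time

def store_vector(conn, entity_id, vector):
    """Pack a sequence of floats as a little-endian float32 BLOB and insert it."""
    blob = struct.pack(f"<{len(vector)}f", *vector)
    conn.execute(
        "INSERT INTO kg_vectors (entity_id, vector, dimension, created_at) "
        "VALUES (?, ?, ?, ?)",
        (entity_id, blob, len(vector), int(time.time())),
    )

def load_vector(conn, entity_id):
    """Read a stored BLOB back into a list of floats, using the dimension column."""
    blob, dim = conn.execute(
        "SELECT vector, dimension FROM kg_vectors WHERE entity_id = ?",
        (entity_id,),
    ).fetchone()
    return list(struct.unpack(f"<{dim}f", blob))
```

Storing the dimension alongside the BLOB lets the loader unpack without hard-coding 384, should a different embedding model be used.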
PageRank: Find important papers
kg_tool pagerank <db_path>
Louvain: Detect research clusters
kg_tool louvain <db_path>
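kg_tool's internals are not shown here; for illustration, the PageRank step amounts to power iteration over the citation/relation edges. A self-contained sketch (the `pagerank` function below is a textbook implementation, not the kg_tool code):

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) edge pairs."""
    nodes = {n for edge in edges for n in edge}
    out = {n: [] for n in nodes}
    for s, t in edges:
        out[s].append(t)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            if out[s]:
                share = damping * rank[s] / len(out[s])
                for t in out[s]:
                    new[t] += share
            else:
                # dangling node: redistribute its rank evenly
                for t in nodes:
                    new[t] += damping * rank[s] / len(nodes)
        rank = new
    return rank
```

Papers with many inbound "cites" edges accumulate rank, which is why the report's "Top Papers" section sorts by this score.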
Semantic search: Find related papers
kg_tool search <db_path> <query>
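Semantic search here presumably ranks stored embeddings by cosine similarity against the query embedding. A minimal sketch under that assumption (`search` and `cosine` are illustrative names):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query_vec, vectors, top_k=5):
    """Rank {entity_id: embedding} entries by similarity, best first."""
    scored = [(eid, cosine(query_vec, vec)) for eid, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

In the real pipeline the query vector would come from the same all-MiniLM-L6-v2 model used for the papers, so query and document embeddings share one space.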
Extract patterns:
- skill-extractor skill
- skill-creator skill

# Research Literature KG Report
**Date**: YYYY-MM-DD
**Topics**: Primary + Secondary
## Statistics
- Papers collected: N
- Entities in KG: M
- Vectors generated: K
## Top Papers (PageRank)
1. [ID] Title - score
2. ...
## Semantic Search Results
### Query: "quantum computing"
- Top matches with similarity scores
## Research Clusters (Louvain)
- Community 0: Topic A papers
- Community 1: Topic B papers
## Extracted Patterns
- Pattern 1: [Description]
- Pattern 2: [Description]
## New Skills Created
- skill-name: Description
-- kg_entities
CREATE TABLE kg_entities (
id INTEGER PRIMARY KEY,
entity_type TEXT, -- 'paper', 'author', 'keyword', 'topic'
name TEXT,
properties TEXT, -- JSON: {title, authors, abstract, category}
created_at INTEGER
);
-- kg_vectors
CREATE TABLE kg_vectors (
entity_id INTEGER,
vector BLOB, -- float32 array
dimension INTEGER, -- 384 for all-MiniLM-L6-v2
created_at INTEGER
);
-- kg_relations
CREATE TABLE kg_relations (
id INTEGER PRIMARY KEY,
source_id INTEGER,
target_id INTEGER,
rel_type TEXT, -- 'cites', 'related', 'same_author'
weight REAL
);
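The `kg_tool init` step presumably creates the three tables above. A sketch of equivalent initialization with the standard `sqlite3` module (the `init_kg` helper is illustrative, not the kg_tool CLI):

```python
import sqlite3

# Mirrors the kg_entities / kg_vectors / kg_relations schema documented above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kg_entities (
    id INTEGER PRIMARY KEY,
    entity_type TEXT,      -- 'paper', 'author', 'keyword', 'topic'
    name TEXT,
    properties TEXT,       -- JSON: {title, authors, abstract, category}
    created_at INTEGER
);
CREATE TABLE IF NOT EXISTS kg_vectors (
    entity_id INTEGER,
    vector BLOB,           -- float32 array
    dimension INTEGER,     -- 384 for all-MiniLM-L6-v2
    created_at INTEGER
);
CREATE TABLE IF NOT EXISTS kg_relations (
    id INTEGER PRIMARY KEY,
    source_id INTEGER,
    target_id INTEGER,
    rel_type TEXT,         -- 'cites', 'related', 'same_author'
    weight REAL
);
"""

def init_kg(db_path):
    """Create the KG tables if they do not already exist; return the connection."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```

`CREATE TABLE IF NOT EXISTS` makes the call idempotent, matching the "Initialize KG (if needed)" step in the workflow.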
Identify primary topic, secondary topic, and search keywords from user request.
Search arxiv API with category and keyword filters; parse title, authors, abstract, arxiv_id.
Initialize kg.db if needed; add papers as entities; store JSON metadata.
Generate 384-dim embeddings via all-MiniLM-L6-v2; run PageRank and Louvain algorithms.
Use skill-extractor to identify reusable patterns; output Research Literature KG Report.
User: "Build a knowledge graph from recent brain connectivity papers"
Agent:
1. Search arxiv: cat:q-bio.NC AND all:brain connectivity, max 10 results
2. Add papers to kg.db with metadata
3. Generate vector embeddings
4. Run PageRank to find most influential papers
5. Run Louvain to detect research clusters
6. Output report with top papers and patterns
User: "Find quantum finance papers related to portfolio optimization"
Agent:
1. Initialize search query: "quantum portfolio optimization"
2. Fetch papers from arxiv quant-ph category
3. Add to kg.db and generate embeddings
4. Run similarity search for related papers
5. Report top matches with similarity scores
Related skills:
- arxiv-search: Paper search details
- skill-extractor: Pattern extraction
- skill-creator: Skill creation
- feishu-bitable: Alternative storage