Import content into local RAG. Triggers: "add to RAG", "scrape docs", "import YouTube", "embed PDF", "build knowledge base", "embedda sito" (embed site), "importa video" (import video), "aggiungi PDF" (add PDF), "crea knowledge base" (create knowledge base).
Import websites, YouTube videos, and PDFs into AnythingLLM.
| Source | Detection | Tool |
|---|---|---|
| YouTube | youtube.com, youtu.be | yt-dlp-server:8501 |
| PDF | .pdf extension | pdftotext |
| Docs site | /sitemap.xml exists | Crawl4AI sitemap |
| Generic | Everything else | Crawl4AI BFS |
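The detection order in the table can be sketched as a shell function. This is a minimal sketch, not the skill's own code. `detect_source` and `has_sitemap` are hypothetical names. The sitemap check assumes `curl` is installed and treats an HTTP 2xx response from `/sitemap.xml` at the site root as "exists". URLs without a scheme are assumed to be `https://`. The output `docs-sitemap` corresponds to the Crawl4AI sitemap row and `generic` to the Crawl4AI BFS row.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if <origin>/sitemap.xml answers with HTTP 2xx
# (redirects followed). Assumes curl is available.
has_sitemap() {
  curl -fsL -o /dev/null --max-time 10 "$1/sitemap.xml"
}

# Hypothetical helper: print youtube, pdf, docs-sitemap, or generic,
# checking rows in the same order as the detection table.
detect_source() {
  local url="$1" lower rest origin
  case "$url" in
    *://*) ;;
    *) url="https://$url" ;;                      # assume https if no scheme
  esac
  lower=$(printf '%s' "$url" | tr '[:upper:]' '[:lower:]')
  lower="${lower%%[?#]*}"                         # ignore query string and fragment
  case "$lower" in
    *://youtube.com/*|*://*.youtube.com/*|*://youtu.be/*)
      echo youtube; return 0 ;;
    *.pdf)
      echo pdf; return 0 ;;
  esac
  rest="${url#*://}"
  origin="${url%%://*}://${rest%%/*}"             # scheme://host[:port]
  if has_sitemap "$origin"; then
    echo docs-sitemap
  else
    echo generic
  fi
}
```

For example, `detect_source https://youtu.be/abc` prints `youtube` without any network request, because the YouTube and PDF patterns are checked before the sitemap probe.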
| MCP Tool | Purpose |
|---|---|
| mcp__crawl4ai__md | Scrape page to markdown |
| mcp__anythingllm__list_workspaces | List workspaces |
| mcp__anythingllm__embed_text | Add content to workspace |
| mcp__anythingllm__chat_with_workspace | Query RAG (mode: "query") |
Before importing, make sure the containers are running:

```bash
# Verify containers
docker ps | grep -E "crawl4ai|anythingllm"
# Start if needed
docker start crawl4ai anythingllm
# Also needed for YouTube/audio imports
docker start yt-dlp-server whisper-server
```
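The `grep -E` check also matches containers whose names merely contain `crawl4ai` or `anythingllm`. The sketch below is a stricter variant. It assumes the containers use exactly these names, compares full names, and starts only the ones that are not running. `missing_containers` and `ensure_containers` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required container that is not in the
# running list. $1 is a newline-separated list of running container names;
# the remaining arguments are the required names.
missing_containers() {
  local running="$1" name
  shift
  for name in "$@"; do
    # -x: whole-line match, -F: literal string, so "crawl4ai-old" != "crawl4ai"
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "$name"
    fi
  done
}

# Hypothetical helper: start only the required containers that are not running.
ensure_containers() {
  local missing
  missing=$(missing_containers "$(docker ps --format '{{.Names}}')" "$@")
  if [ -n "$missing" ]; then
    # Unquoted on purpose: one name per line, and container names have no spaces
    docker start $missing
  fi
}
```

`ensure_containers crawl4ai anythingllm` covers the core setup. For YouTube/audio imports, run `ensure_containers yt-dlp-server whisper-server` as well.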
If an AnythingLLM tool fails with a "Client not initialized" error, initialize the client first:

```
mcp__anythingllm__initialize_anythingllm
  apiKey: "YOUR_API_KEY"
  baseUrl: "http://localhost:3001"
```
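1. Detect the content type from the URL.
2. Find an existing workspace or create one, named after the domain.
3. Crawl the content, with at most 3 crawls running in parallel.
4. Embed the content with its metadata (mcp__anythingllm__embed_text).
5. Report the results and run a test query (mode: "query") to confirm retrieval.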
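The 3-crawl cap in step 3 is a bounded-parallelism pattern. Inside the skill, crawling goes through mcp__crawl4ai__md calls, which a shell script cannot make. Purely as a shell analogue, `xargs -P 3` enforces the same cap. `crawl_all` is a hypothetical helper. The command it runs per URL stands in for a real crawl step; `echo` is used in the usage below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the given command once per URL read from stdin
# (one URL per line), keeping at most 3 invocations in flight.
#   -0    NUL-separated input, so URLs containing quotes are passed verbatim
#   -n 1  one URL per invocation
#   -P 3  at most 3 concurrent invocations
#   -r    do nothing on empty input (GNU; accepted as a no-op on BSD)
crawl_all() {
  tr '\n' '\0' | xargs -0 -r -n 1 -P 3 "$@"
}
```

For example, `printf '%s\n' https://a.example https://b.example | crawl_all echo crawled` prints one `crawled <url>` line per URL. Because the invocations run concurrently, the lines can appear in any order.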
| URL | Workspace |
|---|---|
| docs.anthropic.com | anthropic-docs |
| react.dev/learn | react-dev |
| example.com/blog | example-blog |
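The table gives examples rather than a rule. The sketch below is one hypothetical heuristic that reproduces all three rows. It moves a leading `docs.` subdomain to a `-docs` suffix and drops common generic TLDs (`.com`, `.org`, `.net`, `.io`) while keeping others such as `.dev`. It then turns the remaining dots into hyphens and appends the first path segment only when the result is still a single bare word. `workspace_name` is an illustrative helper, not part of AnythingLLM.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: derive a workspace name from a URL (scheme optional).
workspace_name() {
  local url="$1" rest host path first name suffix=""
  rest="${url#*://}"                      # drop scheme if present
  host="${rest%%/*}"
  case "$rest" in
    */*) path="${rest#*/}" ;;
    *) path="" ;;
  esac
  host=$(printf '%s' "${host%%:*}" | tr '[:upper:]' '[:lower:]')  # drop port, lowercase
  host="${host#www.}"
  case "$host" in
    docs.*) host="${host#docs.}"; suffix="-docs" ;;  # docs.x.com -> x + "-docs"
  esac
  case "$host" in
    *.com|*.org|*.net|*.io) host="${host%.*}" ;;     # drop generic TLDs only
  esac
  name=$(printf '%s' "$host" | tr '.' '-')
  first="${path%%/*}"
  # A single bare word (e.g. "example") is too generic: add the first path segment
  case "$name" in
    *-*) ;;
    *) if [ -z "$suffix" ] && [ -n "$first" ]; then name="$name-$first"; fi ;;
  esac
  printf '%s%s\n' "$name" "$suffix"
}
```

Under this heuristic, `https://www.Example.com/blog/post-1` also maps to `example-blog`, so pages under the same section share one workspace.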
---