Convert files of any supported format (Excel, PPT, Word, PDF, CSV) to Markdown and import them into the Gbrain knowledge base. Content is automatically classified into brain directories (people/companies/projects/concepts/meetings/media).
| pandas |
| Text | .txt, .md | direct import |
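
As a rough illustration of how a file's extension could select the conversion tool, here is a minimal sketch. The `CONVERTERS` mapping and the `pick_converter` helper are hypothetical names, and the extension list is an assumption inferred from the formats and tools named in this document, not a definitive list.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the tool used for conversion.
# The extensions are assumptions based on the formats this document mentions.
CONVERTERS = {
    ".xlsx": "pandas",
    ".csv": "pandas",
    ".pptx": "python-pptx",
    ".docx": "pandoc (python-docx fallback)",
    ".pdf": "pdfplumber",
    ".txt": "direct import",
    ".md": "direct import",
}


def pick_converter(file_path: str) -> str:
    """Return the converter name for a file, matching the extension case-insensitively."""
    ext = Path(file_path).suffix.lower()
    if ext not in CONVERTERS:
        raise ValueError(f"Unsupported file type: {ext or '(no extension)'}")
    return CONVERTERS[ext]
```

Raising on unknown extensions, rather than guessing, keeps unsupported files from being imported as garbage text.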
Run this once before first use:
chmod +x ~/.claude/skills/AnyFile2Gbrain/setup.sh
~/.claude/skills/AnyFile2Gbrain/setup.sh
When the user provides a file path:

~/brain/<category>/)

gbrain sync --repo ~/brain && gbrain embed --stale

Analyze the filename and content to determine the target directory:
| Keywords | Directory | Example |
|---|---|---|
| name, profile, bio, resume, CV | people/ | John-Profile.xlsx → people/john.md |
| company, corp, inc, startup, org | companies/ | Acme-Financials.xlsx → companies/acme.md |
| meeting, notes, call, discussion, sync | meetings/ | Team-Meeting.pptx → meetings/team-meeting.md |
| idea, concept, theory, insight, brainstorm | concepts/ | Feature-Idea.docx → concepts/feature-idea.md |
| article, book, paper, summary, review | media/ | AI-Trends.pdf → media/ai-trends.md |
| (default) | projects/ | Q1-Report.xlsx → projects/q1-report.md |
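
The keyword table above can be approximated with a simple first-match lookup. This is only a sketch: the `classify` function and its whole-word matching rule are assumptions, and the examples in the table (such as `Acme-Financials.xlsx` landing in `companies/`) suggest the real classification also relies on judgment about the content.

```python
import re
from pathlib import Path

# Keywords copied from the table above; categories are checked in table order.
CATEGORY_KEYWORDS = [
    ("people", ["name", "profile", "bio", "resume", "cv"]),
    ("companies", ["company", "corp", "inc", "startup", "org"]),
    ("meetings", ["meeting", "notes", "call", "discussion", "sync"]),
    ("concepts", ["idea", "concept", "theory", "insight", "brainstorm"]),
    ("media", ["article", "book", "paper", "summary", "review"]),
]
DEFAULT_CATEGORY = "projects"


def classify(filename: str, content: str = "") -> str:
    """Return the brain directory for a file (hypothetical helper).

    Matches whole words only, so 'inc' does not fire on 'since'.
    """
    text = f"{Path(filename).stem} {content}".lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    for category, keywords in CATEGORY_KEYWORDS:
        if words.intersection(keywords):
            return category
    return DEFAULT_CATEGORY
```

For example, `classify("Team-Meeting.pptx")` returns `"meetings"`, while a filename with no matching keyword falls through to `"projects"`.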
Slug generation: lowercase the filename (without its extension), replace special characters with hyphens, and trim leading and trailing hyphens.
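
The slug rule can be sketched in Python as follows. The `slugify` name and the `"untitled"` fallback are assumptions added for illustration; the fallback covers filenames with no ASCII letters or digits, which would otherwise produce an empty slug.

```python
import re
from pathlib import Path


def slugify(filename: str) -> str:
    """Turn a filename into a lowercase, hyphen-separated slug (hypothetical helper)."""
    stem = Path(filename).stem.lower()
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    # Assumption: fall back to a fixed name when nothing usable remains.
    return slug or "untitled"
```

Collapsing runs of special characters avoids slugs like `team-meeting--final-`, which a one-to-one character replacement would produce for `Team Meeting (final).pptx`.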
python3 -c "
import pandas as pd

file = '$FILE_PATH'
xlsx = pd.ExcelFile(file)
md = ''
# Render each worksheet as its own section with a Markdown table.
# DataFrame.to_markdown needs the optional tabulate package.
for sheet in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet)
    md += f'## Sheet: {sheet}\n\n'
    md += df.fillna('').to_markdown(index=False)
    md += '\n\n'
print(md)
" > /tmp/converted.md
python3 -c "
from pptx import Presentation

prs = Presentation('$FILE_PATH')
md = ''
for slide_num, slide in enumerate(prs.slides, 1):
    md += f'## Slide {slide_num}\n\n'
    # Text from every shape that exposes it (titles, text boxes, placeholders)
    for shape in slide.shapes:
        if hasattr(shape, 'text') and shape.text.strip():
            md += shape.text.strip() + '\n\n'
    # Tables: the first row becomes the Markdown header row
    for shape in slide.shapes:
        if shape.has_table:
            table = shape.table
            rows = []
            for row in table.rows:
                rows.append([cell.text.strip() for cell in row.cells])
            if rows:
                header = '| ' + ' | '.join(rows[0]) + ' |'
                separator = '| ' + ' | '.join(['---'] * len(rows[0])) + ' |'
                body = '\n'.join(['| ' + ' | '.join(r) + ' |' for r in rows[1:]])
                md += header + '\n' + separator + '\n' + body + '\n\n'
print(md)
" > /tmp/converted.md
pandoc "$FILE_PATH" -t markdown --wrap=none > /tmp/converted.md
Or fall back to python-docx:
python3 -c "
from docx import Document

doc = Document('$FILE_PATH')
md = ''
# Paragraphs: map Word heading styles to Markdown heading levels
for para in doc.paragraphs:
    style = para.style.name.lower()
    text = para.text.strip()
    if not text:
        continue
    if 'heading 1' in style:
        md += f'# {text}\n\n'
    elif 'heading 2' in style:
        md += f'## {text}\n\n'
    elif 'heading 3' in style:
        md += f'### {text}\n\n'
    else:
        md += text + '\n\n'
# Tables are appended after all paragraphs; the first row becomes the header
for table in doc.tables:
    rows = [[cell.text.strip() for cell in row.cells] for row in table.rows]
    if rows:
        header = '| ' + ' | '.join(rows[0]) + ' |'
        separator = '| ' + ' | '.join(['---'] * len(rows[0])) + ' |'
        body = '\n'.join(['| ' + ' | '.join(r) + ' |' for r in rows[1:]])
        md += '\n' + header + '\n' + separator + '\n' + body + '\n\n'
print(md)
" > /tmp/converted.md
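# pandoc has no PDF reader, so this normally falls through to pdfplumber
pandoc "$FILE_PATH" -t markdown --wrap=none > /tmp/converted.md 2>/dev/null || \
python3 -c "
import pdfplumber

with pdfplumber.open('$FILE_PATH') as pdf:
    md = ''
    # Skip pages with no extractable text (for example scanned images)
    for page in pdf.pages:
        text = page.extract_text()
        if text:
            md += text + '\n\n'
    print(md)
" > /tmp/converted.md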
python3 -c "
import pandas as pd
df = pd.read_csv('$FILE_PATH')
print(df.fillna('').to_markdown(index=False))
" > /tmp/converted.md
SLUG=$(basename "$FILE_PATH" | sed 's/\.[^.]*$//' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]\{1,\}/-/g' | sed 's/^-//;s/-$//')
CATEGORY="<determined-category>"
cat > ~/brain/$CATEGORY/$SLUG.md << 'EOF'
---