Query and analyze Telegram message history for knowledge base building
Stenographer logs Telegram messages to a local SQLite database and provides CLI tools to query, filter, and export them. Use it to build structured knowledge bases from chat history.
# Discovery — run these first
stenographer resolve # Cache display names (run once, then periodically)
stenographer chats # List all chats with names and message counts
stenographer contacts # List people with DM history
# Querying
stenographer query recent --since 24h --resolve-names --format table
stenographer query recent --chat <ID> --since 7d --resolve-names --format jsonl
stenographer query recent --sender <ID> --since 7d --resolve-names
stenographer query recent --search "keyword" --since 7d --resolve-names
stenographer query sql "SELECT ..." --format table
# Filter management
stenographer allowlist add <chat_id> # Start tracking a channel/supergroup
stenographer allowlist list
stenographer blacklist add <chat_id> # Stop tracking a chat
stenographer blacklist list
Telegram has three chat types in the database:
stenographer allowlist add <id>.When smart_whitelist = true (default), stenographer automatically whitelists channels/supergroups when it detects a message from someone you have DM history with. This means small work groups with known colleagues get tracked automatically.
IDs are numeric by default. To get human-readable names:
stenographer resolve once to cache names from Telegram API--resolve-names flag on queries to include chat_name and sender_name fieldsAlways start by understanding what data is available:
# What chats exist and how active are they?
stenographer chats
# Who do I talk to most?
stenographer contacts
# What are the top groups by activity?
stenographer query sql "SELECT m.chat_id, c.title, COUNT(*) as msgs FROM messages m LEFT JOIN chats c ON c.chat_id = m.chat_id AND c.chat_type = m.chat_type WHERE m.chat_type IN ('chat','channel') GROUP BY m.chat_id ORDER BY msgs DESC"
Never pull all messages at once. Always scope by time and/or chat:
# Last 24 hours from a specific chat
stenographer query recent --chat 4885425477 --since 24h --resolve-names --format jsonl
# Last week from a specific person
stenographer query recent --sender 185795959 --since 7d --resolve-names
# Search across all chats
stenographer query recent --search "audit" --since 7d --resolve-names
When building context about a project, query multiple related chats:
# Query each related chat separately for cleaner data
stenographer query recent --chat 5135987240 --since 7d --resolve-names --format jsonl > sablier_rox.jsonl
stenographer query recent --chat 5185348661 --since 7d --resolve-names --format jsonl > sablier_setup.jsonl
To understand someone's full context, query their DMs AND group messages:
# Their DMs with you
stenographer query recent --chat <user_id> --since 7d --resolve-names
# Their messages in all groups
stenographer query recent --sender <user_id> --since 7d --resolve-names
# Find all chats a person participates in
stenographer query sql "SELECT DISTINCT m.chat_id, c.title, m.chat_type FROM messages m LEFT JOIN chats c ON c.chat_id = m.chat_id AND c.chat_type = m.chat_type WHERE m.sender_id = 185795959 ORDER BY c.title" --format table
# Daily message volume per chat
stenographer query sql "SELECT date(date, 'unixepoch') as day, chat_id, COUNT(*) as msgs FROM messages GROUP BY day, chat_id ORDER BY day DESC, msgs DESC LIMIT 50" --format table
# Find topics discussed in a chat
stenographer query sql "SELECT message_text FROM messages WHERE chat_id = 4885425477 AND message_text != '' ORDER BY date DESC LIMIT 50"
# People overlap between two groups
stenographer query sql "SELECT DISTINCT a.sender_id, c.title as name FROM messages a JOIN messages b ON a.sender_id = b.sender_id LEFT JOIN chats c ON c.chat_id = a.sender_id AND c.chat_type = 'user' WHERE a.chat_id = 5135987240 AND b.chat_id = 5185348661"
| Format | Use Case |
|---|---|
jsonl | Default. Best for programmatic processing — one JSON object per line |
json | Full JSON array. Good for reading entire result sets |
table | Human-readable. Best for quick inspection in terminal |
csv | Spreadsheet import or further data processing |
Use --fields to select specific columns:
stenographer query recent --since 24h --format csv --fields "date,chat_id,sender_id,message_text" --resolve-names
stenographer chats and stenographer contacts to understand the landscapeknowledge/
people/ # One file per person
projects/ # One file per project/workstream
groups/ # One file per Telegram group (maps to project or team)
daily/ # Optional daily digests
For each person in stenographer contacts:
# Get their DM history
stenographer query recent --chat <user_id> --since 30d --resolve-names --format jsonl
# Get their group participation
stenographer query sql "SELECT DISTINCT m.chat_id, c.title FROM messages m LEFT JOIN chats c ON c.chat_id = m.chat_id AND c.chat_type = m.chat_type WHERE m.sender_id = <user_id> AND m.chat_type != 'user'" --format table
Create a note with:
For each active group in stenographer chats:
# Get recent discussion
stenographer query recent --chat <chat_id> --since 7d --resolve-names --format jsonl
# Get participants
stenographer query sql "SELECT DISTINCT m.sender_id, c.title as name, c.username FROM messages m LEFT JOIN chats c ON c.chat_id = m.sender_id AND c.chat_type = 'user' WHERE m.chat_id = <chat_id>" --format table
Create a note with:
People discuss the same topics in DMs and groups. To avoid duplication:
[[Person Name]] and [[Group Name]] Obsidian links# 1. Find related chats
stenographer query sql "SELECT m.chat_id, c.title, COUNT(*) as msgs FROM messages m LEFT JOIN chats c ON c.chat_id = m.chat_id AND c.chat_type = m.chat_type WHERE m.message_text LIKE '%sablier%' GROUP BY m.chat_id ORDER BY msgs DESC" --format table
# 2. Get discussion from the main group
stenographer query recent --chat 5185348661 --since 30d --resolve-names --format jsonl
# 3. Get related DMs from key participants
stenographer query recent --chat 185795959 --search "sablier" --since 30d --resolve-names
# 4. Get the setup/ops chat
stenographer query recent --chat 5135987240 --since 30d --resolve-names --format jsonl
stenographer resolve before querying with --resolve-names — it caches names and makes subsequent queries instant--since with relative durations (24h, 7d, 30d) rather than absolute dates--search flag is case-insensitive substring match by default. Use --search-fuzzy for LIKE pattern matchingstenographer query sql for complex joins and aggregations that query recent can't expressdate field is Unix timestamp (seconds). Use date(date, 'unixepoch') in SQL for human datesis_outgoing field (0/1) distinguishes messages you sent vs received--exclude-chat to filter out noisy bots or announcement channels from broad queries