Unified heterogeneous knowledge QA system. Automatically routes natural-language queries to SQL databases, knowledge graphs, or table files using 4-layer detection (rule-based, LLM semantic, schema matching, entity verification). Supports multiple LLM providers and bilingual queries. Trigger on data queries, "how many", "show", aggregations, filters, joins, or structured-information requests.
Unified heterogeneous knowledge QA system with automatic source detection and multi-stage reasoning.
Natural language queries are automatically routed to the appropriate knowledge source (SQL, Knowledge Graph, or Table files) without requiring users to specify the data source. A 4-layer detection architecture ensures accurate source identification, followed by multi-stage query generation with self-revision and voting.
User Query → Source Detection (4 layers) → Query Generation → Self-Revision → Voting → Execution → Answer
| Trigger | Action |
|---|---|
| "How many employees in X?" | NL2SQL engine |
| "Who is the founder of X?" | NL2SPARQL engine (KG) |
| "Which quarter had highest sales?" | TableQA engine |
| "Show average salary by department" | Auto-detect SQL |
| Queries with aggregations, filters, joins | Route to SQL |
| Entity relationship queries | Route to KG |
| Questions about CSV/Excel files | Route to TableQA |
| Multi-hop queries across sources | Decompose + fuse |
Layer 1 (15%): Rule-Based
- 20+ keywords per source type
- 7 regex patterns (aggregation, comparison, relation)
- Fast pre-filtering
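The rule-based layer can be sketched as a keyword-and-regex pre-filter. The keyword lists and pattern below are illustrative placeholders, not the system's actual lists (which use 20+ keywords per source type and 7 patterns):

```python
import re

# Illustrative keyword lists per source type (the real system uses 20+ per type).
SOURCE_KEYWORDS = {
    "sql": ["how many", "average", "count", "total", "group by"],
    "kg": ["founder of", "who is", "related to", "born in"],
    "table": ["csv", "spreadsheet", "quarter", "column"],
}

# One example of an aggregation pattern (the real system uses 7 patterns).
AGGREGATION_PATTERN = re.compile(
    r"\b(sum|avg|average|count|max|min|highest|lowest)\b", re.I
)

def rule_based_scores(query: str) -> dict:
    """Score each source by keyword hits; an aggregation match boosts SQL."""
    q = query.lower()
    scores = {src: sum(kw in q for kw in kws) for src, kws in SOURCE_KEYWORDS.items()}
    if AGGREGATION_PATTERN.search(q):
        scores["sql"] += 1
    return scores
```

The per-source scores feed into the weighted fusion in Layer 4 rather than deciding the route on their own.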
Layer 2 (35%): LLM Semantic
- Intent classification
- Entity/predicate detection
- Multi-hop identification
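The semantic layer's job can be captured in a single classification prompt; the wording below is an assumption for illustration, not the system's actual prompt:

```python
# Illustrative intent-classification prompt (not the system's actual wording).
INTENT_PROMPT = """Classify the query's target source as one of: sql, kg, table.
Also extract entity mentions and note whether the query is multi-hop.
Query: {query}
Answer as JSON: {{"source": ..., "entities": [...], "multi_hop": true/false}}"""

def build_intent_prompt(query: str) -> str:
    """Fill the classification prompt with the user query."""
    return INTENT_PROMPT.format(query=query)
```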
Layer 3a (25%): SQL Schema Match
- Inverted index on tables/columns
- Automatic JOIN inference
- Confidence scoring
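The inverted index maps table and column tokens back to candidate tables, so schema matching is a set lookup per query word. A minimal sketch, with an assumed two-table schema:

```python
# Assumed schema shape: table name -> column names.
SCHEMA = {
    "employees": ["id", "name", "department", "salary"],
    "departments": ["id", "name", "manager_id"],
}

def build_inverted_index(schema: dict) -> dict:
    """Map each table/column token to the set of tables it belongs to."""
    index = {}
    for table, columns in schema.items():
        for token in [table, *columns]:
            index.setdefault(token.lower(), set()).add(table)
    return index

def match_tables(query: str, index: dict) -> set:
    """Return tables whose name or columns appear as words in the query."""
    words = set(query.lower().replace("?", "").split())
    return set().union(*(index.get(w, set()) for w in words)) if words else set()
```

Tables that share a matched token (e.g. a foreign-key column) are natural candidates for JOIN inference.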
Layer 3b (25%): KG Entity Link
- Entity mention extraction
- SPARQL endpoint lookup
- Predicate pattern matching
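Predicate pattern matching can be sketched as a table of surface patterns mapped to KG predicates; the patterns and predicate names below are illustrative (DBpedia-style), not the system's actual mapping:

```python
import re

# Illustrative surface-pattern -> predicate mapping (DBpedia-style names).
PREDICATE_PATTERNS = [
    (re.compile(r"\bfounder of\b|\bfounded\b", re.I), "dbo:founder"),
    (re.compile(r"\bcapital of\b", re.I), "dbo:capital"),
    (re.compile(r"\bborn in\b", re.I), "dbo:birthPlace"),
]

def match_predicates(query: str) -> list:
    """Return KG predicates whose surface pattern appears in the query."""
    return [pred for pat, pred in PREDICATE_PATTERNS if pat.search(query)]
```

A non-empty match is evidence for routing to the KG engine; the extracted entity mentions are then looked up against the SPARQL endpoint.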
Layer 3c (25% weight + 30% boost): Entity Verification
- Cross-source entity existence check
- 30% score boost for verified entities
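The 30% boost is a multiplicative bump applied only to sources whose entities were verified, for example:

```python
def boost_verified(scores: dict, verified: set, boost: float = 0.30) -> dict:
    """Apply the 30% score boost to sources whose entities were verified."""
    return {src: s * (1 + boost) if src in verified else s
            for src, s in scores.items()}
```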
Layer 4: Multi-Source Fusion
- Weighted aggregation
- Execution plan generation
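The weighted aggregation combines per-layer, per-source scores using the layer weights above (rule 15%, LLM 35%, schema 25%, entity 25%); the verification boost is applied separately. A minimal sketch:

```python
# Layer weights from the detection architecture above.
LAYER_WEIGHTS = {"rule": 0.15, "llm": 0.35, "schema": 0.25, "entity": 0.25}

def fuse_scores(layer_scores: dict) -> dict:
    """Weighted sum of per-layer, per-source scores -> final routing score."""
    fused = {}
    for layer, weight in LAYER_WEIGHTS.items():
        for src, score in layer_scores.get(layer, {}).items():
            fused[src] = fused.get(src, 0.0) + weight * score
    return fused
```

The highest-scoring source wins the route; for multi-hop queries spanning sources, the fused scores instead drive decomposition into an execution plan.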
1. Schema/Entity Linking → Identify relevant tables/columns/entities
2. Parallel Generation → Generate 3 candidates concurrently
3. Multi-Round Revision → 2 rounds of self-review
4. Validation → Syntax and semantic checks
5. Voting → Select best candidate
6. Execution → Run query
7. Result Verification → Validate reasonableness
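The voting step (5) can be sketched as majority voting over normalized candidates; the whitespace/semicolon normalization here is an illustrative assumption:

```python
from collections import Counter

def vote(candidates: list) -> str:
    """Pick the most common candidate after light normalization
    (collapse whitespace, drop trailing semicolons); ties go to the
    first candidate seen."""
    normalized = [" ".join(c.split()).rstrip(";") for c in candidates]
    return Counter(normalized).most_common(1)[0][0]
```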
```python
from src.engines.nl2sql.multi_stage_engine import MultiStageNL2SQLEngine

engine = MultiStageNL2SQLEngine({
    "name": "sql_engine",
    "schema": schema,
    "llm_config": {
        "model": "deepseek-chat",
        "api_key": "sk-...",
    },
    "generation_config": {
        "num_candidates": 3,
        "max_revisions": 2,
        "parallel_generation": True,
    },
})

result = await engine.execute("How many employees in Engineering?", {})
```
```python
from src.engines.nl2sparql.multi_stage_engine import MultiStageNL2SPARQLEngine

engine = MultiStageNL2SPARQLEngine({
    "name": "sparql_engine",
    "endpoint_url": "https://dbpedia.org/sparql",
    "ontology": ontology,
    "llm_config": {"model": "gpt-4", "api_key": "sk-..."},
})

result = await engine.execute("Who founded Microsoft?", {})
```
```python
from src.engines.table_qa.multi_stage_engine import MultiStageTableQAEngine

engine = MultiStageTableQAEngine({
    "name": "table_engine",
    "table_path": "data/sales.csv",
    "llm_config": {"model": "deepseek-chat", "api_key": "sk-..."},
})

result = await engine.execute("Which quarter had highest sales?", {})
```
Override the model and API key at runtime on a per-call basis:
```python
# Initialize with a default provider
engine = MultiStageNL2SQLEngine({
    "llm_config": {"model": "deepseek-chat", "api_key": "sk-deepseek-key"},
})

# Override per-call
result = await engine.execute(
    query="Complex query",
    context={},
    model="gpt-4-turbo",      # Override model
    api_key="sk-openai-key",  # Override API key
)
```
| Provider | Models | Configuration |
|---|---|---|
| DeepSeek | deepseek-chat | base_url: https://api.deepseek.com/v1 |
| OpenAI | gpt-4, gpt-3.5-turbo | Default endpoint |
| Azure OpenAI | gpt-4 | base_url: https://{resource}.openai.azure.com |
| Local (Ollama) | llama2, mistral | base_url: http://localhost:11434/v1 |
```yaml
llm_config:
  model: deepseek-chat
  api_key: sk-...
  base_url: https://api.deepseek.com/v1  # Optional
  temperature: 0.1
  max_tokens: 500
  timeout: 30

generation_config:
  num_candidates: 3          # SQL/SPARQL candidates to generate
  max_revisions: 2           # Self-revision rounds
  parallel_generation: true  # Concurrent candidate generation
  voting_enabled: true       # Multi-candidate voting
```