Trigger this skill in 'Data Lake' or 'Enterprise Multi-DB' scenarios where the agent must select the single most relevant database from a repository containing dozens or hundreds of candidates. It is especially critical for 'Domain Collision' requests, where multiple databases share similar terminology (e.g., three different 'Product' or 'Customer' databases). Trigger words/phrases: "find which database is the right one", "search the enterprise data lake", "there are many overlapping databases, pick the correct one", "the question might apply to several sources, verify which one can actually answer it", or "make sure the selected source has a valid path between all the fields mentioned."
Dingxingdi0 · starred April 10, 2026
Skill Content
1. Capability Definition & Real Case
Professional Definition: The ability to perform high-precision database selection (routing) within a multi-database repository by evaluating schema coverage, structural connectivity, and fine-grained semantic alignment. This involves mapping query spans to specific schema entities, verifying that all target entities form a connected subgraph (joinability) within the selected database, and using embedding-based tie-breaking to resolve ambiguities between domain-overlapping sources.
Dimension Hierarchy: Environment Grounding -> Retrieval and Alignment -> Source Retrieval
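The embedding-based tie-breaking named in the definition can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `embed` function here is a toy bag-of-words stand-in for a real sentence encoder, and the candidate schema summaries are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def break_tie(query, candidates):
    """Among coverage-tied databases, pick the one whose schema summary
    is most semantically similar to the query."""
    q = embed(query)
    return max(candidates, key=lambda db: cosine(q, embed(candidates[db])))

# Hypothetical summaries for two domain-overlapping databases.
candidates = {
    "product_catalog": "attribute name data type metadata column definitions",
    "products_gen_characteristics": "product color size weight specification values",
}
print(break_tie("find the attribute data type for attribute Green", candidates))
# -> product_catalog
```

Tie-breaking only runs after coverage and connectivity checks have narrowed the field; similarity alone should not outvote structural evidence.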
Real Case
[Case 1]
Initial Environment: A massive enterprise environment containing 23 separate databases (ranging from Formula 1 stats to stack-exchange logs) with 319 tables. The agent has no prior knowledge of which database holds which topic and must use exploration tools to find the target.
Real Question: Which driver has the most wins in Formula 1?
Real Trajectory: 1. The agent calls a summary command to see all 23 databases and identifies 'F1_Stats' as the likely source. 2. It then requests a list of tables for 'F1_Stats' and finds 'drivers' and 'race_results'. 3. It requests column metadata for 'drivers' (driver_id, name) and 'race_results' (driver_id, position, wins). 4. It composes a join query to count and rank wins by driver name.
Real Answer: SELECT T1.name, COUNT(T2.wins) FROM drivers AS T1 JOIN race_results AS T2 ON T1.driver_id = T2.driver_id WHERE T2.position = 1 GROUP BY T1.name ORDER BY COUNT(T2.wins) DESC LIMIT 1;
Why this demonstrates the capability: This demonstrates hierarchical source retrieval in a high-scale environment. The agent successfully narrowed down 23 domains to one, then isolated two relevant tables from many, correctly identifying the bridge (driver_id) across levels of abstraction without being overwhelmed by the other 317 irrelevant tables.
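The coarse-to-fine narrowing in Case 1's trajectory can be sketched as a two-level keyword filter. This is a hedged illustration: the `REPO` snapshot and `route` function are hypothetical, and a real agent would call exploration tools rather than hold the whole repository in memory.

```python
# Hypothetical repository snapshot: database -> table -> columns.
REPO = {
    "F1_Stats": {
        "drivers": ["driver_id", "name"],
        "race_results": ["driver_id", "position", "wins"],
    },
    "stack_exchange": {
        "posts": ["post_id", "title", "score"],
        "users": ["user_id", "display_name"],
    },
}

def score(keywords, text_items):
    # Count how many query keywords appear as substrings of any item.
    return sum(any(k in item.lower() for item in text_items) for k in keywords)

def route(keywords):
    # Level 1: rank databases by keyword hits in their names and table names.
    db = max(REPO, key=lambda d: score(keywords, [d] + list(REPO[d])))
    # Level 2: within the chosen database, keep only tables whose name
    # or columns mention a keyword.
    tables = [t for t, cols in REPO[db].items()
              if score(keywords, [t] + cols) > 0]
    return db, tables

print(route(["driver", "win", "f1"]))
# -> ('F1_Stats', ['drivers', 'race_results'])
```

The two-level structure is the point: the agent never scores all 319 tables at once, only the handful inside the database that won level 1.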
[Case 2]
Initial Environment: A complex repository with multiple databases covering overlapping domains, including a 'product_catalog' database and a 'products_gen_characteristics' database. One contains attribute metadata as schema, while the other contains product specs as values.
Real Question: Find the attribute data type for the attribute named 'Green'.
Real Trajectory: 1. Analyze the query to identify that 'Green' is the name of an attribute (column/schema item) rather than a value like a product color. 2. Map the term 'attribute' and 'data type' to the metadata columns in the 'product_catalog' schema. 3. Verify that in 'products_gen_characteristics', 'Green' only exists as a string value in a 'Color' column, which does not satisfy the 'attribute data type' structural requirement. 4. Select 'product_catalog' because it allows a connected path from the attribute name to its data type property.
Real Answer: SELECT data_type FROM product_catalog WHERE attribute_name = 'Green';
Why this demonstrates the capability: This demonstrates the ability to differentiate between domain-overlapping databases using structural intent. The agent correctly prioritizes a database where the query terms map to schema entities (columns) over one where they map to cell values, resolving a common 'lexical similarity' trap.
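The schema-versus-value distinction in Case 2 can be sketched as a structural-intent check. The `OCCURRENCES` index below is hypothetical: it records, for each database, whether the literal 'Green' appears as a schema-level attribute definition (with queryable properties) or merely as a cell value.

```python
# Hypothetical index: where does the literal 'Green' appear in each DB,
# and in what structural role?
OCCURRENCES = {
    "product_catalog": {
        "role": "attribute_definition",
        "properties": ["attribute_name", "data_type"],
    },
    "products_gen_characteristics": {
        "role": "cell_value",
        "column": "Color",
    },
}

def select_source(required_property):
    """Prefer a database where the term is a schema-level entity that
    actually exposes the property the query asks for."""
    for db, occ in OCCURRENCES.items():
        if occ["role"] == "attribute_definition" and required_property in occ["properties"]:
            return db
    return None  # no database satisfies the structural requirement

print(select_source("data_type"))
# -> product_catalog
```

A pure lexical matcher would score both databases equally on 'Green'; requiring the `data_type` property to be reachable is what breaks the trap.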
[Case 3]
Initial Environment: A multi-database system including a 'Student_Activity' database. The database contains tables for 'Activity', 'Participates_in', 'Faculty_Participates_in', 'Student', and 'Faculty'. Relations are defined via IDs.
Real Question: What does John do?
Real Trajectory: 1. Extract query phrases: 'John' (subject) and 'do' (action/activity). 2. Map 'John' to potential columns: Student.student_name and Faculty.faculty_name. 3. Map 'do' to Activity.activity_name. 4. Perform a connectivity check using the database adjacency list to verify if a join path exists between 'Student' and 'Activity' (via Participates_in) and between 'Faculty' and 'Activity' (via Faculty_Participates_in). 5. Since at least one valid connected subgraph exists in the 'Student_Activity' DB, confirm this as a valid source.
Real Answer: SELECT T3.activity_name FROM Student AS T1 JOIN Participates_in AS T2 ON T1.student_id = T2.student_id JOIN Activity AS T3 ON T2.activity_id = T3.activity_id WHERE T1.student_name = 'John';
Why this demonstrates the capability: This illustrates connectivity-based source validation. The agent doesn't just look for the words 'John' and 'do'; it ensures the database actually supports the relational link required to answer the question, proving the source is structurally appropriate.
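The connectivity check from Case 3's trajectory is essentially reachability over the database's join graph. A minimal sketch, assuming a hand-written adjacency list (`JOIN_GRAPH` below is hypothetical; a real agent would derive it from foreign-key metadata):

```python
from collections import deque

# Hypothetical join graph for the 'Student_Activity' database:
# an edge means the two tables share a joinable key.
JOIN_GRAPH = {
    "Student": ["Participates_in"],
    "Participates_in": ["Student", "Activity"],
    "Activity": ["Participates_in", "Faculty_Participates_in"],
    "Faculty_Participates_in": ["Activity", "Faculty"],
    "Faculty": ["Faculty_Participates_in"],
}

def connected(graph, start, goal):
    """BFS over the join graph: does a join path exist from start to goal?"""
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# The DB is a valid source only if every mapped entity pair is joinable.
print(connected(JOIN_GRAPH, "Student", "Activity"))   # True
print(connected(JOIN_GRAPH, "Faculty", "Activity"))   # True
```

If either check returned False, the database would be rejected even though it lexically contains 'John' and activity names, which is exactly the validation step the trajectory describes.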
Pipeline Execution Instructions
To synthesize data for this capability, you must strictly follow a 3-phase pipeline. Do not hallucinate steps. Read the corresponding reference file for each phase sequentially:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md