Meta-cognitive framework for optimizing tool use in agentic multimodal models - deliberate tool invocation vs internal reasoning arbitration. Use when designing agents that need to decide between using external tools or internal knowledge. Activation: meta-cognitive tool use, deliberate tool invocation, tool arbitration, agentic multimodal models, tool vs reasoning, blind tool invocation.
A framework for cultivating deliberate tool-use policies in agentic multimodal models, enabling agents to arbitrate between leveraging internal knowledge and querying external utilities.
Current agentic multimodal models suffer from a meta-cognitive deficit: they invoke external tools reflexively, even when the answer is already obvious from the provided context or internal knowledge.
A meta-cognitive training approach that teaches agents to assess, before acting, whether a query can be answered internally or genuinely requires an external tool.
Use this skill when designing agents that must arbitrate between internal reasoning and external tool invocation.
Traditional Agent Behavior:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Query │────→│ Agent │────→│ Tool │
│ (Image + │ │ (Reflexive │ │ (Always │
│ Question) │ │ Invoke) │ │ Called) │
└─────────────┘ └─────────────┘ └─────────────┘
Problem: Tool invoked even when answer is obvious from image!
Meta-Cognitive Agent:
┌─────────────┐ ┌─────────────────┐ ┌─────────────┐
│ Query │────→│ Meta-Cognitive │────→│ Tool │
│ (Image + │ │ Arbitration │ │ (Conditional│
│ Question) │ │ (Assess First) │ │ Call) │
└─────────────┘ └─────────────────┘ └─────────────┘
│ │
│ ↓ (if needed)
│ ┌─────────────┐
└──────────→│ Internal │
│ Reasoning │
│ (Direct │
│ Answer) │
└─────────────┘
Benefit: Tool only called when necessary!
class ToolUseDecision:
    """Represents the decision to use tools or internal reasoning.

    Attributes:
        decision: One of USE_TOOL, USE_INTERNAL, or UNCERTAIN.
        confidence: Confidence in the decision; expected in [0.0, 1.0].
        reasoning: Human-readable explanation of the decision.
    """

    # Decision outcomes kept as plain string constants (not an Enum) so
    # existing callers comparing against raw strings keep working.
    USE_TOOL = "use_tool"
    USE_INTERNAL = "use_internal"
    UNCERTAIN = "uncertain"

    def __init__(self, decision: str, confidence: float, reasoning: str):
        self.decision = decision
        self.confidence = confidence
        self.reasoning = reasoning

    def __repr__(self) -> str:
        # Added for debuggability; purely additive, no behavior change.
        return (
            f"{type(self).__name__}(decision={self.decision!r}, "
            f"confidence={self.confidence!r}, reasoning={self.reasoning!r})"
        )
class MetaCognitiveArbitrator:
    """Arbitrates between tool use and internal reasoning.

    Wraps a generative model and asks it to self-assess whether a query
    needs an external tool; the structured assessment is mapped onto a
    ToolUseDecision using a confidence threshold.

    Fixes in this revision:
    - ``_format_tools`` was referenced by the prompt builder but never
      defined (AttributeError at runtime); it is now implemented.
    - Annotations referencing ``Optional``/``Image``/``ToolUseDecision``
      are quoted as forward references, since this file imports neither
      ``typing`` nor an ``Image`` type.
    """

    def __init__(self, model, tools_available: list):
        self.model = model
        self.tools = tools_available
        # Assessments below this confidence are reported as UNCERTAIN.
        self.confidence_threshold = 0.7

    def assess_need_for_tools(
        self,
        query: str,
        visual_context: "Optional[Image]",
        available_tools: list,
    ) -> "ToolUseDecision":
        """
        Assess whether tools are needed for this query.

        Args:
            query: User's question or request
            visual_context: Optional image context
            available_tools: List of available tools

        Returns:
            ToolUseDecision with decision and confidence
        """
        # Construct assessment prompt and ask the model for a
        # structured self-assessment.
        assessment_prompt = self._build_assessment_prompt(
            query, visual_context, available_tools
        )
        assessment = self.model.generate(
            assessment_prompt,
            output_schema={
                "needs_tool": "bool",
                "confidence": "float",
                "reasoning": "str",
            },
        )

        # Below the threshold we trust the self-assessment in neither
        # direction and report UNCERTAIN; otherwise follow needs_tool.
        if assessment["confidence"] < self.confidence_threshold:
            decision = ToolUseDecision.UNCERTAIN
        elif assessment["needs_tool"]:
            decision = ToolUseDecision.USE_TOOL
        else:
            decision = ToolUseDecision.USE_INTERNAL
        return ToolUseDecision(
            decision, assessment["confidence"], assessment["reasoning"]
        )

    def _build_assessment_prompt(
        self,
        query: str,
        visual_context: "Optional[Image]",
        tools: list,
    ) -> str:
        """Build prompt for meta-cognitive assessment."""
        return f"""
You are a meta-cognitive arbitrator for an AI agent. Your task is to decide
whether this query requires external tools or can be answered from the
available context.
Query: {query}
Available Tools:
{self._format_tools(tools)}
{'Visual Context: [Image provided]' if visual_context else 'No visual context'}
Instructions:
1. Analyze whether the query can be answered from the visual context alone
2. Determine if any available tool is necessary to answer accurately
3. Consider: Would a human need external information to answer this?
Respond with:
- needs_tool: true if external tool is necessary, false if answerable from context
- confidence: 0.0-1.0 confidence in your assessment
- reasoning: Brief explanation of your decision
"""

    def _format_tools(self, tools: list) -> str:
        """Render the available tools, one per line, for the prompt.

        Tool entries may be plain strings or objects exposing ``name`` /
        ``description`` attributes — TODO confirm the tool schema used by
        callers.
        """
        if not tools:
            return "(no tools available)"
        lines = []
        for tool in tools:
            name = getattr(tool, "name", None)
            if name is None:
                lines.append(f"- {tool}")
            elif getattr(tool, "description", None):
                lines.append(f"- {name}: {tool.description}")
            else:
                lines.append(f"- {name}")
        return "\n".join(lines)
class DeliberateToolUseTrainer:
    """Trains agents for deliberate tool-use policies.

    Fix in this revision: ``_format_tool_descriptions`` was referenced by
    ``generate_training_examples`` but never defined (AttributeError at
    runtime); it is now implemented.
    """

    def __init__(self, base_model, tool_set: list):
        self.model = base_model
        self.tools = tool_set
        self.training_data = []

    def generate_training_examples(self, scenarios: list) -> list:
        """
        Generate training examples with deliberate tool-use labels.

        Args:
            scenarios: List of (query, context, optimal_decision) tuples,
                where optimal_decision is a dict with keys "needs_tool",
                "confidence", "reasoning", and "action".

        Returns:
            Training examples with meta-cognitive labels
        """
        examples = []
        for query, context, optimal in scenarios:
            examples.append({
                "query": query,
                "context": context,
                "available_tools": self._format_tool_descriptions(),
                "meta_cognitive_assessment": {
                    "needs_tool": optimal["needs_tool"],
                    "confidence": optimal["confidence"],
                    "reasoning": optimal["reasoning"],
                },
                "action": optimal["action"],
            })
        return examples

    def _format_tool_descriptions(self) -> list:
        """Describe each available tool for inclusion in training examples.

        Tools may be plain strings or objects with a ``description``
        attribute — TODO confirm the tool schema against callers.
        """
        return [getattr(tool, "description", str(tool)) for tool in self.tools]

    def train_with_curriculum(self, examples: list, epochs: int = 3):
        """
        Train model with curriculum learning for deliberate tool use.

        Phase 1: Clear-cut examples (obvious tool/no-tool cases)
        Phase 2: Ambiguous examples (borderline cases)
        Phase 3: Full distribution
        """
        # Sort by distance of confidence from 0.5: the farther from 0.5,
        # the more clear-cut the example, so those come first.
        sorted_examples = sorted(
            examples,
            key=lambda x: abs(x["meta_cognitive_assessment"]["confidence"] - 0.5),
            reverse=True,
        )
        # Curriculum phases; the final slice absorbs any remainder from
        # the integer division.
        phase_size = len(sorted_examples) // 3
        phases = [
            sorted_examples[:phase_size],                # Clear-cut
            sorted_examples[phase_size:2 * phase_size],  # Medium
            sorted_examples[2 * phase_size:],            # Ambiguous
        ]
        for phase_idx, phase_examples in enumerate(phases):
            print(f"Training Phase {phase_idx + 1}: {len(phase_examples)} examples")
            self._train_phase(phase_examples, epochs)

    def _train_phase(self, examples: list, epochs: int):
        """Train on a specific curriculum phase."""
        for epoch in range(epochs):
            for example in examples:
                # Train meta-cognitive assessment, then action execution.
                self._train_assessment(example)
                self._train_action(example)

    def _train_assessment(self, example: dict):
        """Train meta-cognitive assessment capability."""
        # Implementation: Fine-tune on assessment prediction
        pass

    def _train_action(self, example: dict):
        """Train action execution capability."""
        # Implementation: Fine-tune on action execution
        pass
class ToolUseEvaluator:
    """Evaluates tool-use efficiency and correctness."""

    def __init__(self):
        # Running counters, updated by evaluate_decision().
        self.metrics = {
            "unnecessary_tool_calls": 0,
            "missed_tool_calls": 0,
            "correct_arbitration": 0,
            "total_queries": 0,
            "latency_savings_ms": 0,
        }

    def evaluate_decision(
        self,
        query: str,
        context: dict,
        agent_decision: ToolUseDecision,
        ground_truth: str  # "tool_needed" or "internal_sufficient"
    ) -> dict:
        """
        Evaluate a single tool-use decision.

        Args:
            query: The user query
            context: Available context (visual, textual)
            agent_decision: Agent's decision
            ground_truth: Correct decision

        Returns:
            Evaluation metrics for this decision
        """
        self.metrics["total_queries"] += 1
        outcome = {
            "correct": False,
            "type": None,
            "latency_impact_ms": 0,
        }

        tool_was_needed = ground_truth == "tool_needed"
        chosen = agent_decision.decision

        # Four outcomes, handled as flat guard-style branches: the agent
        # either matched the ground truth (correct) or erred on one side.
        if tool_was_needed and chosen == ToolUseDecision.USE_TOOL:
            self.metrics["correct_arbitration"] += 1
            outcome["correct"] = True
            outcome["type"] = "correct_tool_use"
        elif tool_was_needed:
            # Needed a tool but did not call one: the necessary external
            # information was never retrieved.
            self.metrics["missed_tool_calls"] += 1
            outcome["type"] = "missed_tool_call"
            outcome["error"] = "failed_to_retrieve_necessary_info"
        elif chosen == ToolUseDecision.USE_INTERNAL:
            self.metrics["correct_arbitration"] += 1
            outcome["correct"] = True
            outcome["type"] = "correct_internal_reasoning"
            # Skipping the tool call saves its estimated latency
            # (recorded as a negative impact).
            saved = -self._estimate_tool_latency()
            outcome["latency_impact_ms"] = saved
            self.metrics["latency_savings_ms"] += saved
        else:
            self.metrics["unnecessary_tool_calls"] += 1
            outcome["type"] = "unnecessary_tool_call"
            outcome["latency_impact_ms"] = self._estimate_tool_latency()

        return outcome

    def get_summary(self) -> dict:
        """Return aggregate rates over all evaluated queries, or {} if none."""
        n = self.metrics["total_queries"]
        if not n:
            return {}
        return {
            "accuracy": self.metrics["correct_arbitration"] / n,
            "unnecessary_tool_rate": self.metrics["unnecessary_tool_calls"] / n,
            "missed_tool_rate": self.metrics["missed_tool_calls"] / n,
            "avg_latency_savings_ms": self.metrics["latency_savings_ms"] / n,
            "total_queries": n,
        }

    def _estimate_tool_latency(self) -> int:
        """Estimate average tool call latency in milliseconds."""
        return 500  # Typical API call latency
class ConfidenceBasedArbitrator:
    """Uses confidence scores for tool-use decisions."""

    def __init__(self, thresholds: dict = None):
        # Fall back to the built-in thresholds when none are supplied.
        self.thresholds = thresholds or {
            "high_confidence": 0.8,
            "medium_confidence": 0.5,
            "low_confidence": 0.3,
        }

    def decide(
        self,
        internal_confidence: float,
        tool_confidence: float,
        query_complexity: float
    ) -> ToolUseDecision:
        """
        Decide based on confidence scores.

        Args:
            internal_confidence: Confidence in internal reasoning (0-1)
            tool_confidence: Confidence that tool would help (0-1)
            query_complexity: Estimated complexity (0-1)

        Returns:
            ToolUseDecision
        """
        high = self.thresholds["high_confidence"]
        medium = self.thresholds["medium_confidence"]

        # Guard clauses ordered from most to least decisive signal.
        if internal_confidence >= high:
            # Strong internal confidence: answer directly.
            return ToolUseDecision(
                ToolUseDecision.USE_INTERNAL,
                internal_confidence,
                "High confidence in internal reasoning",
            )

        if internal_confidence < medium and tool_confidence >= high:
            # Weak internal signal but a tool is clearly useful.
            return ToolUseDecision(
                ToolUseDecision.USE_TOOL,
                tool_confidence,
                "Low internal confidence, high tool utility",
            )

        if query_complexity > 0.7:
            # Middling confidence on a complex query: lean on the tool,
            # scaling confidence by the complexity estimate.
            return ToolUseDecision(
                ToolUseDecision.USE_TOOL,
                tool_confidence * query_complexity,
                "Complex query benefits from tool",
            )

        # Simple query, middling confidence: default to internal reasoning.
        return ToolUseDecision(
            ToolUseDecision.USE_INTERNAL,
            internal_confidence,
            "Default to internal for simple queries",
        )
Collect examples with ground truth labels:
# Example training instance: a common-knowledge question that is
# answerable without tools, labeled with a "no tool needed" ground truth.
training_example = dict(
    query="What is the capital of France?",
    visual_context=None,
    ground_truth=dict(
        needs_tool=False,
        reasoning="Common knowledge, no tool needed",
    ),
)
Fine-tune the base model to predict the meta-cognitive assessment (needs_tool, confidence, reasoning) before choosing an action.
Use RL to optimize for correct arbitration, fewer unnecessary tool calls, and reduced latency.
Test on held-out scenarios measuring the metrics in the table below.
| Metric | Description | Target |
|---|---|---|
| Arbitration Accuracy | % of correct tool/internal decisions | >90% |
| Unnecessary Tool Rate | % of tool calls that were unnecessary | <10% |
| Missed Tool Rate | % of queries needing tools but not called | <5% |
| Latency Savings | Average time saved per query | >200ms |
| Confidence Calibration | Alignment of confidence with accuracy | >0.8 |
| Aspect | Reflexive Tool Use | Deliberate Tool Use |
|---|---|---|
| Latency | High (always calls) | Low (selective) |
| Cost | High | Optimized |
| Errors | More (unnecessary calls) | Fewer |
| User Experience | Slower | Faster |
| Intelligence | Lower | Higher (meta-cognitive) |
- execute_code: Implementation and testing
- web_search: Related research
- read_file: Load training data
- write_file: Save models and results

User: 请帮我应用此技能
Agent: 我将按照标准流程执行...
User: 有更复杂的场景需要处理
Agent: 针对复杂场景,我将采用以下策略...