A comprehensive skill to audit agentic AI systems for architecture quality, output correctness, and production readiness.
Use this skill when the user asks you to "analyze the agent", "audit the AI system", "optimize the RAG pipeline", "critique the architecture", "improve the agent implementation", or "check production readiness".
This skill transforms you into a Senior AI Architect & SRE. Your goal is to perform comprehensive audits of agentic AI projects and elevate them to production-grade quality.
### RAG & Data

| Check | What to Look For | Red Flags |
|---|---|---|
| Chunking strategy | Recursive or semantic splitting, 10-20% overlap | Naive character splits, zero overlap |
| Retrieval methods | Hybrid search, reranking, query transformation | Pure vector search, no fallbacks |
| Vector DB config | Appropriate choice, index settings | Wrong DB for scale, missing persistence |
| Document preprocessing | Cleaning, metadata extraction, deduplication | Raw text, no metadata |
| Embedding model | Dimension considerations, domain fit | Generic embeddings for specialized domain |
| Data freshness | Update strategies, versioning | Stale data, no refresh mechanism |
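To illustrate the overlap check above, here is a minimal sliding-window chunker (the 500/75 defaults are illustrative, giving ~15% overlap; production pipelines typically use a recursive or semantic splitter that respects sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 75) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share `overlap` chars."""
    chunks = []
    step = chunk_size - overlap  # advance by less than chunk_size to create overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks
```

The overlap ensures a sentence straddling a chunk boundary is fully present in at least one chunk, which is what the "10-20% overlap" check is guarding.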
### Architecture

| Check | What to Look For | Red Flags |
|---|---|---|
| Agent specialization | Clear roles, minimal overlap | Monolithic "do-everything" agents |
| Tool design | Error handling, retries, timeouts, fallbacks | No try/catch, missing timeouts |
| Memory systems | Short-term, long-term, semantic memory | No memory, unbounded context |
| Inter-agent communication | Handoff patterns, message formats | Unstructured passing, lost context |
| State management | Context preservation, session handling | State leakage, no isolation |
| Execution model | Parallel vs sequential optimization | Everything sequential when parallelizable |
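A structured handoff schema is one way to keep context from being silently dropped between agents; a minimal sketch (the field names here are illustrative, not a fixed protocol):

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Structured inter-agent message so task and context survive the handoff."""
    source_agent: str
    target_agent: str
    task: str
    context: dict = field(default_factory=dict)  # retrieved docs, prior decisions, etc.
```

Compared with passing raw strings between agents, a typed message makes lost-context bugs visible at the boundary instead of deep inside the receiving agent.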
### Output Quality

| Check | What to Look For | Red Flags |
|---|---|---|
| Hallucination detection | Grounding, citation verification | No fact-checking layer |
| Correctness testing | Eval datasets, golden answers | No test cases |
| Semantic coherence | Output matches intent | Relying on the LLM to grade itself |
| Evaluation framework | LangSmith, Opik, custom evals | No observability |
| Regression testing | Agent behavior consistency | No baseline comparisons |
| A/B testing | Prompt iteration capabilities | No experiment tracking |
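A golden-answer eval can start as small as an exact-match script; a baseline sketch (frameworks like LangSmith or Opik add tracing and fuzzier scoring on top of this idea):

```python
def exact_match_score(predictions: dict[str, str], golden: dict[str, str]) -> float:
    """Exact-match accuracy of predictions against a golden-answer set."""
    hits = sum(
        predictions.get(q, "").strip().lower() == a.strip().lower()
        for q, a in golden.items()
    )
    return hits / len(golden)
```

Running this on every prompt change gives the baseline comparison the regression-testing row calls for.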
### Error Handling

| Check | What to Look For | Red Flags |
|---|---|---|
| Retry logic | Exponential backoff patterns | Immediate retries, no backoff |
| Fallback strategies | Model cascading, default responses | Crash on failure |
| Input validation | Sanitization, schema validation | Raw user input to LLM |
| Rate limit handling | Middleware (e.g. slowapi), Redis-backed throttling | No rate limiting, in-memory counters in prod |
| Circuit breakers | External API protection | Cascading failures |
| Graceful degradation | Partial functionality paths | All-or-nothing responses |
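The backoff check above can be sketched as a generic wrapper (the retry counts and delays are illustrative; full jitter is added so concurrent callers don't retry in lockstep):

```python
import random
import time

def retry_with_backoff(fn, retries=4, base_delay=0.5, cap=8.0):
    """Call fn(), retrying on any exception with exponential backoff and full jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(cap, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

In practice you would also restrict the `except` clause to transient error types (timeouts, 429s) so permanent failures fail fast.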
### Observability

| Check | What to Look For | Red Flags |
|---|---|---|
| Tracing | OpenTelemetry, LangSmith, Opik integration | No trace IDs |
| Logging | Structured logs, log levels, PII handling | Print statements, exposed PII |
| Metrics | Latency, token usage, error rates, cost | No metrics collection |
| Alerting | Thresholds, incident response | No alerts configured |
| Debug modes | Troubleshooting capabilities | Production-only mode |
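Structured logging with trace IDs can be as simple as emitting JSON lines; a minimal sketch (field names are assumptions — real deployments would route this through OpenTelemetry or a logging framework):

```python
import json
import uuid

def structured_line(message: str, **fields) -> str:
    """Build one structured JSON log line; generates a trace_id if none is passed."""
    record = {"trace_id": fields.pop("trace_id", str(uuid.uuid4())),
              "message": message, **fields}
    return json.dumps(record)
```

Because every line carries a `trace_id`, one agent run can be reassembled across tools and services — the capability the "No trace IDs" red flag is about.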
### Performance

| Check | What to Look For | Red Flags |
|---|---|---|
| Token efficiency | Prompt compression, caching strategies | Unbounded prompts |
| Cost tracking | Per agent/tool/query cost | No cost visibility |
| Latency optimization | Streaming, parallel calls, caching | Sequential everything |
| Model selection | GPT-4 vs GPT-3.5 vs local criteria | Always use expensive model |
| Batch processing | Bulk operation opportunities | One-by-one processing |
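Per-call cost tracking needs only token counts and a price table; a sketch with deliberately hypothetical model names and prices (look up your provider's current rates):

```python
# Hypothetical (input, output) USD prices per 1K tokens -- NOT real pricing.
PRICE_PER_1K = {
    "big-model": (0.005, 0.015),
    "small-model": (0.00015, 0.0006),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one call's USD cost from token counts and a price table."""
    price_in, price_out = PRICE_PER_1K[model]
    return (prompt_tokens / 1000) * price_in + (completion_tokens / 1000) * price_out
```

Tagging each estimate with the calling agent and tool gives the per-agent/per-query cost visibility the table asks for.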
### Safety

| Check | What to Look For | Red Flags |
|---|---|---|
| Input sanitization | Prompt injection protection | Raw inputs to system prompts |
| Output filtering | PII detection, content moderation | Unfiltered LLM output |
| Rate limiting | Per user/API key limits | Unlimited requests |
| Safety layers | Pre-check, post-check | Single safety point |
| Compliance | GDPR, data retention | No data handling policy |
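A cheap pattern-based pre-check is a common first safety layer (it complements, never replaces, model-side moderation); the patterns below are illustrative only:

```python
import re

# Illustrative injection phrasings -- a real deny-list would be broader and maintained.
INJECTION_PATTERNS = [
    r"ignore (all |the )?previous instructions",
    r"reveal .*system prompt",
    r"you are now in developer mode",
]

def looks_like_injection(user_input: str) -> bool:
    """Heuristic pre-check for common prompt-injection phrasings (first layer only)."""
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

Pairing a pre-check like this with a post-check on the output gives the two safety points the "Single safety point" red flag warns against.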
### Production Readiness

| Check | What to Look For | Red Flags |
|---|---|---|
| Deployment | Dockerfile (multi-stage, non-root), CI/CD Workflow | No Dockerfile, Manual deployment |
| CI/CD | Prompt/config update pipeline | No automation |
| Environment management | Dev, staging, prod separation | Single environment |
| Secrets management | API keys, credentials handling | Hardcoded secrets |
| Documentation | API docs, runbooks, architecture | Missing/outdated docs |
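The hardcoded-secrets check can be sketched as a pattern scan over source text (heuristic only; dedicated tools such as trufflehog or gitleaks are far more thorough):

```python
import re

# Two illustrative patterns: provider-style keys and inline api_key assignments.
SECRET_PATTERNS = [
    r"sk-[A-Za-z0-9]{20,}",
    r"api_key\s*=\s*['\"][^'\"]+['\"]",
]

def find_hardcoded_secrets(source: str) -> list[str]:
    """Flag likely hardcoded credentials in source text (heuristic, not exhaustive)."""
    return [m.group(0)
            for pattern in SECRET_PATTERNS
            for m in re.finditer(pattern, source)]
```

Any hit here is a critical finding: the fix is moving the value into a secrets manager or environment variable, never committing it.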
First, review README.md, AGENTS.md, architecture docs, and config files.

Key locations to check:

- `backend/app/agents/` - Agent definitions, tools, crews
- `backend/app/core/rag/` - RAG pipeline components
- `backend/app/services/` - Business logic
- `backend/app/core/memory/` - Memory systems

For EACH assessment area, work through its checklist, note red flags, and assign a score out of 10.
Then create `agent_audit_report.md` with:
# Agent Audit Report
## Executive Summary
(1-2 paragraphs)
## Health Score: X.X/10
| Category | Score | Status |
|----------|-------|--------|
| RAG & Data | X/10 | 🟢/🟡/🔴 |
| Architecture | X/10 | 🟢/🟡/🔴 |
| Output Quality | X/10 | 🟢/🟡/🔴 |
| Error Handling | X/10 | 🟢/🟡/🔴 |
| Observability | X/10 | 🟢/🟡/🔴 |
| Performance | X/10 | 🟢/🟡/🔴 |
| Safety | X/10 | 🟢/🟡/🔴 |
| Production Readiness | X/10 | 🟢/🟡/🔴 |
## 🔴 Critical Issues (Must-Fix Before Production)
...
## 🟡 High-Priority Improvements (Significant Impact)
...
## 🟠 Medium-Priority Enhancements (Nice-to-Haves)
...
## ✅ Strengths (What's Working Well)
...
## Prioritized Roadmap
| Phase | Focus | Effort | Impact |
|-------|-------|--------|--------|
| 1 | ... | Xs | High |
CRUCIAL: Do NOT auto-apply fixes.
Present `agent_audit_report.md` to the user; only after approval, write the agreed fixes into `implementation_plan.md`.

If no eval framework exists, recommend adding one (e.g. LangSmith, Opik, or custom evals).

Scan for hardcoded secrets (e.g. `sk-`, `api_key=`).

Always include: