Splunk Analysis

Name: Splunk Analysis
Author: incidentfox

Splunk log analysis using SPL (Search Processing Language). Use when investigating issues via Splunk logs, saved searches, or alerts.

incidentfox, 559 stars, 2026-02-19

Category: Data Analysis

Skill Content

Authentication

IMPORTANT: Credentials are injected automatically by a proxy layer. Do NOT check for SPLUNK_HOST, SPLUNK_TOKEN, or other credentials in environment variables - they won't be visible to you. Just run the scripts directly; authentication is handled transparently.

MANDATORY: Statistics-First Investigation

NEVER dump raw logs. Always follow this pattern:

STATISTICS → SAMPLE → PATTERNS → CORRELATE

1. Statistics First - Know the volume, error rate, and top patterns before sampling
2. Strategic Sampling - Choose the sampling strategy based on those statistics
3. Pattern Extraction - Cluster similar errors to find root causes
4. Context Correlation - Investigate the events around anomaly timestamps
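
The pattern-extraction step can be illustrated with a small sketch. It is not part of the skill's scripts: the function names and the normalisation rules below are assumptions, chosen only to show how variable parts of a message (numbers, IPs, IDs) can be masked so that similar errors fall into the same cluster.

```python
import re
from collections import Counter

# Hypothetical normalisation rules; adjust them to the log formats in use.
# Order matters: specific patterns run before the generic digit rule.
_RULES = [
    (re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"), "<uuid>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message: str) -> str:
    """Replace variable tokens with placeholders so similar messages match."""
    for pattern, placeholder in _RULES:
        message = pattern.sub(placeholder, message)
    return message


def cluster_messages(messages, top=10):
    """Return the most common normalised messages with their counts."""
    counts = Counter(normalize(m) for m in messages)
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        "Timeout after 30s contacting 10.0.0.5",
        "Timeout after 45s contacting 10.0.0.9",
        "User 42 not found",
    ]
    for pattern, count in cluster_messages(sample):
        print(count, pattern)
```

Run against a sample of error messages, the largest clusters are usually the best candidates for root-cause investigation.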

Available Scripts

PRIMARY INVESTIGATION SCRIPTS

get_statistics.py - ALWAYS START HERE

python .claude/skills/observability-splunk/scripts/get_statistics.py [--index INDEX] [--sourcetype SOURCETYPE] [--time-range MINUTES]

# Examples:
python .claude/skills/observability-splunk/scripts/get_statistics.py --time-range 60
python .claude/skills/observability-splunk/scripts/get_statistics.py --index main
python .claude/skills/observability-splunk/scripts/get_statistics.py --sourcetype access_combined

sample_logs.py - Strategic Sampling

python .claude/skills/observability-splunk/scripts/sample_logs.py --strategy STRATEGY [--index INDEX] [--sourcetype SOURCETYPE] [--limit N]

# Strategies:
#   errors_only   - Only error logs (default for incidents)
#   warnings_up   - Warning and error logs
#   around_time   - Logs around a specific timestamp
#   all           - All log levels

# Examples:
python .claude/skills/observability-splunk/scripts/sample_logs.py --strategy errors_only --index main
python .claude/skills/observability-splunk/scripts/sample_logs.py --strategy around_time --timestamp "2026-01-27T05:00:00" --window 5
python .claude/skills/observability-splunk/scripts/sample_logs.py --strategy all --sourcetype access_combined --limit 20
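
When the sampling script is driven from Python rather than typed by hand, the command line can be assembled programmatically. The sketch below is an assumption-laden helper, not part of the skill: `build_sample_command` is a hypothetical name, it uses only the flags shown in the examples above, and the check that `around_time` needs a timestamp is a guess about the script's behaviour.

```python
# Script path and strategy names are taken from the examples above.
SCRIPT = ".claude/skills/observability-splunk/scripts/sample_logs.py"
STRATEGIES = ("errors_only", "warnings_up", "around_time", "all")


def build_sample_command(strategy, index=None, sourcetype=None,
                         limit=None, timestamp=None, window=None):
    """Build an argv list for sample_logs.py (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Assumption: around_time is only meaningful with a timestamp.
    if strategy == "around_time" and timestamp is None:
        raise ValueError("around_time requires a timestamp")

    cmd = ["python", SCRIPT, "--strategy", strategy]
    optional = [
        ("--index", index),
        ("--sourcetype", sourcetype),
        ("--limit", limit),
        ("--timestamp", timestamp),
        ("--window", window),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_sample_command("errors_only", index="main")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; passing a list instead of a shell string avoids quoting problems with timestamps.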

SPL (Search Processing Language)

Basic Search

# Simple keyword search
error

# Index specific search (ALWAYS specify index for performance)
index=main error

# Multiple keywords (implicit AND)
index=main error connection

# Exact phrase
index=main "connection refused"

Field Searches

# Exact field match
index=main host=web-01

# Wildcard
index=main host=web-*

# Numeric comparison
index=main status>=400

# NOT operator
index=main NOT status=200

# OR operator
index=main (status=500 OR status=503)

Time Range

# Relative time (in tool call)
earliest=-15m latest=now

# Absolute time
earliest="01/15/2024:10:00:00" latest="01/15/2024:11:00:00"

# Snap-to time modifiers (@ snaps down to the start of the unit)
earliest=-1h@h  # 1 hour ago, snapped down to the start of that hour
earliest=-1d@d  # 1 day ago, snapped down to midnight
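
To make the snap-to behaviour concrete, here is a minimal Python sketch that resolves a small subset of relative time modifiers to a timestamp. It is an illustration only: `resolve_modifier` is a hypothetical name, only the `m`, `h`, and `d` units are handled, and Splunk itself supports many more forms than this.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_MODIFIER = re.compile(r"^-(\d+)([mhd])(?:@([mhd]))?$")


def resolve_modifier(modifier: str, now: datetime) -> datetime:
    """Resolve modifiers such as '-15m', '-1h@h' or 'now' relative to `now`."""
    if modifier == "now":
        return now
    match = _MODIFIER.match(modifier)
    if match is None:
        raise ValueError(f"unsupported modifier: {modifier!r}")
    amount, unit, snap = match.groups()

    # Step 1: apply the offset.
    moment = now - timedelta(**{_UNITS[unit]: int(amount)})

    # Step 2: snap down to the start of the requested unit, if any.
    if snap == "m":
        moment = moment.replace(second=0, microsecond=0)
    elif snap == "h":
        moment = moment.replace(minute=0, second=0, microsecond=0)
    elif snap == "d":
        moment = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return moment


if __name__ == "__main__":
    now = datetime(2026, 1, 27, 5, 42, 10)
    print(resolve_modifier("-1h@h", now))  # offset first, then snap
```

The order matters: the offset is applied first and the snap second, so at 05:42 the modifier `-1h@h` lands on 04:00, not 05:00.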

Investigation Workflow

Standard Incident Investigation

┌─────────────────────────────────────────────────────────────┐
│ 1. STATISTICS FIRST (mandatory)                              │
│    python get_statistics.py --index <index>                  │
│    → Know volume, error rate, top patterns                   │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
                     High Error Rate?
               ┌─────────────┴─────────────┐
               │                           │
       YES (>5%)                           NO
               │                           │
               ▼                           ▼
┌─────────────────────────────┐  ┌───────────────────────────────────────────┐
│ 2. FAST PATH                │  │ 2. TARGETED INVESTIGATION                 │
│    Sample errors directly   │  │    Filter by specific criteria            │
│    python sample_logs.py    │  │    python sample_logs.py --strategy all   │
│    --strategy errors_only   │  │    → Look for anomalies                   │
└─────────────────────────────┘  └───────────────────────────────────────────┘
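
The branch in the diagram above can be written as a small decision function. This is a sketch under stated assumptions: `choose_strategy` is a hypothetical helper, and the event counts are assumed to come from whatever `get_statistics.py` reports, whose actual output format is not shown here.

```python
def choose_strategy(total_events: int, error_events: int,
                    threshold_pct: float = 5.0):
    """Pick a sampling strategy from the error rate, following the diagram.

    Returns (strategy, error_rate_pct). Above the threshold, take the fast
    path and sample errors directly; otherwise sample all levels and look
    for anomalies.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    error_rate = error_events * 100 / total_events
    strategy = "errors_only" if error_rate > threshold_pct else "all"
    return strategy, error_rate


if __name__ == "__main__":
    strategy, rate = choose_strategy(total_events=1000, error_events=80)
    print(f"error rate {rate:.1f}% -> --strategy {strategy}")
```

The 5% default mirrors the threshold in the diagram; a rate of exactly 5% takes the targeted-investigation branch because the diagram specifies a strict ">5%".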

Quick Commands Reference

Goal	Command
Start investigation	`get_statistics.py --index X`
Sample errors only	`sample_logs.py --strategy errors_only --index X`
Investigate spike	`sample_logs.py --strategy around_time --timestamp T`
All logs	`sample_logs.py --strategy all --index X --limit 20`

SPL Commands Reference

Transformation Commands

Command	Purpose	Example
`stats`	Aggregate statistics	`stats count by host`
`timechart`	Time-based aggregation	`timechart span=5m count`
`chart`	Pivot table	`chart count by status, host`
`top`	Top values	`top 10 host`
`rare`	Rare values	`rare message`
`table`	Select fields	`table _time, host, message`

Field Operations

Command	Purpose	Example
`eval`	Calculate fields	`eval duration_sec=duration/1000`
`rex`	Regex extraction	`rex field=message "error: (?<error_type>\w+)"`
`rename`	Rename fields	`rename src_ip as source_ip`
`fields`	Include/exclude fields	`fields host, message`

Common Query Patterns

Error Rate Analysis

# Error count per 5 minutes
index=main | timechart span=5m count(eval(level="ERROR")) as errors, count as total

# Error percentage over time
index=main
| timechart span=5m count(eval(level="ERROR")) as errors, count as total
| eval error_rate=errors/total*100
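
For readers less familiar with `timechart`, the same error-rate calculation can be sketched in plain Python. The input shape, a list of `(timestamp, level)` pairs, and the function name are assumptions made for illustration; the span is assumed to divide 60 evenly.

```python
from collections import defaultdict
from datetime import datetime


def error_rate_by_bucket(events, span_minutes=5):
    """Group (timestamp, level) pairs into fixed buckets and return the
    error percentage per bucket, keyed by bucket start time."""
    buckets = defaultdict(lambda: [0, 0])  # bucket start -> [errors, total]
    for ts, level in events:
        start = ts.replace(minute=ts.minute - ts.minute % span_minutes,
                           second=0, microsecond=0)
        buckets[start][1] += 1
        if level == "ERROR":
            buckets[start][0] += 1
    return {start: errors * 100 / total
            for start, (errors, total) in sorted(buckets.items())}


if __name__ == "__main__":
    events = [
        (datetime(2026, 1, 27, 10, 1), "ERROR"),
        (datetime(2026, 1, 27, 10, 3), "INFO"),
        (datetime(2026, 1, 27, 10, 7), "INFO"),
        (datetime(2026, 1, 27, 10, 9), "INFO"),
    ]
    for start, rate in error_rate_by_bucket(events).items():
        print(start, f"{rate:.1f}%")
```

Unlike `timechart`, this sketch emits no rows for buckets that contain no events, so gaps in the data appear as missing keys rather than zero counts.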

Top Errors by Service

index=main level=ERROR
| stats count by service, message
| sort -count
| head 20

Response Time Analysis

index=main sourcetype=access_combined
| stats avg(response_time) as avg_rt,
        p95(response_time) as p95_rt,
        max(response_time) as max_rt
    by uri_path
| sort -avg_rt

Anomaly Detection

# Sudden spike detection
index=main
| timechart span=5m count as events
| eventstats avg(events) as avg_events, stdev(events) as stdev_events
| eval anomaly=if(events > avg_events + 2*stdev_events, 1, 0)
| where anomaly=1
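
The threshold rule in the query above (mean plus two standard deviations) can be reproduced offline as a quick check. This is a sketch, not part of the skill: `find_spikes` is a hypothetical name, and it uses the sample standard deviation, which is what SPL's `stdev` computes.

```python
from statistics import mean, stdev


def find_spikes(counts, sigmas=2.0):
    """Return indexes of buckets whose count exceeds mean + sigmas * stdev."""
    if len(counts) < 2:
        return []  # a sample standard deviation needs at least two points
    threshold = mean(counts) + sigmas * stdev(counts)
    return [i for i, count in enumerate(counts) if count > threshold]


if __name__ == "__main__":
    per_bucket = [10, 12, 11, 9, 10, 11, 10, 60]
    print(find_spikes(per_bucket))
```

Because the spike itself inflates both the mean and the standard deviation, this rule is conservative over short windows; a longer baseline makes it more sensitive.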

Filtering Commands

Command	Purpose	Example
`search`	Filter events	`search error`
`where`	Filter with expressions	`where status > 400`
`dedup`	Remove duplicates	`dedup host`
`head`	First N results	`head 10`
`tail`	Last N results	`tail 10`

PRIMARY INVESTIGATION SCRIPTS

get_statistics.py - ALWAYS START HERE

sample_logs.py - Strategic Sampling

SPL (Search Processing Language)

Basic Search

Field Searches

Time Range

Investigation Workflow

Standard Incident Investigation

Quick Commands Reference

SPL Commands Reference

Filtering Commands

Transformation Commands

Field Operations

Common Query Patterns

Error Rate Analysis

Top Errors by Service

Response Time Analysis

Anomaly Detection

Anti-Patterns to Avoid
