Name: Coralogix Analysis
Author: incidentfox

STATISTICS → SAMPLE → SIGNATURES → CORRELATE

python .claude/skills/observability-coralogix/scripts/get_statistics.py [--service SERVICE] [--app APP] [--time-range MINUTES]

# Examples:
python .claude/skills/observability-coralogix/scripts/get_statistics.py --time-range 60
python .claude/skills/observability-coralogix/scripts/get_statistics.py --service payment --app otel-demo

python .claude/skills/observability-coralogix/scripts/sample_logs.py --strategy STRATEGY [--service SERVICE] [--app APP]

# Strategies:
#   errors_only    - Only ERROR/CRITICAL logs (default for incidents)
#   around_anomaly - Logs within a time window around a specific timestamp
#   first_last     - First N/2 + last N/2 logs (timeline view)
#   random         - Random sample across the time range
#   all            - All severity levels (use sparingly)

# Examples:
python .claude/skills/observability-coralogix/scripts/sample_logs.py --strategy errors_only --service payment
python .claude/skills/observability-coralogix/scripts/sample_logs.py --strategy around_anomaly --timestamp "2026-01-27T05:00:00Z" --window 60
python .claude/skills/observability-coralogix/scripts/sample_logs.py --strategy first_last --service checkout --limit 50
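
The `first_last` strategy is described above as the first N/2 plus the last N/2 logs. The selection can be sketched as follows, assuming the logs are already sorted by time. `first_last_sample` is a hypothetical helper written for illustration, not the code inside `sample_logs.py`:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def first_last_sample(logs: Sequence[T], limit: int) -> List[T]:
    """Return the first limit//2 and the last (limit - limit//2) items.

    Hypothetical illustration of a first/last split; `logs` is assumed
    to be ordered by timestamp.
    """
    if limit <= 0:
        return []
    if len(logs) <= limit:
        # Fewer logs than requested: return everything, in order.
        return list(logs)
    head = limit // 2
    tail = limit - head  # tail >= 1 because limit >= 1
    return list(logs[:head]) + list(logs[-tail:])
```

With an odd `limit`, this sketch gives the extra slot to the tail, on the assumption that the most recent logs matter more during an incident.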

python .claude/skills/observability-coralogix/scripts/extract_signatures.py --service SERVICE [--severity SEVERITY] [--max-signatures N]

# Examples:
python .claude/skills/observability-coralogix/scripts/extract_signatures.py --service payment --severity ERROR
python .claude/skills/observability-coralogix/scripts/extract_signatures.py --app otel-demo --max-signatures 30

python .claude/skills/observability-coralogix/scripts/list_services.py [--time-range MINUTES]

python .claude/skills/observability-coralogix/scripts/get_health.py <service> [--time-range MINUTES]

python .claude/skills/observability-coralogix/scripts/get_errors.py <service> [--app APPLICATION] [--time-range MINUTES]

python .claude/skills/observability-coralogix/scripts/query_logs.py "<dataprime_query>" [--time-range MINUTES] [--limit N]

# Equality (use == not =)
$l.subsystemname == 'api-server'

# Severity - use ENUM values (no quotes!)
# Valid: VERBOSE, DEBUG, INFO, WARNING, ERROR, CRITICAL
$m.severity == ERROR
$m.severity == WARNING || $m.severity == ERROR

# Text search (case-insensitive) - use ~~ not 'contains'
$d ~~ 'timeout'
$d ~~ 'connection refused'

# Combine filters with &&
$l.subsystemname == 'payment' && $m.severity == ERROR
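
The filter rules above (`==` for equality, unquoted severity enums, `~~` for text search, `&&` to combine) are easy to get wrong when queries are assembled by hand. A minimal sketch of a builder that applies them; `build_filter` and `_quote` are hypothetical names, and rejecting values that contain single quotes is a simplifying assumption rather than a DataPrime rule:

```python
from typing import List, Optional

VALID_SEVERITIES = {"VERBOSE", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def _quote(value: str) -> str:
    """Wrap a string literal in single quotes (assumption: no embedded quotes)."""
    if "'" in value:
        raise ValueError("single quotes are not supported in this sketch")
    return f"'{value}'"


def build_filter(service: Optional[str] = None,
                 severity: Optional[str] = None,
                 text: Optional[str] = None) -> str:
    """Build a filter expression using the syntax rules listed above."""
    clauses: List[str] = []
    if service is not None:
        clauses.append(f"$l.subsystemname == {_quote(service)}")
    if severity is not None:
        sev = severity.upper()
        if sev not in VALID_SEVERITIES:
            raise ValueError(f"unknown severity: {severity}")
        # Severity is an enum, so it is emitted without quotes.
        clauses.append(f"$m.severity == {sev}")
    if text is not None:
        clauses.append(f"$d ~~ {_quote(text)}")
    if not clauses:
        raise ValueError("at least one condition is required")
    return " && ".join(clauses)
```

For example, `build_filter(service="payment", severity="error")` produces the same expression as the combined filter shown above.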

# Count
| aggregate count() as total

# Group by field
| groupby $l.subsystemname aggregate count() as cnt

# Time bucketing
| timebucket 5m aggregate count() as cnt

# Multiple aggregations
| groupby $l.subsystemname aggregate count() as cnt, avg($d.duration) as avg_duration

# Order and limit
| orderby cnt desc | limit 20

source logs | groupby $l.subsystemname aggregate count() as cnt | orderby cnt desc | limit 30

source logs | filter $m.severity == ERROR | groupby $l.subsystemname aggregate count() as errors | orderby errors desc

source logs | filter $m.severity == ERROR | groupby $m.timestamp / 5m as bucket aggregate count() as errors | orderby bucket asc

source logs | filter $l.subsystemname == 'payment' | filter $m.severity == ERROR | limit 50

source logs | filter $d ~~ 'connection refused' | limit 20

# Wrong - treats as nested path
$d.kubernetes.namespace

# Correct - literal field name with dot
$d['kubernetes.namespace']
$d['resource.attributes.k8s_pod_name']
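
When field names come from configuration or user input, the bracket rule above can be applied mechanically. The sketch below assumes the caller passes a single literal key (not a nested path); `field_ref` is a hypothetical helper:

```python
import re

# Names made only of letters, digits and underscores are safe in dot notation.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def field_ref(name: str, root: str = "$d") -> str:
    """Reference one literal field name, using brackets when it is not a simple identifier."""
    if "'" in name:
        raise ValueError("single quotes are not supported in this sketch")
    if _SIMPLE_NAME.match(name):
        return f"{root}.{name}"
    # A dot (or other special character) in a literal key requires bracket notation.
    return f"{root}['{name}']"
```

The helper cannot tell a nested path from a literal dotted key, so that decision stays with the caller: pass `kubernetes.namespace` only when it is one key.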

# Count logs in last hour vs older
source logs | countby if($m.timestamp > now() - 1h, 'last_hour', 'older')

# Find logs older than 5 minutes
source logs | filter $m.timestamp < now() - 5m

source logs
| choose resource.attributes.k8s_container_restart_count:number as restarts,
         resource.attributes.k8s_container_name as container,
         resource.attributes.k8s_deployment_name as deployment
| filter restarts > 0
| groupby deployment aggregate max(restarts) as max_restarts
| orderby max_restarts desc

source logs
| filter $m.severity == ERROR
| groupby $m.timestamp / 10m as bucket aggregate count() as cnt
| orderby cnt desc
| limit 5
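
The peak-window query above buckets error timestamps into 10-minute windows and keeps the five busiest. If the error timestamps have already been fetched, the same idea can be sketched client-side. `peak_windows` is a hypothetical helper, and it assumes timezone-aware `datetime` values:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def peak_windows(timestamps: Iterable[datetime],
                 bucket_minutes: int = 10,
                 top: int = 5) -> List[Tuple[datetime, int]]:
    """Count timestamps per fixed-size window and return the busiest windows first."""
    bucket_seconds = bucket_minutes * 60
    counts: Counter = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        # Round down to the start of the window the timestamp falls in.
        counts[epoch - epoch % bucket_seconds] += 1
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), n)
        for start, n in counts.most_common(top)
    ]
```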

# Search all fields for text
source logs | filter $d ~~ 'connection refused'

# Or use wildfind
source logs | wildfind 'timeout'

┌─────────────────────────────────────────────────────────────┐
│ 1. STATISTICS FIRST (mandatory)                              │
│    python get_statistics.py --service <service>              │
│    → Know volume, error rate, top patterns, anomalies        │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
                     Dominant Issue?
               ┌─────────────┴─────────────┐
               │                           │
      YES (>80% one pattern)               NO (mixed errors)
               │                           │
               ▼                           ▼
┌─────────────────────────────┐  ┌───────────────────────────────────────────┐
│ 2. FAST PATH                │  │ 2. DEEP DIVE                              │
│    Sample errors directly   │  │    python extract_signatures.py           │
│    python sample_logs.py    │  │    python sample_logs.py --strategy ...   │
│    → Verify hypothesis      │  │    → Cluster and analyze patterns         │
└─────────────────────────────┘  └───────────────────────────────────────────┘

# Step 1: Statistics first - ALWAYS
python .claude/skills/observability-coralogix/scripts/get_statistics.py --service payment --time-range 60
# Output: 15,432 logs, 847 errors (5.5%), top pattern: "Connection timeout to downstream"

# IF dominant pattern found:
# Step 2: Verify with samples
python .claude/skills/observability-coralogix/scripts/sample_logs.py --strategy errors_only --service payment --limit 10
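
The branch in the workflow above (fast path when one pattern exceeds 80% of errors, deep dive otherwise) can be expressed as a small decision function. This is a sketch only: `choose_path` and its input shape (a mapping of pattern to error count) are assumptions, not the output format of `get_statistics.py`:

```python
from typing import Mapping


def choose_path(pattern_counts: Mapping[str, int], threshold: float = 0.8) -> str:
    """Return 'fast_path' if the top pattern's share of errors exceeds threshold."""
    total = sum(pattern_counts.values())
    if total <= 0:
        raise ValueError("no error patterns to evaluate")
    top_share = max(pattern_counts.values()) / total
    # Strictly greater than the threshold, matching the ">80%" rule.
    return "fast_path" if top_share > threshold else "deep_dive"
```

With the hypothetical split `{"Connection timeout to downstream": 800, "other": 47}`, the top pattern covers about 94% of errors, so the function returns `"fast_path"`.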

Goal	Command
Start investigation	`get_statistics.py --service X`
See error variety	`extract_signatures.py --service X`
Sample errors only	`sample_logs.py --strategy errors_only --service X`
Investigate spike	`sample_logs.py --strategy around_anomaly --timestamp T`
Timeline view	`sample_logs.py --strategy first_last --service X`
List all services	`list_services.py`
Custom query	`query_logs.py "<dataprime_query>"`

Use Case	Tool
"What errors happened?"	Logs (`get_statistics.py`)
"Why is this request slow?"	Traces (`get_slow_spans.py`)
"Where did the request fail?"	Traces (`get_traces.py`)
"What's the service dependency?"	Traces (operation analysis)

# Get spans for a service
python .claude/skills/observability-coralogix/scripts/get_traces.py --service checkout --time-range 30

# Get all spans for a trace ID
python .claude/skills/observability-coralogix/scripts/get_traces.py --trace-id abc123def456

# Filter by operation
python .claude/skills/observability-coralogix/scripts/get_traces.py --operation "/api/checkout" --service checkout

# Find spans slower than 500ms
python .claude/skills/observability-coralogix/scripts/get_slow_spans.py --min-duration 500

# Find slow spans in specific service
python .claude/skills/observability-coralogix/scripts/get_slow_spans.py --min-duration 200 --service checkout

# Get latency statistics by service (recommended first step)
python .claude/skills/observability-coralogix/scripts/get_slow_spans.py --stats

# List spans for a service (use serviceName, not $l.subsystemname)
source spans | filter serviceName == 'checkout' | limit 50

# Find slow spans (duration in MICROSECONDS)
source spans | filter duration > 500000 | orderby duration desc | limit 20

# Get all spans for a trace (use top-level traceID)
source spans | filter traceID == 'abc123def456...' | limit 100

# Latency statistics by service
source spans | groupby serviceName aggregate avg(duration) as avg_dur, max(duration) as max_dur | orderby avg_dur desc
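
Because span `duration` is in microseconds while thresholds are usually quoted in milliseconds, the conversion is a common source of mistakes. A minimal sketch of a query builder that performs it; `slow_spans_query` is a hypothetical helper, not the code inside `get_slow_spans.py`:

```python
from typing import List, Optional


def slow_spans_query(min_duration_ms: int,
                     service: Optional[str] = None,
                     limit: int = 20) -> str:
    """Build a spans query for spans slower than min_duration_ms."""
    stages: List[str] = ["source spans"]
    if service is not None:
        if "'" in service:
            raise ValueError("single quotes are not supported in this sketch")
        # Spans use serviceName, not $l.subsystemname.
        stages.append(f"filter serviceName == '{service}'")
    # Convert milliseconds to microseconds for the duration field.
    stages.append(f"filter duration > {min_duration_ms * 1000}")
    stages.append("orderby duration desc")
    stages.append(f"limit {limit}")
    return " | ".join(stages)
```

Calling `slow_spans_query(500)` yields the same slow-span query shown above.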

┌─────────────────────────────────────────────────────────────┐
│ 1. CHECK LATENCY STATS                                       │
│    python get_slow_spans.py --stats                          │
│    → See which services have high latency                    │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. FIND SLOW SPANS                                           │
│    python get_slow_spans.py --min-duration 500 --service X   │
│    → Get specific slow spans with trace IDs                  │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. TRACE FULL REQUEST                                        │
│    python get_traces.py --trace-id <id>                      │
│    → See all spans in the slow request                       │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. CORRELATE WITH LOGS                                       │
│    python sample_logs.py --strategy around_anomaly           │
│    → Get logs around the same timestamp                      │
└─────────────────────────────────────────────────────────────┘
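
The four steps above can be laid out as argument lists suitable for `subprocess.run`. The sketch only builds the commands and uses flags that appear elsewhere in this document; `trace_workflow_commands` is a hypothetical helper, and in practice the trace ID and timestamp would come from the output of step 2:

```python
from typing import List

# Script location as used in the commands throughout this document.
SCRIPTS_DIR = ".claude/skills/observability-coralogix/scripts"


def trace_workflow_commands(service: str,
                            trace_id: str,
                            timestamp: str,
                            min_duration_ms: int = 500,
                            window: int = 60) -> List[List[str]]:
    """Return the argv list for each step of the trace investigation workflow."""

    def script(name: str) -> str:
        return f"{SCRIPTS_DIR}/{name}"

    return [
        # 1. Check latency stats.
        ["python", script("get_slow_spans.py"), "--stats"],
        # 2. Find slow spans in the suspect service.
        ["python", script("get_slow_spans.py"),
         "--min-duration", str(min_duration_ms), "--service", service],
        # 3. Fetch every span of one slow request.
        ["python", script("get_traces.py"), "--trace-id", trace_id],
        # 4. Correlate with logs around the same timestamp.
        ["python", script("sample_logs.py"), "--strategy", "around_anomaly",
         "--timestamp", timestamp, "--window", str(window)],
    ]
```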

Coralogix Analysis

Authentication

MANDATORY: Statistics-First Investigation

Available Scripts

PRIMARY INVESTIGATION SCRIPTS

get_statistics.py - ALWAYS START HERE

sample_logs.py - Strategic Sampling

extract_signatures.py - Pattern Clustering

UTILITY SCRIPTS

list_services.py - Service Discovery

get_health.py - Quick Health Check

get_errors.py - Quick Error Fetch

query_logs.py - Raw DataPrime Queries

DataPrime Syntax Quick Reference

Filters

Aggregations

Common Fields

Common Query Patterns

1. List all services with log counts

2. Error count by service

3. Error rate over time

4. Errors for specific service

5. Search for specific error message

Advanced DataPrime Patterns

Bracket Notation for Special Fields

Time-Based Comparisons

K8s Container Restarts

Peak Error Window

Fuzzy Search All Fields

Anti-Patterns to Avoid

Investigation Workflow

Standard Incident Investigation

Example: Payment Service Investigation

Quick Commands Reference

Trace Investigation

When to Use Traces vs Logs

Trace Scripts

get_traces.py - Find Spans

get_slow_spans.py - Latency Analysis

DataPrime Spans Syntax

Span Fields Reference (different from logs!)

Trace Investigation Workflow
