Generate/create/write PromQL queries, metric expressions, alerting rules, recording rules, Prometheus dashboards.
This skill provides a comprehensive, interactive workflow for generating production-ready PromQL queries with best practices built-in. Generate queries for monitoring dashboards, alerting rules, and ad-hoc analysis with an emphasis on user collaboration and planning before code generation.
Invoke this skill when the user asks to generate, create, or write PromQL queries, metric expressions, alerting rules, recording rules, or Prometheus dashboards.
CRITICAL: This skill emphasizes interactive planning before query generation. Always engage the user in a collaborative planning process to ensure the generated query matches their exact intentions.
Follow this workflow when generating PromQL queries:
Start by understanding what the user wants to monitor or measure. Ask clarifying questions to gather requirements:
Primary Goal: What are you trying to monitor or measure?
Use Case: What will this query be used for?
Context: Any additional context?
Use the AskUserQuestion tool to gather this information if not provided.
When to Ask vs. Infer: If the user's initial request already clearly specifies the goal, use case, and context (e.g., "Create an alert for P95 latency > 500ms for payment-service"), you may acknowledge these details in your response instead of re-asking. Only ask clarifying questions for information that is missing or ambiguous.
Determine which metrics are available and relevant:
Metric Discovery: What metrics are available?
- `_total` suffix → Counter
- `_bucket`, `_sum`, `_count` suffixes → Histogram
- `_created` suffix → Counter creation timestamp
Metric Type Identification: Confirm the metric type(s)
- Counter (e.g., `http_requests_total`, `errors_total`, `bytes_sent_total`): use `rate()`, `irate()`, `increase()`
- Gauge (e.g., `memory_usage_bytes`, `cpu_temperature_celsius`, `queue_length`): use `avg_over_time()`, `min_over_time()`, `max_over_time()`, or query the gauge directly
- Histogram (e.g., `http_request_duration_seconds_bucket`, `response_size_bytes_bucket`): use `histogram_quantile()` with `rate()`
- Summary (e.g., `rpc_duration_seconds{quantile="0.95"}`): use `_sum` and `_count` for averages; don't average quantiles
Label Discovery: What labels are available on these metrics?
- Common labels: `job`, `instance`, `environment`, `service`, `endpoint`, `status_code`, `method`
Use the AskUserQuestion tool to confirm metric names, types, and available labels.
Gather specific requirements for the query.
IMPORTANT: When the user has already specified parameters in their initial request (e.g., "5-minute window", "500ms threshold", "> 5% error rate"), you MUST present those values as the default option for confirmation rather than re-asking from scratch.
Example: If user says "alert when P95 latency exceeds 500ms", use:
AskUserQuestion:
- Question: "Confirm the alert threshold?"
- Options:
1. "500ms (as specified)" - Use the threshold from your request
2. "Different threshold" - Let me specify a different value
This respects the user's input and speeds up the workflow while still allowing modifications.
Time Range: What time window should the query cover?
- Common windows: `[5m]`, `[1h]`, `[1d]`
- Rule of thumb: `[1m]` to `[5m]` for real-time views, `[1h]` to `[1d]` for trends
Label Filtering: Which labels should filter the data?
- Equality: `job="api-server"`, `status_code="200"`
- Negation: `status_code!="200"`
- Regex: `instance=~"prod-.*"`
- Combined: `{job="api", environment="production"}`
Aggregation: Should the data be aggregated?
- Keep only listed labels with `by`: `sum by (job, endpoint)`, `avg by (instance)`
- Drop listed labels with `without`: `sum without (instance, pod)`, `avg without (job)`
- Operators: `sum`, `avg`, `max`, `min`, `count`, `topk`, `bottomk`
Thresholds or Conditions: Are there specific conditions?
Use the AskUserQuestion tool to gather or confirm these parameters. When the user has already provided values (e.g., "5-minute window", "> 5%"), present them as the default option for confirmation.
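A threshold condition is expressed in PromQL with a comparison operator applied to the computed expression. As a sketch, a 5% error-ratio condition might look like this (metric and label names are illustrative):

```promql
# Returns series only while the 5m error ratio exceeds 5%
(
  sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
  /
  sum(rate(http_requests_total{job="api-server"}[5m]))
) > 0.05
```

Because comparisons filter out non-matching series, an expression like this returns data only while the condition holds, which is exactly the behavior alerting rules rely on.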
BEFORE GENERATING ANY CODE, present a plain-English query plan and ask for user confirmation:
## PromQL Query Plan
Based on your requirements, here's what the query will do:
**Goal**: [Describe the monitoring goal in plain English]
**Query Structure**:
1. Start with metric: `[metric_name]`
2. Filter by labels: `{label1="value1", label2="value2"}`
3. Apply function: `[function_name]([metric][time_range])`
4. Aggregate: `[aggregation] by ([label_list])`
5. Additional operations: [any calculations, ratios, or transformations]
**Expected Output**:
- Data type: [instant vector/scalar]
- Labels in result: [list of labels]
- Value represents: [what the number means]
- Typical range: [expected value range]
**Example Interpretation**:
If the query returns `0.05`, it means: [plain English explanation]
**Does this match your intentions?**
- If yes, I'll generate the query and validate it
- If no, let me know what needs to change
Use the AskUserQuestion tool to confirm the plan, offering options such as "Looks good, generate the query" and "Needs changes". When the user requests changes, revise the plan and re-confirm it before generating any code.
Once the user confirms the plan, generate the actual PromQL query following best practices.
Before writing any query code, you MUST:
Identify the query category first (histogram, RED, USE, function-specific, optimization, etc.).
Read only the relevant reference section(s) using the Read tool:
- `references/metric_types.md` (Histogram section)
- `references/promql_patterns.md` (RED method section)
- `references/promql_patterns.md` (USE method section)
- `references/best_practices.md`
- `references/promql_functions.md`
If a needed reference cannot be read, state the issue and continue with best-effort generation using the most applicable documented pattern you already have.
Cite the applicable pattern or best practice in your response:
As documented in references/promql_patterns.md (Pattern 3: Latency Percentile):
# 95th percentile latency
histogram_quantile(0.95, sum by (le) (rate(...)))
Reference example files when generating similar queries:
Based on examples/red_method.promql (lines 64-82):
# P95 latency with proper histogram_quantile usage
This keeps generated queries aligned with documented patterns while avoiding unnecessary full-file rereads on iterative follow-ups.
Always Use Label Filters
# Good: Specific filtering reduces cardinality
rate(http_requests_total{job="api-server", environment="prod"}[5m])
# Bad: Matches all time series, high cardinality
rate(http_requests_total[5m])
Use Appropriate Functions for Metric Types
# Counter: Use rate() or increase()
rate(http_requests_total[5m])
# Gauge: Use directly or with *_over_time()
memory_usage_bytes
avg_over_time(memory_usage_bytes[5m])
# Histogram: Use histogram_quantile()
histogram_quantile(0.95,
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
Apply Aggregations with by() or without()
# Aggregate by specific labels (keeps only these labels)
sum by (job, endpoint) (rate(http_requests_total[5m]))
# Aggregate without specific labels (removes these labels)
sum without (instance, pod) (rate(http_requests_total[5m]))
Use Exact Matches Over Regex When Possible
# Good: Faster exact match
http_requests_total{status_code="200"}
# Bad: Slower regex match when not needed
http_requests_total{status_code=~"200"}
Calculate Ratios Properly
# Error rate: errors / total requests
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
Use Recording Rules for Complex Queries
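As a sketch, a recording rule that follows the `level:metric:operation` naming convention might look like this (the rule group, metric, and file names are illustrative):

```yaml
# rules.yml - precompute the per-job request rate once,
# so dashboards and alerts query the cheap precomputed series
groups:
  - name: api_recording_rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

Dashboards then query `job:http_requests:rate5m` directly instead of re-evaluating the aggregation on every panel refresh.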
- Naming convention: `level:metric:operation`
Format for Readability
# Good: Multi-line for complex queries
histogram_quantile(0.95,
sum by (le, job) (
rate(http_request_duration_seconds_bucket{job="api-server"}[5m])
)
)
Pattern 1: Request Rate
# Requests per second
rate(http_requests_total{job="api-server"}[5m])
# Total requests per second across all instances
sum(rate(http_requests_total{job="api-server"}[5m]))
Pattern 2: Error Rate
# Error ratio (0 to 1)
sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total{job="api-server"}[5m]))
# Error percentage (0 to 100)
(
sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total{job="api-server"}[5m]))
) * 100
Pattern 3: Latency Percentile (Histogram)
# 95th percentile latency
histogram_quantile(0.95,
sum by (le) (
rate(http_request_duration_seconds_bucket{job="api-server"}[5m])
)
)
Pattern 4: Resource Usage
# Current memory usage
process_resident_memory_bytes{job="api-server"}
# Average CPU usage over 5 minutes (counter, so use rate(), not avg_over_time())
rate(process_cpu_seconds_total{job="api-server"}[5m])
Pattern 5: Availability
# Percentage of up instances
(
count(up{job="api-server"} == 1)
/
count(up{job="api-server"})
) * 100
Pattern 6: Saturation/Queue Depth
# Average queue length
avg_over_time(queue_depth{job="worker"}[5m])
# Maximum queue depth in the last hour
max_over_time(queue_depth{job="worker"}[1h])
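Patterns like these plug directly into Prometheus alerting rules. A sketch wiring the error-ratio pattern (Pattern 2) into an alert, with illustrative names, threshold, and annotations:

```yaml
# alerts.yml - page when the error ratio stays above 5% for 10 minutes
groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{job="api-server", status_code=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{job="api-server"}[5m])) > 0.05
        for: 10m  # require the condition to hold before firing
        labels:
          severity: page
        annotations:
          summary: "Error ratio above 5% on api-server"
```

The `for:` clause suppresses flapping by requiring the condition to hold continuously before the alert fires.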
ALWAYS attempt to validate the generated query first using the devops-skills:promql-validator skill:
After generating the query, automatically invoke:
Skill(devops-skills:promql-validator)
The devops-skills:promql-validator skill will:
1. Check syntax correctness
2. Validate semantic logic (correct functions for metric types)
3. Identify anti-patterns and inefficiencies
4. Suggest optimizations
5. Explain what the query does
6. Verify it matches user intent
Validation checklist: syntax, semantics (functions match metric types), anti-patterns, performance, and intent match.
If validation fails, fix issues and re-validate until all checks pass.
If the validator skill is unavailable, fails to run, or cannot complete after two fix/re-validate cycles, fall back to manual review: check the query against the validation checklist yourself and mark any checks you could not complete as UNVERIFIED in the results.
IMPORTANT: Display Validation Results to User
After running validation, you MUST display the structured results to the user in this format:
## PromQL Validation Results
### Syntax Check
- Status: ✅ VALID / ⚠️ WARNING / ❌ ERROR / ⚠️ UNVERIFIED
- Issues: [list any syntax errors]
### Best Practices Check
- Status: ✅ OPTIMIZED / ⚠️ CAN BE IMPROVED / ❌ HAS ISSUES / ⚠️ UNVERIFIED
- Issues: [list any problems found]
- Suggestions: [list optimization opportunities]
### Validation Coverage
- Validator tool run: [successful / failed / unavailable]
- Checks completed: [syntax, semantics, anti-patterns, performance, intent-match]
- Checks skipped: [list any skipped checks, or "None"]
### Query Explanation
- **What it measures**: [plain English description]
- **Output labels**: [list labels in result, or "None (scalar)"]
- **Expected result structure**: [instant vector / scalar / etc.]
This transparency helps users understand the validation process and any recommendations.
After generation and validation (or manual fallback validation), provide the user with:
The Final Query:
[Generated and validated PromQL query]
Query Explanation:
How to Use It:
Customization Notes:
Related Queries:
Native histograms are stable in current Prometheus 3.x releases (Prometheus 3.0 shipped in November 2024). They offer significant advantages over classic histograms:
Important: Starting with Prometheus v3.8.0, native histograms are fully stable. However, scraping native histograms still requires explicit activation via the `scrape_native_histograms` configuration setting. Starting with v3.9, no feature flag is needed, but `scrape_native_histograms` must still be set explicitly.
# Classic histogram (requires _bucket suffix and le label)
histogram_quantile(0.95,
sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
)
# Native histogram (simpler - no _bucket suffix, no le label needed)
histogram_quantile(0.95,
sum by (job) (rate(http_request_duration_seconds[5m]))
)
# Get observation count rate from native histogram
histogram_count(rate(http_request_duration_seconds[5m]))
# Get sum of observations from native histogram
histogram_sum(rate(http_request_duration_seconds[5m]))
# Calculate fraction of observations between two values
histogram_fraction(0, 0.1, rate(http_request_duration_seconds[5m]))
# Average request duration from native histogram
histogram_sum(rate(http_request_duration_seconds[5m]))
/
histogram_count(rate(http_request_duration_seconds[5m]))
Native histograms are identified by:
- No `_bucket` suffix on the metric name
- No `le` label in the time series
When querying, check if your Prometheus instance has native histograms enabled:
# prometheus.yml - Enable native histogram scraping
scrape_configs:
- job_name: 'my-app'
scrape_native_histograms: true  # Prometheus 3.x+
Prometheus 3.4+ supports custom bucket native histograms (schema -53), allowing classic histogram to native histogram conversion. This is a key migration path for users with existing classic histograms.
Benefits of NHCB:
Configuration (Prometheus 3.4+):
# prometheus.yml - Convert classic histograms to NHCB on scrape
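A minimal sketch of that configuration, assuming the `convert_classic_histograms_to_nhcb` scrape option introduced in Prometheus 3.4 (the job name is illustrative; verify the option name against your Prometheus version's scrape configuration reference):

```yaml
# prometheus.yml - Convert classic histograms to NHCB on scrape
scrape_configs:
  - job_name: 'my-app'
    convert_classic_histograms_to_nhcb: true  # Prometheus 3.4+
```

With this enabled, scraped classic histograms are stored as native histograms with custom buckets, so existing `histogram_quantile()` queries can be migrated to the simpler native-histogram form shown above.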