Provides systematic debugging approaches for HyperShift hosted-cluster issues. Auto-applies when debugging cluster problems, investigating stuck deletions, or troubleshooting control plane issues.
This skill provides structured debugging workflows for common HyperShift hosted-cluster issues.
This skill automatically applies when:
For provider-specific issues and detailed troubleshooting steps, refer to these subskills:
The main skill below provides provider-agnostic debugging workflows. When you encounter provider-specific issues, consult the relevant subskill for detailed resolution steps.
Namespace conventions used below: the HC namespace is where the HostedCluster resource lives (commonly `default` or `clusters`), and the HCP namespace hosts the control plane (typically `<hc-namespace>-<cluster-name>`, e.g. `clusters-<cluster-name>`).

When a hosted-cluster is stuck in a deleting state, follow this systematic debugging process:
Verify that NodePool deletion is progressing:
# Check NodePool resources in HC namespace
kubectl get nodepool -n <hc-namespace>
# Check CAPI cluster resource status in HCP namespace
kubectl get cluster -n <hcp-namespace> -o yaml
# Check CAPI provider pod logs
kubectl logs -n <hcp-namespace> deployment/capi-provider
# Check CAPI machines status in HCP namespace
kubectl get machines -n <hcp-namespace>
kubectl describe machines -n <hcp-namespace>
# Review HyperShift operator logs for NodePool issues
kubectl logs -n hypershift deployment/operator --tail=100 | grep -i nodepool
kubectl logs -n hypershift deployment/operator --tail=100 | grep -i <cluster-name>
What to look for:
- NodePool resources that remain long after deletion was requested
- CAPI Machines stuck deleting, or provider errors surfaced by `kubectl describe machines`
- Errors in the capi-provider or HyperShift operator logs that reference the NodePool or cluster name
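The NodePool check above can be wrapped in a small polling helper. This is a hypothetical sketch, not part of any HyperShift tooling; it assumes kubectl access to the management cluster:

```shell
# Hypothetical helper: poll until no NodePools remain in the HC namespace,
# or give up after a timeout (default 300s).
wait_for_nodepools_gone() {
  local ns="$1" timeout="${2:-300}" waited=0
  # `-o name` prints one line per NodePool; grep -q . succeeds while any remain
  while kubectl get nodepool -n "$ns" -o name 2>/dev/null | grep -q .; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out: NodePools still present in $ns"
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "all NodePools deleted in $ns"
}

# Example: wait_for_nodepools_gone clusters 600
```

If the helper times out, move on to inspecting the CAPI Machines and provider logs as shown above.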
Verify that the HCP resource and its pods are being cleaned up:
# Check HCP resource status
kubectl get hostedcontrolplane -n <hcp-namespace> -o yaml
# Check pods in HCP namespace
kubectl get pods -n <hcp-namespace>
# Check for stuck pods
kubectl get pods -n <hcp-namespace> --field-selector=status.phase!=Running
# Review control-plane-operator logs
kubectl logs -n <hcp-namespace> deployment/control-plane-operator --tail=100
What to look for:
- HostedControlPlane conditions reporting deletion or reconciliation errors
- Pods stuck in Terminating or otherwise not Running
- Errors in the control-plane-operator logs
Investigate why the HCP namespace isn't being removed:
# Check namespace status
kubectl get namespace <hcp-namespace> -o yaml
# List all remaining resources in namespace
kubectl api-resources --verbs=list --namespaced -o name | \
xargs -n 1 kubectl get --show-kind --ignore-not-found -n <hcp-namespace>
# Check for resources with finalizers
kubectl get all -n <hcp-namespace> -o json | \
jq '.items[] | select(.metadata.finalizers != null) | {kind: .kind, name: .metadata.name, finalizers: .metadata.finalizers}'
# Review HO logs for namespace cleanup
kubectl logs -n hypershift deployment/operator --tail=100 | grep -i namespace
What to look for:
- The namespace stuck in Terminating, with blocking conditions listed in its status
- Remaining namespaced resources that prevent deletion
- Resources still carrying finalizers
- Namespace cleanup errors in the HyperShift operator logs
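The jq finalizer filter shown above can be exercised offline against a hand-written sample, which is useful for verifying the filter before pointing it at a live cluster (assumes jq is installed; the sample resources below are made up):

```shell
# Made-up resource list: one object with a finalizer, one without
cat <<'EOF' > /tmp/sample-resources.json
{"items": [
  {"kind": "HostedControlPlane", "metadata": {"name": "demo", "finalizers": ["hypershift.openshift.io/finalizer"]}},
  {"kind": "Pod", "metadata": {"name": "etcd-0"}}
]}
EOF

# Same filter as above; only resources that still carry finalizers survive
jq -c '.items[] | select(.metadata.finalizers != null) | {kind: .kind, name: .metadata.name, finalizers: .metadata.finalizers}' /tmp/sample-resources.json
```

Only the HostedControlPlane entry is printed, since the Pod has no finalizers.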
Final check on the HostedCluster resource itself:
# Check HostedCluster status
kubectl get hostedcluster -n <hc-namespace> <cluster-name> -o yaml
# Check HostedCluster finalizers
kubectl get hostedcluster -n <hc-namespace> <cluster-name> -o jsonpath='{.metadata.finalizers}'
# Review HO logs for HostedCluster deletion
kubectl logs -n hypershift deployment/operator --tail=200 | grep -i "hostedcluster.*<cluster-name>"
What to look for:
- Deletion-related conditions in the HostedCluster status
- Finalizers still present on the HostedCluster
- Repeated reconcile errors for this cluster in the operator logs
When investigating cluster deletion issues, pay attention to partially-completed conditions; for example, a status message like `InfrastructureReady: 1 of 2 completed` indicates infrastructure teardown has not finished.

When HyperShift CRDs are accidentally deleted or corrupted (e.g., after running `hypershift destroy infra`), you'll need to reinstall HyperShift.
⚠️ WARNING: HyperShift reinstallation should be a last resort. Only proceed if CRDs are genuinely missing or corrupted and cannot be recovered through other means. Reinstallation will cause downtime and may impact existing hosted-clusters.
Symptom: commands like `kubectl get hostedclusters` fail with: error: the server doesn't have a resource type "hostedclusters"

# Check if critical HyperShift CRDs exist
kubectl get crd hostedclusters.hypershift.openshift.io
kubectl get crd nodepools.hypershift.openshift.io
# Count HyperShift CRDs (should be ~9)
kubectl get crd | grep hypershift | wc -l
# Count CAPI CRDs (should be ~50)
kubectl get crd | grep cluster.x-k8s.io | wc -l
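The two counts above can be folded into a small helper for repeated checks. `check_crd_count` is a hypothetical function, not part of the HyperShift CLI, and the thresholds are the approximate counts noted above:

```shell
# Hypothetical sanity check: warn when the number of CRDs matching a
# pattern falls below an expected floor.
check_crd_count() {
  local pattern="$1" floor="$2" actual
  actual=$(kubectl get crd -o name | grep -c "$pattern")
  if [ "$actual" -lt "$floor" ]; then
    echo "WARN: $actual CRDs match '$pattern' (expected at least $floor)"
    return 1
  fi
  echo "OK: $actual CRDs match '$pattern'"
}

# Example thresholds drawn from the counts above:
# check_crd_count 'hypershift' 9
# check_crd_count 'cluster.x-k8s.io' 50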
🤖 AI Assistant Note: When this scenario is encountered, Claude should guide and suggest the reinstallation steps to the user but NEVER execute the reinstallation commands itself. The user must explicitly run these commands. Provide clear instructions and explanations, but do not use the Bash tool to perform the actual reinstallation.
# You'll need these for reinstallation:
# - OIDC storage provider configuration (provider-specific, see below)
# - Provider credentials (if applicable)
# - Any custom configuration flags used in original installation
Provider-specific parameters:
hypershift install render | kubectl delete -f -
This will:
hypershift install \
[provider-specific-flags] \
--enable-defaulting-webhook true
Add any other flags that were part of your original installation.
Provider-specific installation:
# Check operator is running
kubectl get deploy -n hypershift
kubectl get pods -n hypershift
# Verify CRDs are installed
kubectl get crd | grep hostedcluster
kubectl get crd | grep nodepool
kubectl get crd | grep cluster.x-k8s.io | wc -l
# Test API accessibility
kubectl get hostedclusters -A
# Check operator logs for errors
kubectl logs -n hypershift deployment/operator --tail=50
# Verify controllers are running
kubectl logs -n hypershift deployment/operator --tail=100 | grep "Starting workers"
Expected Results After Reinstallation:
kubectl get hostedclusters -A returns successfully (even if no clusters exist)Important Notes:
# List all finalizers on a resource
kubectl get <resource-type> <name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
# Remove a specific finalizer (use with caution!)
kubectl patch <resource-type> <name> -n <namespace> -p '{"metadata":{"finalizers":null}}' --type=merge
# HyperShift operator logs with context
kubectl logs -n hypershift deployment/operator --tail=500 --timestamps
# Control plane operator logs
kubectl logs -n <hcp-namespace> deployment/control-plane-operator --tail=500 --timestamps
# Follow logs in real-time
kubectl logs -n hypershift deployment/operator -f
# Get events for a specific resource
kubectl describe <resource-type> <name> -n <namespace>
# Get all events in a namespace, sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Check resource conditions
kubectl get <resource-type> <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .
# Check specific condition
kubectl get hostedcluster <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Available")]}'
hypershift-operator/controllers/hostedcluster/control-plane-operator/controllers/api/hypershift/v1beta1/test/e2e/