Name: Catc Troubleshoot
Author: automateyournetwork

Catalyst Center Troubleshooting Workflows

Catalyst Center MCP Server

All Catalyst Center tool calls use this invocation pattern:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT

Variable shorthand used throughout this document:

CATC_CMD="CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT"

How to Call Tools

Use the $MCP_CALL protocol handler to invoke MCP tools:

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" TOOL_NAME 'ARGS_JSON'
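For scripted use, the same invocation can be assembled in Python instead of typed by hand. The sketch below is a minimal, hypothetical wrapper: `build_tool_command` and `call_tool` are illustrative names, not part of either MCP server. It assumes `MCP_CALL` and `CATC_MCP_SCRIPT` are set in the environment as in the shell form above, and it returns raw stdout because the handler's output format is not specified here.

```python
import json
import os
import subprocess


def build_tool_command(tool_name, args, mcp_call, server_script):
    """Return the argv list for one MCP tool call, mirroring the shell pattern."""
    return [
        "python3",
        mcp_call,                        # path held in $MCP_CALL
        f"python3 -u {server_script}",   # server launch string, passed as one argument
        tool_name,
        json.dumps(args, separators=(",", ":")),
    ]


def call_tool(tool_name, args):
    """Run one tool call; paths and credentials come from the environment."""
    argv = build_tool_command(
        tool_name, args, os.environ["MCP_CALL"], os.environ["CATC_MCP_SCRIPT"]
    )
    # CCC_HOST, CCC_USER and CCC_PWD are inherited by the child process.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout  # assumption: the caller parses whatever the handler prints
```

Passing the arguments as a list avoids shell quoting problems with the JSON payload.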

Issue 1: Device Unreachable

Decision Tree

Device Unreachable in Catalyst Center
|
+-- Step 1: Confirm the scope
|   How many devices are unreachable?
|   Are they at the same site?
|   |
|   +-- Single device unreachable
|   |   --> Go to "Single Device Investigation"
|   |
|   +-- Multiple devices at same site
|   |   --> Go to "Site-Wide Outage Triage" (Issue 4)
|   |
|   +-- Multiple devices across sites
|       --> Go to "Catalyst Center or Upstream Issue"
|
+-- Step 2: Single Device Investigation
|   |
|   +-- Check device details in CatC
|   +-- Check last update time
|   +-- Check collection status and error code
|   +-- Attempt pyATS connectivity test
|   |
|   +-- pyATS connects successfully?
|   |   |
|   |   +-- YES: CatC polling issue (SNMP/NETCONF credentials, ACL)
|   |   +-- NO: Device is truly unreachable
|   |       |
|   |       +-- Ping from adjacent device (pyATS)
|   |       +-- Check upstream interface state
|   |       +-- Check ARP/MAC table on upstream switch
|   |       +-- Physical layer: power, cable, SFP
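The two branch points in the tree can be sketched as small functions. This is an illustration under stated assumptions, not the skill's own logic: the function names are hypothetical, and each device is represented as a dict with a `site` key, which is an assumed field name rather than a documented Catalyst Center attribute.

```python
def classify_unreachable_scope(unreachable):
    """Map a list of unreachable devices to the next branch of the tree.

    Each item is a dict with a 'site' key (assumed field name for this sketch).
    """
    if not unreachable:
        return "No unreachable devices"
    if len(unreachable) == 1:
        return "Single Device Investigation"
    sites = {device["site"] for device in unreachable}
    if len(sites) == 1:
        return "Site-Wide Outage Triage"
    return "Catalyst Center or Upstream Issue"


def interpret_pyats_result(pyats_connected):
    """Step 2 branch: either pyATS reaches the device or it does not."""
    if pyats_connected:
        return "CatC polling issue (SNMP/NETCONF credentials, ACL)"
    return "Device is truly unreachable"
```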

Step 1: Identify All Unreachable Devices

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"reachabilityStatus":["Unreachable"]}'

Step 2: Check If It Is Site-Localized

# Get the site hierarchy to understand the topology
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_sites '{}'

Step 3: Inspect the Specific Device

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["UNREACHABLE-SW-01"]}'

Step 4: Check Interfaces on the Unreachable Device (If CatC Has Cached Data)

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID-from-step-3>"}'

Step 5: Escalate to pyATS for Live Validation

# Attempt to connect and get basic state
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UNREACHABLE-SW-01","command":"show ip interface brief"}'

# Ping the unreachable device from an adjacent device
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"UPSTREAM-SW-01","command":"ping 10.1.10.1"}'

# Check the upstream interface state
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show interfaces GigabitEthernet1/0/1"}'

# Check the ARP table on the upstream switch
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show arp"}'

Common Root Causes: Device Unreachable

| Root Cause | CatC Indicators | pyATS Indicators | Resolution |
|---|---|---|---|
| Device powered off | Unreachable, no updates | Connection times out | Check power, PDU, UPS |
| Management interface down | Unreachable | Cannot SSH | Verify mgmt interface config via console |
| SNMP credential mismatch | Partial Collection Failure | pyATS connects OK | Fix SNMP community on device or CatC |
| ACL blocking CatC | Unreachable from CatC | pyATS connects OK | Update ACL to permit CatC management IPs |
| Routing issue to mgmt subnet | Unreachable | Ping fails from adjacent device | Check routing table, management VRF |
| Upstream link failure | Multiple devices unreachable | Upstream interface down | Physical layer: cable, SFP, port |
| Device crash/reload | Recently unreachable | Boot messages in logs | Check `show logging`, `show version` uptime |

Issue 2: Client Connectivity Problems

Decision Tree

Client Connectivity Issue
|
+-- Step 1: Identify the client
|   Do we have the MAC address, IP address, or username?
|   |
|   +-- MAC known --> get_client_details_by_mac
|   +-- IP known --> get_clients_list with ipv4_address filter, then get details by MAC
|   +-- Only username/location --> get_clients_list filtered by site
|
+-- Step 2: Is the client visible in CatC?
|   |
|   +-- YES: Client is associating/authenticating
|   |   |
|   |   +-- Check health score
|   |   +-- Wired or wireless?
|   |   |   |
|   |   |   +-- WIRED --> Check connected switch port, VLAN, duplex
|   |   |   +-- WIRELESS --> Check RSSI, SNR, band, AP, SSID
|   |   |
|   |   +-- Check IP assignment (DHCP working?)
|   |   +-- Check connected network device status
|   |
|   +-- NO: Client not visible
|       |
|       +-- Is the access switch/AP reachable?
|       +-- Is the client physically connected (cable/Wi-Fi enabled)?
|       +-- Is 802.1X/MAB failing? (check ISE if available)
|       +-- Is the SSID broadcasting?

Step 1: Get Time Range

CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 4 hours"}'
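The `start_time` and `end_time` values in the examples that follow are 13-digit numbers, which is consistent with epoch milliseconds (1705312800000 is 2024-01-15 10:00 UTC). To sanity-check a window locally, a sketch like the one below can compute the same kind of pair. `window_ms` is a hypothetical helper; the exact output shape of `get_api_compatible_time_range` is not shown in this document, so treat this as a cross-check rather than a replacement.

```python
from datetime import datetime, timedelta, timezone


def window_ms(hours, now=None):
    """Return (start_ms, end_ms) for the last `hours` hours in epoch milliseconds."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    return int(start.timestamp() * 1000), int(now.timestamp() * 1000)
```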

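Step 2: Look Up the Client

# Look up by MAC address. Replace the example start_time and end_time with the range from get_api_compatible_time_range.
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'

# If only the IP address is known, list clients filtered by IP, then look up the returned MAC
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"ipv4_address":["10.1.50.42"]}'

Step 3: Analyze Client Data

Review the returned data against the decision tree: health score, wired or wireless attachment, IP assignment, and the status of the connected network device.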

Step 4: Check the Access Device

# Is the access device reachable?
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["ACC-SW-01"]}'

# Get the switch's interfaces to check the client-facing port
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<switch-UUID>"}'

Step 5: Escalate to pyATS for Switch-Port Level Troubleshooting

# Check the specific switchport
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show interfaces GigabitEthernet1/0/15"}'

# Check authentication sessions (802.1X/MAB)
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show authentication sessions"}'

# Check MAC address table for the client
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show mac address-table"}'

# Check DHCP snooping bindings
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show ip dhcp snooping binding"}'

# Check port-security
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show port-security"}'

Common Root Causes: Client Connectivity

| Root Cause | CatC Indicators | pyATS/CLI Indicators | Resolution |
|---|---|---|---|
| DHCP exhaustion | Client has 169.254.x.x IP | `show ip dhcp pool` shows 0 free | Expand DHCP scope or reduce lease time |
| Port security violation | Client not visible | `err-disabled` state on port | Clear port-security, investigate |
| 802.1X failure | Client not visible | Authentication failed in ISE | Check credentials, certificate, RADIUS |
| Wrong VLAN | Client visible but no connectivity | Port in wrong VLAN | Correct VLAN assignment |
| Duplex mismatch | Poor health score (wired) | Half-duplex on one end | Set both ends to auto or match manually |
| Wireless: low RSSI | Low health score, low RSSI | N/A (wireless) | Move closer to AP, add AP coverage |
| Wireless: co-channel | Intermittent drops | N/A (wireless) | Adjust channel plan, reduce AP power |
| AP overloaded | Multiple clients with issues | N/A (wireless) | Load balance, add AP capacity |
| Upstream link failure | Multiple clients affected | Interface down on distribution | Physical layer or routing issue |

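Issue 3: Interface Down Analysis

Step 1: Get All Interfaces on the Device

# Find the device first
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-01"]}'

# Fetch interfaces using the device UUID returned above
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID>"}'

Step 2: Identify Problem Interfaces

Look for interfaces that are administratively up but operationally down, and for incrementing error counters.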
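Once the interface list is in hand, the admin-up but operationally-down ports can be picked out with a short filter. This is a sketch only: the key names `adminStatus`, `status`, and `portName` are assumptions about the shape of the `fetch_interfaces` output, which this document does not show, and `find_problem_interfaces` is a hypothetical helper.

```python
def find_problem_interfaces(interfaces):
    """Return names of interfaces that are admin UP but not operationally UP.

    Assumed keys per interface dict: 'adminStatus', 'status', 'portName'.
    """
    problems = []
    for intf in interfaces:
        admin = str(intf.get("adminStatus", "")).upper()
        oper = str(intf.get("status", "")).upper()
        if admin == "UP" and oper != "UP":
            problems.append(intf.get("portName", "<unknown>"))
    return problems
```

Ports that are administratively down are skipped on purpose, since they are down by configuration rather than by fault.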

Step 3: Classify the Interface

| Interface Type | Impact | Urgency |
|---|---|---|
| Uplink to distribution/core | CRITICAL: Affects all downstream devices and clients | Immediate |
| Inter-switch link (trunk) | HIGH: May break VLAN spanning, STP reconvergence | Immediate |
| Access port (user-facing) | MEDIUM: Single user affected | Standard |
| Management interface | HIGH: Loses management access to the device | Urgent |
| Loopback | HIGH if used for routing (OSPF RID, BGP update-source) | Urgent |
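When several interfaces are down at once, the classification above gives a natural work order. The sketch below encodes the table as a lookup and sorts by urgency, then impact. The short role labels (`uplink`, `trunk`, and so on) and the function name are hypothetical, and the loopback entry assumes the loopback is used for routing.

```python
# Impact and urgency per interface role, taken from the classification table.
# The role labels are hypothetical keys chosen for this sketch.
INTERFACE_IMPACT = {
    "uplink": ("CRITICAL", "Immediate"),
    "trunk": ("HIGH", "Immediate"),
    "access": ("MEDIUM", "Standard"),
    "management": ("HIGH", "Urgent"),
    "loopback": ("HIGH", "Urgent"),  # only HIGH if used for routing
}

_URGENCY_RANK = {"Immediate": 0, "Urgent": 1, "Standard": 2}
_IMPACT_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}


def triage_order(down_interfaces):
    """Sort (name, role) pairs so the most urgent interfaces come first."""
    def sort_key(item):
        _, role = item
        impact, urgency = INTERFACE_IMPACT[role]
        return (_URGENCY_RANK[urgency], _IMPACT_RANK[impact])

    return [name for name, _ in sorted(down_interfaces, key=sort_key)]
```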

Step 4: Escalate to pyATS for Detailed Interface Diagnostics

# Detailed interface statistics
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces TenGigabitEthernet1/0/1"}'

# Check for recent log messages about the interface
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-01"}'

# Check SFP/transceiver status
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces transceiver"}'

# Check EtherChannel status if applicable
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show etherchannel summary"}'

# Check spanning-tree for blocked ports
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show spanning-tree"}'

Interface Down Root Causes

| Symptom | CatC Data | CLI Investigation | Likely Cause |
|---|---|---|---|
| admin UP, oper DOWN | Status mismatch in CatC | No link pulse on interface | Bad cable, SFP, remote end shut |
| CRC errors incrementing | Error counters in interface data | `show interfaces` CRC count | Faulty cable, bad SFP, duplex mismatch |
| Err-disabled | Port not visible or shows errors | `show interfaces status err-disabled` | Port-security, BPDU guard, storm-control |
| Flapping (up/down cycles) | Multiple status changes | `show logging` UPDOWN messages | Loose cable, auto-negotiation failure |
| STP blocked | Port up but no traffic | `show spanning-tree` BLK state | STP topology issue, loop detected |

Issue 4: Site-Wide Outage Triage

Decision Tree

Site-Wide Outage
|
+-- Step 1: Quantify the impact
|   How many devices unreachable at this site?
|   How many clients affected?
|   |
|   +-- ALL devices unreachable at site
|   |   --> WAN link failure, upstream router, or site power outage
|   |
|   +-- SOME devices unreachable
|   |   --> Distribution layer or IDF/MDF issue
|   |
|   +-- Devices reachable but clients can't connect
|       --> DHCP, VLAN, wireless controller, or authentication issue
|
+-- Step 2: Identify the failure boundary
|   Which devices ARE still reachable?
|   What is the common upstream for the failed devices?
|
+-- Step 3: Investigate the upstream device
|   Check interfaces, routing, logs on the common upstream
|
+-- Step 4: Resolve and verify
    Fix the issue, confirm all devices come back, verify client counts recover
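Step 1 of this tree is a three-way split on device and client counts. A minimal sketch, with a hypothetical function name and the branch texts copied from the tree:

```python
def classify_site_outage(total_devices, unreachable_devices, clients_affected):
    """Return the likely problem area for a site, following Step 1 of the tree."""
    if total_devices <= 0:
        raise ValueError("total_devices must be positive")
    if unreachable_devices >= total_devices:
        return "WAN link failure, upstream router, or site power outage"
    if unreachable_devices > 0:
        return "Distribution layer or IDF/MDF issue"
    if clients_affected:
        return "DHCP, VLAN, wireless controller, or authentication issue"
    return "No site-wide outage indicated"
```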

Step 1: Assess Site Device Status

# Get all devices at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"locationName":["Global/USA/NYC/Floor3"]}'

Step 2: Check Client Impact

# Get time range
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 1 hours"}'

# Count clients at the affected site now (replace the example start_time and end_time with the range returned above)
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_count '{"start_time":1705312800000,"end_time":1705399200000,"site_hierarchy":["Global/USA/NYC/Floor3"]}'

# Get a baseline window (e.g., the same time window yesterday), then re-run get_clients_count with that range and compare
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"yesterday"}'
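With a current count and a baseline count in hand, the client impact is a simple delta. The helper below is a hypothetical sketch; the numbers in the usage note are illustrative only.

```python
def client_count_delta(current, baseline):
    """Return (clients_missing, percent_drop) relative to the baseline count."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    missing = baseline - current
    return missing, round(100.0 * missing / baseline, 1)
```

For example, a baseline of 335 clients and a current count of 67 gives 268 missing clients, an 80.0 percent drop.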

Step 3: Find the Failure Boundary

# Check the distribution switch
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-NYC"]}'

# Check its interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<DIST-UUID>"}'

Step 4: Escalate to pyATS for the Reachable Upstream

# Check all interfaces on the distribution switch
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip interface brief"}'

# Check routing table -- are routes to the affected site present?
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip route"}'

# Check OSPF adjacencies (for BGP, run "show ip bgp summary" instead)
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip ospf neighbor"}'

# Check for recent events in the logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-NYC"}'

# Ping the unreachable access switches from the distribution
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"DIST-SW-NYC","command":"ping 10.1.30.1"}'

Site Outage Root Causes

| Scope | Likely Cause | Key Evidence | Resolution |
|---|---|---|---|
| All devices + all clients | Site power outage | All devices unreachable, no response to pings | Dispatch facilities team, check UPS/PDU |
| All devices + all clients | WAN link failure | Edge router reachable but uplink down | Check ISP, activate backup WAN if available |
| One floor/IDF | Distribution switch failure | Dist switch unreachable, access switches behind it down | Power cycle, console access, hardware swap |
| One floor/IDF | Trunk link failure | Access switches up but no VLAN connectivity | Check trunk port, SFP, cable between access and dist |
| Devices up but no clients | DHCP failure | Clients getting APIPA addresses | Check DHCP server, ip helper-address, DHCP relay |
| Devices up, wireless clients down | WLC issue | APs reachable but no SSID broadcast | Check WLC status, AP join status |
| Devices up, some clients down | VLAN issue | Specific VLAN clients affected | Check VLAN trunking, SVI status |

Issue 5: Wireless Client Roaming Issues

Step 1: Get Client History

# Extended time range to capture roaming events
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 8 hours"}'

# Get detailed client info with wireless views
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'

Step 2: Analyze Roaming Indicators

In the returned history, look for a client holding onto a distant AP, drops during a roam, moves to 2.4 GHz, or bouncing between two APs.

Step 3: Check AP Coverage at the Problem Area

# List all APs at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"family":["Unified AP"],"locationName":["Global/USA/NYC/Floor3"]}'

# List wireless clients at the site, then group the results by AP to find overloaded APs
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"client_type":"wireless","site_hierarchy":["Global/USA/NYC/Floor3"]}'
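The client list is per client, so finding an overloaded AP means grouping the records by AP. A minimal sketch follows; `clients_per_ap` is a hypothetical helper, and the `apName` key is an assumption about the `get_clients_list` output, which is why it is exposed as a parameter.

```python
from collections import Counter


def clients_per_ap(clients, ap_field="apName"):
    """Count clients per AP, busiest first.

    ap_field is an assumed key name; adjust it to match the actual output.
    """
    counts = Counter(client.get(ap_field, "<unknown>") for client in clients)
    return counts.most_common()
```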

Roaming Issue Root Causes

| Symptom | Likely Cause | Resolution |
|---|---|---|
| Client holds onto far AP (sticky client) | 802.11k/v not enabled or client doesn't support it | Enable Optimized Roaming, BSS Transition Management |
| Client drops during roam | 802.11r (FT) not enabled, or PMK caching disabled | Enable Fast Transition (FT), enable CCKM/PMK caching |
| Client roams to 2.4 GHz | Band steering not aggressive enough | Increase band steering threshold, check client capability |
| Dead zone between APs | RF coverage gap | Add AP, increase power (carefully), adjust antenna |
| Client bounces between 2 APs | Equal signal from both APs | Reduce power on one AP, adjust cell boundaries |

Issue 6: Catalyst Center Collection or API Issues

Check Collection Failures

# Devices with partial collection failures
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Partial Collection Failure"]}'

# Devices that could not synchronize
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Could Not Synchronize"]}'

Troubleshooting Report Format

Troubleshooting Report
=======================
Catalyst Center: $CCC_HOST
Timestamp: YYYY-MM-DD HH:MM UTC
Ticket/Case: [reference number if applicable]

Problem Statement
-----------------
[What was reported, by whom, when it started]

Impact Assessment
-----------------
Devices affected: X unreachable out of Y total at site
Clients affected: ~Z clients (estimated from client count delta)
Sites affected: [list]
Severity: CRITICAL / HIGH / MEDIUM / LOW

Investigation Timeline
-----------------------
1. [HH:MM] Checked CatC device reachability -- found X devices unreachable
2. [HH:MM] Identified site-localized issue at Global/USA/NYC/Floor3
3. [HH:MM] Checked distribution switch DIST-SW-NYC -- reachable
4. [HH:MM] Found TenGig1/0/1 (uplink to Floor3 IDF) down/down
5. [HH:MM] pyATS logs show %LINK-3-UPDOWN at 14:23 UTC
6. [HH:MM] SFP transceiver showing rx power below threshold

Root Cause
----------
Failing SFP transceiver on DIST-SW-NYC TenGigabitEthernet1/0/1
(uplink to Floor3 access layer IDF)

Resolution
----------
[Steps taken to resolve, or escalation path if unresolved]

Verification
-----------
- All Floor3 access switches returned to Reachable in CatC
- Client count at Floor3 recovered to baseline (335 clients)
- No further interface flaps observed in 30-minute monitoring window

Preventive Measures
--------------------
- Schedule optical monitoring for all uplink SFPs
- Add redundant uplink to Floor3 IDF (single point of failure identified)

GAIT Audit Trail

# Record the troubleshooting summary in the GAIT audit trail
python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"input":{"role":"assistant","content":"Troubleshooting: Site-wide outage at Global/USA/NYC/Floor3. Root cause: Failing SFP on DIST-SW-NYC Te1/0/1. Impact: 335 clients, 5 access switches unreachable for 47 minutes. Resolution: SFP replaced, services restored. Preventive: Redundant uplink recommended.","artifacts":[]}}'
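Hand-writing the JSON argument inside single quotes is fragile: an apostrophe or a double quote in the summary text breaks either the shell quoting or the JSON. A minimal sketch that builds the same payload shape programmatically is shown below; `build_gait_turn` is a hypothetical helper, and the payload keys are copied from the command above.

```python
import json


def build_gait_turn(summary, artifacts=None):
    """Return the JSON argument for gait_record_turn as a string."""
    payload = {
        "input": {
            "role": "assistant",
            "content": summary,
            "artifacts": artifacts or [],
        }
    }
    return json.dumps(payload)
```

Passing the resulting string as a single argv element (for example via `subprocess.run` with a list) sidesteps shell quoting entirely.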
