Catalyst Center troubleshooting workflows - device unreachable investigation, client connectivity issues, interface down analysis, site-wide outage triage, wireless roaming problems, integration with pyATS for CLI-level diagnostics. Use when a device is unreachable, a user reports connectivity problems, an interface is down, a site has an outage, or wireless clients have roaming issues.
All Catalyst Center tool calls use this invocation pattern:
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT
Variable shorthand used throughout this document:
CATC_CMD="CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 -u $CATC_MCP_SCRIPT"
Use the $MCP_CALL protocol handler to invoke MCP tools:
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" TOOL_NAME 'ARGS_JSON'
Trigger: Catalyst Center shows one or more devices with reachabilityStatus: Unreachable.
Device Unreachable in Catalyst Center
|
+-- Step 1: Confirm the scope
| How many devices are unreachable?
| Are they at the same site?
| |
| +-- Single device unreachable
| | --> Go to "Single Device Investigation"
| |
| +-- Multiple devices at same site
| | --> Go to "Site-Wide Outage Triage" (Issue 4)
| |
| +-- Multiple devices across sites
| --> Go to "Catalyst Center or Upstream Issue"
|
+-- Step 2: Single Device Investigation
| |
| +-- Check device details in CatC
| +-- Check last update time
| +-- Check collection status and error code
| +-- Attempt pyATS connectivity test
| |
| +-- pyATS connects successfully?
| | |
| | +-- YES: CatC polling issue (SNMP/NETCONF credentials, ACL)
| | +-- NO: Device is truly unreachable
| | |
| | +-- Ping from adjacent device (pyATS)
| | +-- Check upstream interface state
| | +-- Check ARP/MAC table on upstream switch
| | +-- Physical layer: power, cable, SFP
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"reachabilityStatus":["Unreachable"]}'
Record for each device: hostname, managementIpAddress, platformId, locationName, lastUpdated, collectionStatus, errorCode.
# Get the site hierarchy to understand the topology
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_sites '{}'
Group unreachable devices by locationName. If multiple devices at the same site are unreachable, skip to Issue 4 (Site-Wide Outage).
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["UNREACHABLE-SW-01"]}'
Analyze:
lastUpdated -- How long has it been unreachable? Minutes = recent event. Hours/days = chronic issue.collectionStatus -- Could Not Synchronize vs Partial Collection Failure vs ManagederrorCode -- Specific error from Catalyst Center's collection engineCCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID-from-step-3>"}'
NOTE: This returns the last-known interface state. If the device became unreachable recently, this data may still reflect the pre-failure state. Look for interfaces that were already showing errors before the device went dark.
When $PYATS_MCP_SCRIPT and $PYATS_TESTBED_PATH are available and the device is in the pyATS testbed:
# Attempt to connect and get basic state
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UNREACHABLE-SW-01","command":"show ip interface brief"}'
If pyATS connects but CatC cannot reach the device:
If pyATS also cannot connect:
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"UPSTREAM-SW-01","command":"ping 10.1.10.1"}'
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show interfaces GigabitEthernet1/0/1"}'
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"UPSTREAM-SW-01","command":"show arp"}'
| Root Cause | CatC Indicators | pyATS Indicators | Resolution |
|---|---|---|---|
| Device powered off | Unreachable, no updates | Connection refused | Check power, PDU, UPS |
| Management interface down | Unreachable | Cannot SSH | Verify mgmt interface config via console |
| SNMP credential mismatch | Partial Collection Failure | pyATS connects OK | Fix SNMP community on device or CatC |
| ACL blocking CatC | Unreachable from CatC | pyATS connects OK | Update ACL to permit CatC management IPs |
| Routing issue to mgmt subnet | Unreachable | Ping fails from adjacent device | Check routing table, management VRF |
| Upstream link failure | Multiple devices unreachable | Upstream interface down | Physical layer: cable, SFP, port |
| Device crash/reload | Recently unreachable | Boot messages in logs | Check show logging, show version uptime |
Trigger: User reports cannot connect to the network, slow connectivity, or intermittent drops.
Client Connectivity Issue
|
+-- Step 1: Identify the client
| Do we have the MAC address, IP address, or username?
| |
| +-- MAC known --> get_client_details_by_mac
| +-- IP known --> get_clients_list with ipv4_address filter, then get details by MAC
| +-- Only username/location --> get_clients_list filtered by site
|
+-- Step 2: Is the client visible in CatC?
| |
| +-- YES: Client is associating/authenticating
| | |
| | +-- Check health score
| | +-- Wired or wireless?
| | | |
| | | +-- WIRED --> Check connected switch port, VLAN, duplex
| | | +-- WIRELESS --> Check RSSI, SNR, band, AP, SSID
| | |
| | +-- Check IP assignment (DHCP working?)
| | +-- Check connected network device status
| |
| +-- NO: Client not visible
| |
| +-- Is the access switch/AP reachable?
| +-- Is the client physically connected (cable/Wi-Fi enabled)?
| +-- Is 802.1X/MAB failing? (check ISE if available)
| +-- Is the SSID broadcasting?
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 4 hours"}'
By MAC address (preferred):
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'
By IP address:
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"ipv4_address":["10.1.50.42"]}'
For wired clients, check:
connectedNetworkDeviceName -- Which switch is the client on?connectedNetworkDeviceInterfaceName -- Which port?vlanId -- Correct VLAN assignment?ipv4Address -- Valid IP? Or APIPA (169.254.x.x) indicating DHCP failure?healthScore -- Score 0-10. Below 7 = issue.For wireless clients, check:
ssid -- Connected to the correct SSID?band -- 2.4 GHz or 5 GHz?rssi -- Signal strength (see RSSI table below)snr -- Signal-to-noise ratiodataRate -- Current data rate in MbpsconnectedNetworkDeviceName -- Which AP?healthScore -- Client health scoreIf the client's access switch or AP shows issues:
# Is the access device reachable?
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["ACC-SW-01"]}'
# Get the switch's interfaces to check the client-facing port
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<switch-UUID>"}'
When CatC data is insufficient (e.g., you need to check port security, 802.1X session state, or MAC address table):
# Check the specific switchport
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show interfaces GigabitEthernet1/0/15"}'
# Check authentication sessions (802.1X/MAB)
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show authentication sessions"}'
# Check MAC address table for the client
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show mac address-table"}'
# Check DHCP snooping bindings
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show ip dhcp snooping binding"}'
# Check port-security
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"ACC-SW-01","command":"show port-security"}'
| Root Cause | CatC Indicators | pyATS/CLI Indicators | Resolution |
|---|---|---|---|
| DHCP exhaustion | Client has 169.254.x.x IP | show ip dhcp pool shows 0 free | Expand DHCP scope or reduce lease time |
| Port security violation | Client not visible | err-disabled state on port | Clear port-security, investigate |
| 802.1X failure | Client not visible | Authentication failed in ISE | Check credentials, certificate, RADIUS |
| Wrong VLAN | Client visible but no connectivity | Port in wrong VLAN | Correct VLAN assignment |
| Duplex mismatch | Poor health score (wired) | Half-duplex on one end | Set both ends to auto or match manually |
| Wireless: low RSSI | Low health score, low RSSI | N/A (wireless) | Move closer to AP, add AP coverage |
| Wireless: co-channel | Intermittent drops | N/A (wireless) | Adjust channel plan, reduce AP power |
| AP overloaded | Multiple clients with issues | N/A (wireless) | Load balance, add AP capacity |
| Upstream link failure | Multiple clients affected | Interface down on distribution | Physical layer or routing issue |
Trigger: Catalyst Center shows interfaces in admin up / operationally down state, or CatC alerts on interface status changes.
# Find the device first
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-01"]}'
# Fetch interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<UUID>"}'
Filter the interface response for:
adminStatus: "UP" AND status: "down" -- Operationally failed interfaces (admin enabled but link is down)| Interface Type | Impact | Urgency |
|---|---|---|
| Uplink to distribution/core | CRITICAL: Affects all downstream devices and clients | Immediate |
| Inter-switch link (trunk) | HIGH: May break VLAN spanning, STP reconvergence | Immediate |
| Access port (user-facing) | MEDIUM: Single user affected | Standard |
| Management interface | HIGH: Loses management access to the device | Urgent |
| Loopback | HIGH if used for routing (OSPF RID, BGP update-source) | Urgent |
# Detailed interface statistics
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces TenGigabitEthernet1/0/1"}'
# Check for recent log messages about the interface
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-01"}'
# Check SFP/transceiver status
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show interfaces transceiver"}'
# Check EtherChannel status if applicable
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show etherchannel summary"}'
# Check spanning-tree for blocked ports
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-01","command":"show spanning-tree"}'
| Symptom | CatC Data | CLI Investigation | Likely Cause |
|---|---|---|---|
| admin UP, oper DOWN | Status mismatch in CatC | No link pulse on interface | Bad cable, SFP, remote end shut |
| CRC errors incrementing | Error counters in interface data | show interfaces CRC count | Faulty cable, bad SFP, duplex mismatch |
| Err-disabled | Port not visible or shows errors | show interfaces status err-disabled | Port-security, BPDU guard, storm-control |
| Flapping (up/down cycles) | Multiple status changes | show logging UPDOWN messages | Loose cable, auto-negotiation failure |
| STP blocked | Port up but no traffic | show spanning-tree BLK state | STP topology issue, loop detected |
Trigger: Multiple users at a single site report connectivity loss, or Catalyst Center shows multiple devices unreachable at the same location.
Site-Wide Outage
|
+-- Step 1: Quantify the impact
| How many devices unreachable at this site?
| How many clients affected?
| |
| +-- ALL devices unreachable at site
| | --> WAN link failure, upstream router, or site power outage
| |
| +-- SOME devices unreachable
| | --> Distribution layer or IDF/MDF issue
| |
| +-- Devices reachable but clients can't connect
| --> DHCP, VLAN, wireless controller, or authentication issue
|
+-- Step 2: Identify the failure boundary
| Which devices ARE still reachable?
| What is the common upstream for the failed devices?
|
+-- Step 3: Investigate the upstream device
| Check interfaces, routing, logs on the common upstream
|
+-- Step 4: Resolve and verify
Fix the issue, confirm all devices come back, verify client counts recover
# Get all devices at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"locationName":["Global/USA/NYC/Floor3"]}'
Categorize the results:
# Get time range
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 1 hours"}'
# Count clients at the affected site NOW
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_count '{"start_time":1705312800000,"end_time":1705399200000,"site_hierarchy":["Global/USA/NYC/Floor3"]}'
# Compare with a baseline (e.g., same time window yesterday)
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"yesterday"}'
Impact assessment:
Check the upstream/distribution devices that serve the affected site:
# Check the distribution switch
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"hostname":["DIST-SW-NYC"]}'
# Check its interfaces
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_interfaces '{"device_id":"<DIST-UUID>"}'
If the distribution/core device is still reachable:
# Check all interfaces on the distribution switch
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip interface brief"}'
# Check routing table -- are routes to the affected site present?
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip route"}'
# Check OSPF/BGP adjacencies
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"DIST-SW-NYC","command":"show ip ospf neighbor"}'
# Check for recent events in the logs
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"DIST-SW-NYC"}'
# Ping the unreachable access switches from the distribution
PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"DIST-SW-NYC","command":"ping 10.1.30.1"}'
| Scope | Likely Cause | Key Evidence | Resolution |
|---|---|---|---|
| All devices + all clients | Site power outage | All devices unreachable, no response to pings | Dispatch facilities team, check UPS/PDU |
| All devices + all clients | WAN link failure | Edge router reachable but uplink down | Check ISP, activate backup WAN if available |
| One floor/IDF | Distribution switch failure | Dist switch unreachable, access switches behind it down | Power cycle, console access, hardware swap |
| One floor/IDF | Trunk link failure | Access switches up but no VLAN connectivity | Check trunk port, SFP, cable between access and dist |
| Devices up but no clients | DHCP failure | Clients getting APIPA addresses | Check DHCP server, ip helper-address, DHCP relay |
| Devices up, wireless clients down | WLC issue | APs reachable but no SSID broadcast | Check WLC status, AP join status |
| Devices up, some clients down | VLAN issue | Specific VLAN clients affected | Check VLAN trunking, SVI status |
Trigger: Wireless users report drops when moving between floors or areas, or CatC shows poor wireless health scores.
# Extended time range to capture roaming events
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_api_compatible_time_range '{"time_window":"last 8 hours"}'
# Get detailed client info with wireless views
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_client_details_by_mac '{"client_mac_address":"AA:BB:CC:DD:EE:FF","start_time":1705312800000,"end_time":1705399200000,"view":["Wireless","WirelessHealth"]}'
Look for:
# List all APs at the affected site
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"family":["Unified AP"],"locationName":["Global/USA/NYC/Floor3"]}'
# Count wireless clients per AP to find overloaded APs
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" get_clients_list '{"start_time":1705312800000,"end_time":1705399200000,"client_type":"wireless","site_hierarchy":["Global/USA/NYC/Floor3"]}'
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Client holds onto far AP (sticky client) | 802.11k/v not enabled or client doesn't support it | Enable Optimized Roaming, BSS Transition Management |
| Client drops during roam | 802.11r (FT) not enabled, or PMK caching disabled | Enable Fast Transition (FT), enable CCKM/PMK caching |
| Client roams to 2.4 GHz | Band steering not aggressive enough | Increase band steering threshold, check client capability |
| Dead zone between APs | RF coverage gap | Add AP, increase power (carefully), adjust antenna |
| Client bounces between 2 APs | Equal signal from both APs | Reduce power on one AP, adjust cell boundaries |
Trigger: CatC data seems stale, APIs return unexpected errors, or collectionStatus shows failures across many devices.
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Partial Collection Failure"]}'
CCC_HOST=$CCC_HOST CCC_USER=$CCC_USER CCC_PWD=$CCC_PWD python3 $MCP_CALL "python3 -u $CATC_MCP_SCRIPT" fetch_devices '{"collectionStatus":["Could Not Synchronize"]}'
If client API calls return error 14013 (start time > 30 days ago):
get_api_compatible_time_range first to ensure the time range is validIf client API calls return error 14006 (data not ready for endTime):
get_clients_list, get_client_details_by_mac, and get_clients_count tools automatically retry with the API-suggested adjusted endTimeAlways produce a structured findings report after completing a troubleshooting session:
Troubleshooting Report
=======================
Catalyst Center: $CCC_HOST
Timestamp: YYYY-MM-DD HH:MM UTC
Ticket/Case: [reference number if applicable]
Problem Statement
-----------------
[What was reported, by whom, when it started]
Impact Assessment
-----------------
Devices affected: X unreachable out of Y total at site
Clients affected: ~Z clients (estimated from client count delta)
Sites affected: [list]
Severity: CRITICAL / HIGH / MEDIUM / LOW
Investigation Timeline
-----------------------
1. [HH:MM] Checked CatC device reachability -- found X devices unreachable
2. [HH:MM] Identified site-localized issue at Global/USA/NYC/Floor3
3. [HH:MM] Checked distribution switch DIST-SW-NYC -- reachable
4. [HH:MM] Found TenGig1/0/1 (uplink to Floor3 IDF) down/down
5. [HH:MM] pyATS logs show %LINK-3-UPDOWN at 14:23 UTC
6. [HH:MM] SFP transceiver showing rx power below threshold
Root Cause
----------
Failing SFP transceiver on DIST-SW-NYC TenGigabitEthernet1/0/1
(uplink to Floor3 access layer IDF)
Resolution
----------
[Steps taken to resolve, or escalation path if unresolved]
Verification
-----------
- All Floor3 access switches returned to Reachable in CatC
- Client count at Floor3 recovered to baseline (335 clients)
- No further interface flaps observed in 30-minute monitoring window
Preventive Measures
--------------------
- Schedule optical monitoring for all uplink SFPs
- Add redundant uplink to Floor3 IDF (single point of failure identified)
After completing any troubleshooting session, record the findings and resolution in GAIT:
python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"input":{"role":"assistant","content":"Troubleshooting: Site-wide outage at Global/USA/NYC/Floor3. Root cause: Failing SFP on DIST-SW-NYC Te1/0/1. Impact: 335 clients, 5 access switches unreachable for 47 minutes. Resolution: SFP replaced, services restored. Preventive: Redundant uplink recommended.","artifacts":[]}}'