Troubleshoot Terway CNI issues in Kubernetes using Kubernetes events and Terway logs. Use when diagnosing "cni plugin not initialized", Pod create/delete failures, or ENI/IPAM problems in Terway (centralized or non-centralized IPAM).
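Use this Skill whenever the user reports "cni plugin not initialized", Pod create/delete failures, or ENI/IPAM problems in a Terway cluster.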
Always assume the cluster is running Kubernetes and Terway is the CNI plugin.
Follow this order to diagnose Terway issues efficiently:
Verify Terway Component Health
- Check that the Terway Pod on the affected node is Running: kubectl get pods -n kube-system -l app=terway-eniip -o wide
- If it is not Running, check its events and logs (terway-init and terway containers) using the patterns in Step 1.

Gather Necessary Context (As needed)
Use the helper scripts or run kubectl commands directly:
- ./scripts/inspect-terway-cluster.sh
- ./scripts/inspect-terway-node.sh <node-name>
- ./scripts/inspect-terway-pod.sh <namespace> <pod-name>

Use Kubernetes Events as the primary signal
- Run kubectl describe pod <pod> -n <ns>.
- Map Terway Event reasons (for example AllocIPFailed, CniPodCreateError) to likely causes.

Inspect Terway IPAM / ENI controllers
- Depending on the IPAM mode (crd vs default), check the relevant CRDs (PodENI, Node) and their Events.

Deep Dive into Logs
Keep answers structured: first restate what has been checked, then propose next verification steps.
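The component health check in the first step can be scripted. The following is a minimal sketch, assuming the app=terway-eniip label used throughout this Skill and the default kubectl get pods column layout (NAME READY STATUS ...); the function name terway_unhealthy_pods is hypothetical.

```shell
# Hypothetical helper: list Terway Pods that are not Running or not fully Ready.
# Parses the default `kubectl get pods` columns (NAME READY STATUS ...); only
# the first three columns are used because RESTARTS may contain spaces.
terway_unhealthy_pods() {
  kubectl get pods -n kube-system -l app=terway-eniip --no-headers \
    | awk '{
        split($2, ready, "/")   # READY column, e.g. "1/2"
        if ($3 != "Running" || ready[1] != ready[2]) {
          print $1, $2, $3      # name, ready, status
        }
      }'
}

# Example: for each Pod printed, read the container that matches its status:
#   Init:Error                -> kubectl logs <pod> -n kube-system -c terway-init
#   CrashLoopBackOff / Error  -> kubectl logs <pod> -n kube-system -c terway
```

An empty result means every Terway Pod is Running with all containers Ready, so the next step is Pod-level Events rather than Terway logs.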
If the user reports "cni plugin not initialized" or "dial unix ... eni.socket: no such file or directory":
Check the Terway Pod on the affected node: kubectl get pods -n kube-system -l app=terway-eniip -o wide

Diagnose by Pod Status and Log Patterns:
If the Pod is in Init:Error: Inspect terway-init logs (kubectl logs <pod> -n kube-system -c terway-init).
- exclusive eni mode changed: the k8s.aliyun.com/exclusive-mode-eni-type label was modified on an existing node. Exclusive mode only works for newly created nodes.
- get node ... error: terway-init could not get the Node object.
- unsupport kernel version, require >=5.10: the node's kernel is older than 5.10.
- failed process input: /etc/eni/eni_conf (from the eni-config ConfigMap) is missing or is not valid JSON.
- mount failed: bpffs could not be mounted on /sys/fs/bpf (privilege or kernel support issue).
- Init erdma driver failure: modprobe erdma failed on an ERDMA-enabled node.

If the Pod is in CrashLoopBackOff or Error: inspect the main container logs.
Daemon (terway) patterns:
- error restart device plugin after kubelet restart: check permissions and mounts for /var/lib/kubelet/device-plugins.
- unable to set feature gates: invalid flag in --feature-gates.
- error create trunk eni: OpenAPI failure during trunk ENI initialization (check Aliyun credentials and quota).

Controlplane (terway-controlplane) patterns:
- failed to create controller: check RBAC permissions or CRD availability.
- failed to setup webhooks: TLS certificate or WebhookConfiguration issues.

If the Pod is Running but the socket error persists:
- Verify that the /var/run/eni/ directory is correctly shared between the host and the container through a hostPath volume.

Only after Terway is confirmed running on the node, proceed to Pod create/delete failures and Events.
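To scan a large log quickly, the log patterns listed above can be turned into a lookup. This is a minimal sketch: terway_log_hint is a hypothetical name, and it matches on the substrings quoted in this Skill, so the exact log text in a given Terway version may differ.

```shell
# Hypothetical helper: map the terway-init / terway / terway-controlplane log
# messages listed in this Skill to a short hint. Reads log lines on stdin and
# prints one hint per matching line; non-matching lines are ignored.
terway_log_hint() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"exclusive eni mode changed"*)
        echo "exclusive-mode-eni-type changed on an existing node" ;;
      *"get node"*"error"*)
        echo "terway-init could not get the Node object" ;;
      *"unsupport kernel version"*)
        echo "node kernel is older than 5.10" ;;
      *"failed process input"*)
        echo "eni_conf missing or invalid JSON (check the eni-config ConfigMap)" ;;
      *"mount failed"*)
        echo "bpffs could not be mounted on /sys/fs/bpf" ;;
      *"Init erdma driver failure"*)
        echo "modprobe erdma failed" ;;
      *"error restart device plugin after kubelet restart"*)
        echo "check /var/lib/kubelet/device-plugins permissions and mounts" ;;
      *"unable to set feature gates"*)
        echo "invalid --feature-gates flag" ;;
      *"error create trunk eni"*)
        echo "OpenAPI failure creating the trunk ENI (credentials/quota)" ;;
      *"failed to create controller"*)
        echo "controlplane RBAC or CRD problem" ;;
      *"failed to setup webhooks"*)
        echo "webhook TLS certificate or WebhookConfiguration problem" ;;
    esac
  done
}

# Example:
#   kubectl logs <terway-pod> -n kube-system -c terway-init | terway_log_hint | sort -u
```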
For any Pod with network-related failures:
Inspect Pod Events
Run (or ask the user to run) kubectl describe pod <pod> -n <ns> and paste the relevant Events. Look for these Terway Event reasons:
- AllocIPFailed (Warning, Pod)
- AllocIPSucceed (Normal, Pod)
- VirtualModeChanged (Warning, Pod)
- CniPodCreateError (Warning, Pod)
- CniPodDeleteError (Warning, Pod)
- CniCreateENIError (Warning, Pod)
- CniPodENIDeleteErr (Warning, Pod)

Interpret common Pod event reasons
AllocIPFailed (Warning, Pod)
- cmdAdd: error alloc ip: backend communication failure (daemon to controlplane, or inside the daemon).
- eth0 config is missing: the backend failed to return configuration for the primary interface.
- InvalidVSwitchID.IPNotEnough / QuotaExceeded.PrivateIPAddress: vSwitch IP exhaustion.
- ErrEniPerInstanceLimitExceeded: node-level ENI quota reached.

AllocIPSucceed (Normal, Pod)
- Alloc IP %s took %s.

VirtualModeChanged (Warning, Pod)
- IPVLan seems unavailable, use Veth instead.

CniPodCreateError (Warning, Pod)
- error parse pod annotation: k8s.aliyun.com/pod-networks is malformed.
- podNetworking is empty: the k8s.aliyun.com/pod-networking annotation is present but empty.
- error get podNetworking %s: the referenced PodNetworking CR is missing.
- can not found available vSwitch for zone %s: no available vSwitch in the current zone matches the selector.

CniPodDeleteError (Warning, Pod)

CniCreateENIError / CniPodENIDeleteErr (Warning, Pod)
- rollbackErr.

If no Terway-specific Events are present
Distinguish between:
- The Kubernetes Node (corev1.Node).
- The Terway Node CRD (network.alibabacloud.com/v1beta1 Node) used in centralized IPAM.

On the Kubernetes Node (corev1.Node)
AllocIPFailed (Warning, Node)
ConfigError (Warning, Node)
- The eni-config or node capabilities are invalid.

On the Terway Node CRD (centralized IPAM)
- Confirm that the Node CR under network.alibabacloud.com exists.
- CreateENIFailed: Message: Failed to create ENI type=%s vsw=%s: %v. Check for OpenAPI errors such as InvalidVSwitchID.IPNotEnough.
- AttachENIFailed: Message: trunk eni id not found (agent not ready) or trunk eni is not allowed for eniOnly pod (scheduling/config mismatch).
- DeleteENIFailed: Message: Failed to delete ENI %s: %v.
- SufficientIP: if False, the reason is IPResInsufficient, meaning the node's IP pool cannot be filled.

Link Node events to Pod failures
- If a Pod shows AllocIPFailed or CniPodCreateError, check whether the corresponding Node / Node CR shows ENI/IPAM failures.

When reasoning about Terway behavior, always clarify which IPAM mode is in use.
Detect mode from context
- Terway CRDs present: podenis.network.alibabacloud.com, nodes.network.alibabacloud.com, podnetworkings.network.alibabacloud.com.
- centralizedIPAM: true, or a controlplane config with CentralizedIPAM set.
- The IPAM type in eni-config is default (non-centralized).

If centralized IPAM
- Relevant Events: CreateENIFailed, AttachENIFailed, UpdatePodENIFailed.
- SyncPodNetworkingSucceed/Failed when syncing vSwitch lists.

If non-centralized IPAM
- Relevant Events on the Pod and the Kubernetes Node (AllocIPFailed, ConfigError).
- Check eni-config ConfigMap correctness (vSwitches, security groups, ip_stack, trunk/erdma flags, etc.).

When to move to logs
- When Events are missing or not specific enough (for example AllocIPFailed without OpenAPI error details).

Which logs to inspect
How to combine logs with Events
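One way to line logs up with Events is to pull only the daemon log lines that concern the failing Pod, with timestamps, and compare them against the times of that Pod's Events. The sketch below is a hedged example: terway_logs_for_pod is a hypothetical name, and it assumes the terway container logs the Pod name on the relevant lines.

```shell
# Hypothetical helper: print timestamped logs from the terway container of a
# Terway Pod, keeping only lines that mention the given Pod name.
# ASSUMPTION: the daemon includes the Pod name on relevant log lines.
terway_logs_for_pod() {
  local terway_pod="$1" pod="$2" since="${3:-1h}"
  # --timestamps prefixes each line with its time so it can be matched
  # against the times of the Pod's Events; --since limits the window.
  kubectl logs -n kube-system "$terway_pod" -c terway \
    --timestamps --since="$since" \
    | grep -F -- "$pod"
}

# Example: terway_logs_for_pod terway-eniip-x7k2p web-0 30m
```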
Before starting troubleshooting, gather cluster-wide Terway configuration:
./scripts/inspect-terway-cluster.sh
This script inspects:
- terway-eniip DaemonSet image tag
- ack-cluster-profile ConfigMap
- kube-proxy-worker ConfigMap
- IPAM type (crd for centralized, default for non-centralized) from the eni-config ConfigMap
- Feature settings: enable_eni_trunking, enable_erdma, vswitch_selection_policy, max_pool_size, min_pool_size, etc.

Use this information to determine whether centralized IPAM is enabled and which Terway features are active. This guides the rest of the troubleshooting flow.
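If the script is not available, the IPAM mode can be read from eni-config directly. This is a minimal sketch under a stated assumption: that eni_conf stores the IPAM type under an "ipam_type" key whose value is crd (centralized) or default/absent (non-centralized). Verify the key name against your cluster; terway_ipam_mode is a hypothetical name.

```shell
# Hypothetical helper: classify the IPAM mode from the eni_conf JSON (stdin).
# ASSUMPTION: the IPAM type lives under an "ipam_type" key; "crd" means
# centralized, "default" or a missing key means non-centralized.
terway_ipam_mode() {
  local ipam
  # Extract the first "ipam_type": "<value>" pair and keep only <value>.
  ipam="$(grep -oE '"ipam_type"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed -E 's/.*"([^"]*)"$/\1/')"
  case "$ipam" in
    crd) echo "centralized" ;;
    ""|default) echo "non-centralized" ;;
    *) echo "unknown ipam_type: $ipam" ;;
  esac
}

# Example:
#   kubectl get cm eni-config -n kube-system -o jsonpath='{.data.eni_conf}' | terway_ipam_mode
```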
To inspect Terway-related node configuration for a problematic Pod, first identify the Pod's node (for example via kubectl get pod -o wide). Then, from the repository root, run:
./scripts/inspect-terway-node.sh <node-name>
This prints ENI mode (shared vs exclusive), node-level dynamic config (terway-config), LingJun node flags, k8s.aliyun.com/ignore-by-terway and k8s.aliyun.com/no-kube-proxy labels, and the ENO API type from the nodes.network.alibabacloud.com CR. Use this information as input to the troubleshooting steps above when you have located the Pod's node.
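Finding the node and the Terway Pod that serves it is the first step for both the node script and the log checks. A minimal sketch using standard kubectl flags (jsonpath output and a spec.nodeName field selector); terway_pod_for is a hypothetical name and app=terway-eniip is the label used in this Skill.

```shell
# Hypothetical helper: print the node of a Pod and the Terway Pod on that node.
terway_pod_for() {
  local ns="$1" pod="$2" node
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')"
  if [ -z "$node" ]; then
    echo "pod $ns/$pod is not scheduled to a node" >&2
    return 1
  fi
  echo "node: $node"
  # The field selector narrows the DaemonSet Pods to the one on this node.
  kubectl get pods -n kube-system -l app=terway-eniip \
    --field-selector "spec.nodeName=$node" -o name \
    | sed 's#^pod/#terway: #'
}

# Example: terway_pod_for default web-0
#   then: ./scripts/inspect-terway-node.sh <node printed above>
```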
To inspect Terway-related Pod configuration, run:
./scripts/inspect-terway-pod.sh <namespace> <pod-name>
This checks:
- hostNetwork (if true, Terway CNI does not process the Pod).
- k8s.aliyun.com/pod-eni: "true" annotation (indicates trunk/exclusive ENI mode).
- Configuration source:
  - k8s.aliyun.com/pod-networks (explicit pod-networks config)
  - k8s.aliyun.com/pod-networks-request (pod-networks-request config)
  - k8s.aliyun.com/pod-networking (matched PodNetworking resource)
  - eni-config default on eth0 if none of the above are set.

Use this to determine whether the Pod should be managed by Terway, whether it uses PodENI, and which configuration source drives its ENI/IP allocation.
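The Pod checks above can be expressed as a small decision function. This is a hedged sketch, assuming the configuration sources listed above are consulted in the order shown (the Skill only states that eni-config is the fallback); terway_config_source is a hypothetical name.

```shell
# Hypothetical helper: report which configuration source would drive a Pod's
# ENI/IP allocation. $1 is the Pod's spec.hostNetwork value ("true" or empty);
# stdin carries the Pod's annotations as one "key=value" per line.
# ASSUMPTION: the sources are consulted in the order listed in this Skill.
terway_config_source() {
  local host_network="$1" line key value
  local networks="" request="" networking="" pod_eni=""
  if [ "$host_network" = "true" ]; then
    echo "hostNetwork: not handled by Terway CNI"
    return 0
  fi
  while IFS= read -r line; do
    key="${line%%=*}"
    value="${line#*=}"
    case "$key" in
      k8s.aliyun.com/pod-networks) networks=1 ;;
      k8s.aliyun.com/pod-networks-request) request=1 ;;
      k8s.aliyun.com/pod-networking) networking=1 ;;
      k8s.aliyun.com/pod-eni) [ "$value" = "true" ] && pod_eni=1 ;;
    esac
  done
  if [ -n "$networks" ]; then
    echo "source: k8s.aliyun.com/pod-networks"
  elif [ -n "$request" ]; then
    echo "source: k8s.aliyun.com/pod-networks-request"
  elif [ -n "$networking" ]; then
    echo "source: k8s.aliyun.com/pod-networking"
  else
    echo "source: eni-config default (eth0)"
  fi
  # pod-eni: "true" indicates trunk/exclusive ENI mode.
  if [ -n "$pod_eni" ]; then echo "pod-eni: true"; fi
}

# Example:
#   kubectl get pod web-0 -n default \
#     -o go-template='{{range $k, $v := .metadata.annotations}}{{$k}}={{$v}}{{"\n"}}{{end}}' \
#     | terway_config_source "$(kubectl get pod web-0 -n default -o jsonpath='{.spec.hostNetwork}')"
```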
When this Skill is active:
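- Identify the relevant Event reasons (for example AllocIPFailed, CniPodCreateError, CreateENIFailed) and explain what they mean.
- If Events have not been provided, ask for the kubectl describe pod output for the problematic Pod.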
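When the user pastes kubectl describe pod output, the Terway Event reasons in it can be pulled out and paired with the readings given in this Skill. A minimal sketch; terway_event_hints is a hypothetical name and the one-line readings paraphrase the sections above rather than Terway's own documentation.

```shell
# Hypothetical helper: scan `kubectl describe pod` (or `kubectl get events`)
# output on stdin for the Terway Event reasons covered in this Skill and print
# one line per distinct reason found, with a short reading of it.
terway_event_hints() {
  grep -oE 'AllocIPFailed|VirtualModeChanged|CniPodCreateError|CniPodDeleteError|CniCreateENIError|CniPodENIDeleteErr|CreateENIFailed|AttachENIFailed|DeleteENIFailed|ConfigError' \
    | sort -u \
    | while IFS= read -r reason; do
        case "$reason" in
          AllocIPFailed) echo "$reason: IP allocation failed; check the message for backend errors, vSwitch IP exhaustion or ENI quota" ;;
          VirtualModeChanged) echo "$reason: IPVLan unavailable, Veth used instead" ;;
          CniPodCreateError) echo "$reason: bad pod-networks/pod-networking annotation, missing PodNetworking, or no vSwitch in the zone" ;;
          CniPodDeleteError) echo "$reason: Pod network teardown failed" ;;
          CniCreateENIError|CniPodENIDeleteErr) echo "$reason: ENI create/delete for the Pod failed; the message may include rollbackErr" ;;
          CreateENIFailed) echo "$reason: Node CR could not create an ENI; check the OpenAPI error" ;;
          AttachENIFailed) echo "$reason: trunk ENI not found or not allowed for an eniOnly Pod" ;;
          DeleteENIFailed) echo "$reason: Node CR could not delete an ENI" ;;
          ConfigError) echo "$reason: eni-config or node capabilities are invalid" ;;
        esac
      done
}

# Example: kubectl describe pod web-0 -n default | terway_event_hints
```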