Troubleshooting
Check node status
Check the node status with:
[root@speech-platform ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
speech-platform.localdomain Ready control-plane,master 9s v1.27.6+k3s1
If the node is not in the Ready state, something is usually wrong.
Note: The node list can be empty (No resources found) or the node can be in the NotReady state while the virtual appliance is starting up. This is normal and should resolve within a few moments.
The node must also have enough free disk and memory capacity. When it does not, pressure events are emitted. Run the following command to see the node conditions:
[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 08:06:45 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletReady kubelet is posting ready status
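As a rough sketch, the conditions table above can be filtered so that only the problem rows remain. The helper name failing_conditions is hypothetical and not part of the appliance. It assumes the column layout shown above, where the first field is the condition type and the second is its status.

```shell
# Hypothetical helper: reads the "Conditions:" table on stdin and prints
# the types of conditions that indicate a problem.
failing_conditions() {
  awk '
    # A pressure condition is a problem when its status is True.
    $1 ~ /Pressure$/ && $2 == "True" { print $1 }
    # The Ready condition is a problem when its status is anything but True.
    $1 == "Ready" && $2 != "True" { print $1 }
  '
}

# Usage on the appliance (kubectl configured as in the examples above):
#   kubectl describe node | grep -A 6 Conditions: | failing_conditions
```

An empty result means that none of the four conditions shown above reports a problem.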
Disk pressure
A disk pressure node event is emitted when Kubernetes is running out of disk capacity in the /var filesystem. The node conditions then look like this:
[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 08:06:45 +0000 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 29 Apr 2024 08:13:54 +0000 Mon, 29 Apr 2024 07:46:39 +0000 KubeletReady kubelet is posting ready status
Follow the procedure for extending the disks.
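Before extending the disks, it can help to confirm how full the /var filesystem actually is. The sketch below is only an illustration: usage_percent is a hypothetical helper that parses POSIX df -P output. The threshold at which the kubelet reports disk pressure is not stated here, so the number is informational only.

```shell
# Hypothetical helper: reads `df -P <path>` output on stdin and prints the
# used capacity of the filesystem as a bare number (percent sign removed).
usage_percent() {
  # Line 1 is the header; line 2 describes the filesystem. Field 5 is "Capacity".
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage on the appliance:
#   df -P /var | usage_percent
```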
Memory pressure
A memory pressure node event is emitted when Kubernetes is running out of free memory. The node conditions then look like this:
[root@speech-platform disks]# kubectl describe node | grep -A 6 Conditions:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure True Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:50:50 +0000 KubeletHasInsufficientMemory kubelet has insufficient memory available
DiskPressure False Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:33:08 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:33:08 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 29 Apr 2024 08:50:50 +0000 Mon, 29 Apr 2024 08:33:08 +0000 KubeletReady kubelet is posting ready status
You need to grant more memory to the virtual appliance.
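To get a rough idea of how much memory the guest currently has available, the MemAvailable field of /proc/meminfo can be read. This is a minimal sketch: available_mib is a hypothetical helper, and the value it prints may differ from the figure the kubelet evaluates when it reports memory pressure.

```shell
# Hypothetical helper: reads /proc/meminfo-formatted text on stdin and prints
# the MemAvailable value converted from kB to MiB (rounded down).
available_mib() {
  awk '$1 == "MemAvailable:" { print int($2 / 1024) }'
}

# Usage on the appliance:
#   available_mib < /proc/meminfo
```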
View pod logs
Logs are stored in /data/log/pods/ or in /data/logs/containers. You can view them via filebrowser if needed.
Alternatively, you can display logs with the kubectl command:
[root@speech-platform ~]# kubectl -n speech-platform logs -f voiceprint-extraction-7867578b97-w7bzd
[2024-04-29 08:59:10.250] [Configuration] [info] model: /models/xl-5.0.0.model
[2024-04-29 08:59:10.250] [Configuration] [info] port: 8080
[2024-04-29 08:59:10.250] [Configuration] [info] device: cpu
[2024-04-29 08:59:10.250] [critical] base64_decode: invalid character ''<''
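When a pod produces a lot of output, it may be easier to keep only the severe lines. The following is a sketch under assumptions: severe_lines is a hypothetical helper, the [critical] tag is taken from the sample output above, and the [error] tag is assumed to exist in the same format.

```shell
# Hypothetical helper: keeps only log lines tagged [critical] or [error].
# [critical] appears in the sample output above; [error] is an assumption.
severe_lines() {
  grep -E '\[(critical|error)\]'
}

# Usage on the appliance (replace <pod-name> with a pod from `kubectl get pods`):
#   kubectl -n speech-platform logs <pod-name> | severe_lines
```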
Disable DNS resolving for specific domains
First, check the CoreDNS logs:
kubectl -n kube-system logs -l k8s-app=kube-dns
The following lines in the logs indicate this issue:
2024-06-05T11:00:49.55751974Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:60352->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.546562499Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:40254->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.548101103Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:47838->192.168.137.1:53: i/o timeout
2024-06-05T11:00:51.558720939Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:39526->192.168.137.1:53: i/o timeout
2024-06-05T11:00:53.547326187Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:58487->192.168.137.1:53: i/o timeout
2024-06-05T11:00:53.548836432Z stdout F [ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:46303->192.168.137.1:53: i/o timeout
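To see which names are affected, the timeout errors can be summarized per queried name. This is a sketch only: timed_out_names is a hypothetical helper, and it assumes the layout of the sample lines above, where the queried name is the second field after plugin/errors:.

```shell
# Hypothetical helper: reads CoreDNS log lines on stdin and prints
# "<count> <name>" for every name whose lookup ended in an i/o timeout,
# most frequent first.
timed_out_names() {
  awk '
    /plugin\/errors:/ && /i\/o timeout/ {
      # Locate the plugin/errors: marker; the name follows the response code.
      for (i = 1; i + 2 <= NF; i++)
        if ($i == "plugin/errors:") { count[$(i + 2)]++; break }
    }
    END { for (name in count) print count[name], name }
  ' | sort -rn
}

# Usage on the appliance:
#   kubectl -n kube-system logs -l k8s-app=kube-dns | timed_out_names
```

The domain suffixes of the reported names are candidates for the domains whose lookup you may want to disable.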
This issue occurs when DHCP assigns the IP address of the virtual appliance, because DHCP usually also configures the nameserver and search domains in /etc/resolv.conf:
nameserver 192.168.137.1
search localdomain
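The search domains configured on the appliance can be listed with a small helper such as the one below. It is a sketch, not part of the appliance: the name search_domains is hypothetical, and the file path defaults to /etc/resolv.conf as shown above.

```shell
# Hypothetical helper: prints the search domains from a resolv.conf-style
# file, one per line. The path defaults to /etc/resolv.conf.
search_domains() {
  awk '
    # If several search lines exist, the resolver uses the last one,
    # so remember only the most recent line.
    $1 == "search" { line = $0 }
    END {
      n = split(line, field, " ")
      for (i = 2; i <= n; i++) print field[i]
    }
  ' "${1:-/etc/resolv.conf}"
}

# Usage on the appliance:
#   search_domains
```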
Communication within the virtual appliance does not use FQDNs, so each DNS name is tried with every search domain. Internal Kubernetes domains (<namespace>.svc.cluster.local, svc.cluster.local and cluster.local) are resolved immediately by CoreDNS, whereas non-Kubernetes domains are resolved by the nameserver provided by DHCP. If access to that nameserver is blocked (for example, by a firewall), resolving a single name can take up to 10 seconds, which can significantly increase task processing time.
To avoid this issue, either allow communication from the virtual appliance to the DHCP-configured DNS server, or configure the Kubernetes resolver to skip lookups for the DHCP-provided domain(s):
- [Virtual appliance] Create the file /data/speech-platform/coredns-custom.yaml manually with the following content. Replace <domain1.com> and <domain2.com> with the domains you want to disable lookup for:
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: coredns-custom
    namespace: kube-system
  data:
    custom.server: |
      <domain1.com>:53 {
        log
      }
      <domain2.com>:53 {
        log
      }
- [Virtual appliance] For example, for the domains localdomain and example.com the file looks like this:
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: coredns-custom
    namespace: kube-system
  data:
    custom.server: |
      localdomain:53 {
        log
      }
      example.com:53 {
        log
      }
- [Virtual appliance] Restart CoreDNS to apply the change:
  kubectl -n kube-system rollout restart deploy/coredns
- [Virtual appliance] Check that the CoreDNS pod is running:
  kubectl -n kube-system get pods -l k8s-app=kube-dns
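Instead of writing the file by hand, the ConfigMap can be generated from a list of domains. The sketch below follows the structure of the file described above. The function name render_coredns_custom is hypothetical, and the indentation inside the custom.server block is an assumption, since any consistent indentation is valid YAML there.

```shell
# Hypothetical helper: prints a coredns-custom ConfigMap with one server
# block per domain given as an argument.
render_coredns_custom() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  custom.server: |
EOF
  # Each domain gets its own server block containing only the log plugin.
  for domain in "$@"; do
    printf '    %s:53 {\n      log\n    }\n' "$domain"
  done
}

# Usage on the appliance:
#   render_coredns_custom localdomain example.com > /data/speech-platform/coredns-custom.yaml
```

Review the generated file before restarting CoreDNS, as with the manually created one.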
Diagnostics report tool
The diagnostics script is part of the virtual appliance. It is designed to gather system and application information for troubleshooting.
The script collects the following information:
- CPU, RAM and disk usage.
- System logs, application logs and event history.
- Information about kubernetes objects.
Create the diagnostics report
- Connect to the virtual appliance:
$ ssh root@<virtual-appliance-ip>
- Run the diagnostics script:
$ /root/run-diag-report.sh
- The script gathers all the information and stores it in a zip archive. The archive is saved in the /data/reports directory.
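If several reports have accumulated, the most recent archive can be located as sketched below. The helper name latest_report is hypothetical, and the .zip file extension is an assumption based on the archive format mentioned above. The naming scheme of the report files is not specified here.

```shell
# Hypothetical helper: prints the path of the most recently modified
# .zip file in the report directory (defaults to /data/reports).
latest_report() {
  # ls -t sorts by modification time, newest first; keep the first entry.
  ls -t "${1:-/data/reports}"/*.zip 2>/dev/null | head -n 1
}

# Usage on the appliance:
#   latest_report
```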