Troubleshooting
Check node status
Check the node status with:
kubectl get nodes
Expected output when the node is healthy:
NAME                          STATUS   ROLES                  AGE   VERSION
speech-platform.localdomain   Ready    control-plane,master   9s    v1.30.5+k3s1
The node list can be empty (No resources found), or the node can be in the
NotReady state while the virtual appliance is starting up. This is normal and
should resolve within a few moments.
The node also needs enough free disk and memory capacity. When resources are insufficient, pressure events are emitted. Run the following command to see node conditions:
kubectl describe node | grep -A 6 Conditions:
Conditions:
Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
----             ------  -----------------                 ------------------                ------                       -------
MemoryPressure   False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
DiskPressure     False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 08:06:45 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
PIDPressure      False   Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
Ready            True    Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 07:46:39 +0000   KubeletReady                 kubelet is posting ready status
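The pressure conditions can also be checked programmatically. A minimal sketch, run here against a captured sample of "Type=Status" condition lines rather than a live cluster; on a real cluster the input could come from a kubectl jsonpath query (shown in the comment, an assumption about your tooling):

```shell
# Sample condition lines; on a live cluster these could be produced with e.g.:
#   kubectl get nodes -o jsonpath='{range .items[*].status.conditions[*]}{.type}={.status}{"\n"}{end}'
conditions='MemoryPressure=False
DiskPressure=True
PIDPressure=False
Ready=True'

# Print any active pressure condition; report a clean bill otherwise.
printf '%s\n' "$conditions" | grep 'Pressure=True' || echo "no pressure conditions"
```

With the sample above, the filter prints the one active condition, DiskPressure=True.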
Disk pressure
A disk pressure node event is emitted when Kubernetes is running out of disk
capacity in the /var filesystem:
Conditions:
Type           Status  LastHeartbeatTime                 LastTransitionTime                Reason                   Message
----           ------  -----------------                 ------------------                ------                   -------
DiskPressure   True    Mon, 29 Apr 2024 08:13:54 +0000   Mon, 29 Apr 2024 08:06:45 +0000   KubeletHasDiskPressure   kubelet has disk pressure
Follow the procedure for extending the disks.
Memory pressure
A memory pressure node event is emitted when Kubernetes is running out of free memory:
Conditions:
Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                         Message
----             ------  -----------------                 ------------------                ------                         -------
MemoryPressure   True    Mon, 29 Apr 2024 08:50:50 +0000   Mon, 29 Apr 2024 08:50:50 +0000   KubeletHasInsufficientMemory   kubelet has insufficient memory available
You need to grant more memory to the virtual appliance.
View pod logs
Logs are stored in /data/log/pods/ or in /data/logs/containers. You can view
them via Filebrowser if needed.
Alternatively, you can display logs with the kubectl command:
kubectl -n speech-platform logs -f voiceprint-extraction-7867578b97-w7bzd
[2024-04-29 08:59:10.250] [Configuration] [info] model: /models/xl-5.0.0.model
[2024-04-29 08:59:10.250] [Configuration] [info] port: 8080
[2024-04-29 08:59:10.250] [Configuration] [info] device: cpu
[2024-04-29 08:59:10.250] [critical] base64_decode: invalid character ''<''
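The base64_decode error above means the configured license value is not valid base64; a common cause is a template placeholder such as <...> left in the config. As a quick check before pasting a value into the configuration (the sample string below is illustrative, not a real license):

```shell
# Illustrative value only; a real license comes from your Phonexia license file.
license='eyJ2ZXJzaW9uIjogMX0='

# base64 -d rejects characters outside the base64 alphabet (such as '<'),
# so this catches leftover placeholders before they reach the service.
if printf '%s' "$license" | base64 -d > /dev/null 2>&1; then
  echo "license decodes as base64"
else
  echo "license contains invalid base64 characters"
fi
```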
Changes in configuration are not applied
Use this when you have made changes to
/data/speech-platform/speech-platform-values.yaml but they do not seem to take
effect (for example, new settings are not reflected in the application, or
services do not start properly).
Why this happens: The Helm controller automatically watches for changes in the config file. If the YAML configuration file is invalid, the update job fails and the system continues running the old config or fails to deploy completely.
How to troubleshoot: If the configuration is incorrect, the update job will not complete successfully, and the underlying pod will either restart or be in an error state. The pod status will reflect this issue.
Step 1. Check the Helm install job status:
kubectl get pods -n kube-system | grep -i helm-install
helm-install-filebrowser-2b7pn 0/1 Completed 0 51m
helm-install-ingress-nginx-m87d4 0/1 Completed 0 51m
helm-install-nginx-nrcvk 0/1 Completed 0 51m
helm-install-dcgm-exporter-fjqzz 0/1 Completed 0 51m
helm-install-kube-prometheus-stack-jn5bz 0/1 Completed 0 51m
helm-install-keda-vsn95 0/1 Completed 0 51m
helm-install-speech-platform-9l9vj 0/1 Error 4 (46s ago) 6m15s
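As a sketch, failing helm-install jobs can be picked out of the pod list automatically. The snippet filters a captured sample of the output above with awk; on a live cluster you would pipe `kubectl get pods -n kube-system` into the same awk program instead of the here-document:

```shell
# Print names of helm-install pods whose STATUS column (column 3 of
# "kubectl get pods" output) is neither Completed nor Running.
awk '/helm-install/ && $3 != "Completed" && $3 != "Running" {print $1}' <<'EOF'
helm-install-filebrowser-2b7pn             0/1   Completed   0            51m
helm-install-ingress-nginx-m87d4           0/1   Completed   0            51m
helm-install-speech-platform-9l9vj         0/1   Error       4 (46s ago)  6m15s
EOF
```

With the sample input, only helm-install-speech-platform-9l9vj is printed.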
Step 2. Inspect the logs of the failing job:
kubectl logs -f <failing-pod-name> -n kube-system
Upgrading speech-platform
+ helm_v3 upgrade --namespace speech-platform speech-platform https://10.43.0.1:443/static/phonexia-charts/speech-platform-0.0.0-36638f5-helm.tgz --values /config/values-10_HelmChartConfig.yaml
Error: failed to parse /config/values-10_HelmChartConfig.yaml: error converting YAML to JSON: yaml: line 494: could not find expected ':'
Step 3. Validate the YAML (see next section).
Check configuration file validity
Use this whenever changes are made to speech-platform-values.yaml, or when a
Helm update job fails due to YAML syntax issues.
Why this matters: Helm requires a valid YAML configuration file to parse and apply configuration. A missing colon, incorrect indentation, or misplaced value can break the deployment.
Step 1. Validate the config:
yq .spec.valuesContent /data/speech-platform/speech-platform-values.yaml | yq .
If the configuration file is valid, its content will be printed. Otherwise, an error with the offending line number is printed:
Error: bad file '-': yaml: line 253: could not find expected ':'
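If yq is not available, a rough fallback is to check the file's YAML syntax with Python. This assumes python3 with the PyYAML package is installed on the appliance, which is an assumption, not a guarantee. Note it validates only the outer file; the yq pipeline above additionally extracts the nested valuesContent block and re-parses it.

```shell
# Print "valid YAML" on success, or the parser's error (with a line number).
python3 -c '
import sys, yaml
try:
    with open(sys.argv[1]) as f:
        yaml.safe_load(f)
    print("valid YAML")
except yaml.YAMLError as e:
    print(e)
' /data/speech-platform/speech-platform-values.yaml
```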
The actual configuration is nested under spec.valuesContent, which in this file
sits after a 7-line header. To map an error reported by yq back to the file,
add that offset of 7: an error on line 253 corresponds to line 260
(253 + 7) in the file.
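The offset arithmetic can be scripted. A minimal sketch, assuming the same 7-line header as above (verify the offset in your own file):

```shell
# Map a line number reported by yq (relative to the valuesContent block)
# to the corresponding line in speech-platform-values.yaml.
yq_line=253   # line number from the yq error message
offset=7      # header lines before the nested valuesContent block
echo "error is on file line $((yq_line + offset))"
```

This prints "error is on file line 260".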
Step 2. View the lines around the error:
cat -n /data/speech-platform/speech-platform-values.yaml | grep 260 -B 10 -A 10
250
251 model:
252 volume:
253 hostPath:
254 path: /data/models/enhanced_speech_to_text_built_on_whisper
255
256 # Name of a model file inside the volume, for example "large_v2-1.0.0.model"
257 file: "large_v2-1.0.1.model"
258 license:
259 value:
260 "eyJ2ZX...=="
261
262 # Uncomment this to grant access to GPU on whisper pod
263 resources:
Step 3. Fix the error.
In the example above, the license value is on a separate line (260) from its key (259). This is invalid YAML, so merge the two lines. Change:
value:
"eyJ2ZX...=="
to:
value: "eyJ2ZX...=="
The resulting file should look like this:
258 license:
259 value: "eyJ2ZX...=="
260
261 # Uncomment this to grant access to GPU on whisper pod
262 resources:
263 limits:
264 nvidia.com/gpu: "1"
265
266 # Uncomment this to run whisper on GPU
267 runtimeClassName: "nvidia"
268
269 service:
270 clusterIP: "None"
Disable DNS resolving for specific domains
Use this when you see long response times, timeout errors, or task processing delays due to DNS lookup issues, particularly when using DHCP or custom DNS setups.
Why this happens: DHCP is commonly used to assign the virtual appliance's IP
address, and the DHCP server usually also configures a nameserver and search
domains in /etc/resolv.conf:
nameserver 192.168.137.1
search localdomain
Check the CoreDNS logs first:
kubectl -n kube-system logs -l k8s-app=kube-dns
The following lines in the logs indicate this issue:
[ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:60352->192.168.137.1:53: i/o timeout
[ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:40254->192.168.137.1:53: i/o timeout
[ERROR] plugin/errors: 2 speech-platform-envoy.localdomain. AAAA: read udp 10.42.0.27:47838->192.168.137.1:53: i/o timeout
Communication within the virtual appliance does not use fully qualified domain
names (FQDNs), which means that each DNS name is resolved against all search
domains. Internal Kubernetes domains (<namespace>.svc.cluster.local,
svc.cluster.local and cluster.local) are resolved immediately by CoreDNS;
non-Kubernetes domains are resolved with the nameserver provided by DHCP. If
access to that nameserver is blocked (for example, by a firewall), resolving a
single name can take up to 10 seconds, which can significantly increase task
processing duration.
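To make the cost concrete, here is a tiny sketch of the candidate names a resolver tries for one non-FQDN lookup; the search-domain list is illustrative, matching a typical k3s pod resolv.conf plus the DHCP-provided localdomain:

```shell
# Each candidate below triggers a separate DNS query; only the last one
# leaves the cluster and can hang on an unreachable upstream nameserver.
name=speech-platform-envoy
for d in speech-platform.svc.cluster.local svc.cluster.local cluster.local localdomain; do
  echo "$name.$d"
done
```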
How to resolve: Either allow communication from the virtual appliance to the DHCP-configured DNS server, or configure the Kubernetes resolver to skip lookups for the DHCP-provided domain(s):
Step 1. Create a DNS override file:
Create the file /data/speech-platform/coredns-custom.yaml with the following
content. Replace <domain1.com> and <domain2.com> with the domains you want
to disable lookups for:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  custom.server: |
    <domain1.com>:53 {
        log
    }
    <domain2.com>:53 {
        log
    }
Step 2. Restart CoreDNS to apply the change:
kubectl -n kube-system rollout restart deploy/coredns
Step 3. Verify CoreDNS is healthy and the pod is running:
kubectl -n kube-system get pods -l k8s-app=kube-dns
Step 4. Restart all speech-platform pods:
kubectl -n speech-platform rollout restart deploy
kubectl -n speech-platform rollout restart sts
Deployment in an air-gapped environment
If you plan to deploy the virtual appliance in an environment without DHCP or DNS availability, you will need to make certain adjustments.
Detected networking issues
In certain network configurations, the Speech Platform system fails to start and shows detected issues with networking in the welcome screen. Typically, this happens in peer-to-peer, ad-hoc networks without a router (e.g. multiple computers connected just to a switch), with either static IP addresses configured manually, or dynamically assigned by a local DHCP server.
The issue with such a network setup is that there is no default gateway
defined, as none is needed, but the main Speech Platform k3s service requires
a gateway IP address to be defined. As a result, when the Speech Platform
system does not find a default gateway IP address assigned to a device, the
main k3s service fails to start, and thus the entire Speech Platform system
won't start.
In that case, it is necessary to configure the network manually, with statically defined IP addresses. Use the following commands to do that.
- First, check the name of the network connection. If you see a different name than "Wired connection 1", use that connection name instead in the following commands:
  nmcli con show
- Assign a static IP address (replace IP_ADDRESS with the IP address you want the Virtual Appliance to use):
  nmcli con mod "Wired connection 1" ipv4.addr IP_ADDRESS/24
- Set the gateway IP address (replace GATEWAY_ADDRESS with the gateway IP address; in networks without a router it can be any IP, e.g. the IP address of the machine itself):
  nmcli con mod "Wired connection 1" ipv4.gateway GATEWAY_ADDRESS
- Configure the connection to use manual IP settings:
  nmcli con mod "Wired connection 1" ipv4.method manual
- Deactivate the connection, so that the changes can be applied:
  nmcli con down "Wired connection 1"
- Reactivate the connection:
  nmcli con up "Wired connection 1"
- Once the connection is active, reset and restart the k3s service:
  systemctl reset-failed k3s
  systemctl start k3s
After completing these steps, Speech Platform should perform its startup sequence and complete it successfully by showing the welcome screen. However, if your network lacks access to upstream DNS, or does not contain DNS at all, further modifications may be needed by following the instructions in the next section.
No upstream DNS available
If your environment has a router assigning IP addresses but is isolated from upstream DNS servers, complete the following steps.
The speech-platform will start normally, but all processing tasks will return an error state until this is resolved.
- Create a file named coredns-config.yaml in the directory /data/speech-platform/
- Insert the following content into the file /data/speech-platform/coredns-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server
- Save the file and run:
  kubectl rollout restart deploy coredns -n kube-system