Even though Kubernetes is fairly easy to configure, it is prone to problems, especially if you run it on Vagrant. So let's check which tools we can use to monitor it:
First, we can fetch a component using the get command:
Get a component
ubuntu@k8s-master:~$ kubectl get pods
NAME                                     READY   STATUS             RESTARTS   AGE
busybox                                  0/1     Pending            0          13m
first-pod                                1/1     Running            7          21h
hello-deploy-7f44bd8b96-2xz8j            1/1     Running            2          20d
hello-deploy-7f44bd8b96-4c76j            1/1     Running            2          20d
hello-deploy-7f44bd8b96-7tvcs            1/1     Running            2          20d
hello-deploy-7f44bd8b96-9lnrm            1/1     Running            2          20d
hello-deploy-7f44bd8b96-dckq2            1/1     Running            2          20d
hello-deploy-7f44bd8b96-gnvwr            1/1     Running            2          20d
hello-deploy-7f44bd8b96-p66g8            1/1     Running            2          20d
hello-deploy-7f44bd8b96-qtxgk            1/1     Running            2          20d
hello-deploy-7f44bd8b96-qz6cr            1/1     Running            2          20d
hello-deploy-7f44bd8b96-r7g4q            1/1     Running            2          20d
nfs-client-provisioner-98cdf7875-26nbg   0/1     CrashLoopBackOff   7          18m
nfs-client-provisioner-98cdf7875-kdcvv   0/1     CrashLoopBackOff   7          18m
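Scrolling through a long pod list gets tedious; as a small convenience (my own helper, not a kubectl feature) you can filter the healthy rows out:

```shell
#!/bin/sh
# failing_pods: reads `kubectl get pods` output on stdin and prints only the
# rows whose STATUS is not healthy (Pending, CrashLoopBackOff, Error, ...).
failing_pods() {
  # Skip the header row, then drop rows whose STATUS column is Running or
  # Completed; `|| true` keeps the exit code 0 when every pod is healthy.
  tail -n +2 | grep -vE ' (Running|Completed) ' || true
}

# Usage: kubectl get pods | failing_pods
```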
As you can see, we have two failing pods, but we don't know why. This is where the describe command comes into play:
Describe a resource
ubuntu@k8s-master:~$ kubectl describe pod nfs-client-provisioner-98cdf7875-26nbg
Name:           nfs-client-provisioner-98cdf7875-26nbg
Namespace:      default
Priority:       0
Node:           node-2/10.0.2.15
Start Time:     Sat, 23 May 2020 13:04:31 +0000
Labels:         app=nfs-client-provisioner
                pod-template-hash=98cdf7875
Annotations:    cni.projectcalico.org/podIP: 192.168.247.1/32
Status:         Running
IP:             192.168.247.1
IPs:
  IP:           192.168.247.1
Controlled By:  ReplicaSet/nfs-client-provisioner-98cdf7875
Containers:
  nfs-client-provisioner:
    Container ID:   docker://2e4c95d43caaef0bf2aae6400fe3eb349b8452501b04c8da494052843667e1d6
    Image:          quay.io/external_storage/nfs-client-provisioner:latest
    Image ID:       docker-pullable://quay.io/external_storage/nfs-client-provisioner@sha256:022ea0b0d69834b652a4c53655d78642ae23f0324309097be874fb58d09d2919
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Sat, 23 May 2020 13:19:21 +0000
      Finished:     Sat, 23 May 2020 13:19:51 +0000
    Ready:          False
    Restart Count:  7
    Environment:
      PROVISIONER_NAME:  example.com/nfs
      NFS_SERVER:        192.168.50.10
      NFS_PATH:          /srv/nfs/kubedata
    Mounts:
      /persistentvolumes from nfs-client-root (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfs-client-provisioner-token-ldqw7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  nfs-client-root:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    192.168.50.10
    Path:      /srv/nfs/kubedata
    ReadOnly:  false
  nfs-client-provisioner-token-ldqw7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfs-client-provisioner-token-ldqw7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/nfs-client-provisioner-98cdf7875-26nbg to node-2
  Normal   Created    16m (x4 over 18m)     kubelet, node-2    Created container nfs-client-provisioner
  Normal   Started    16m (x4 over 18m)     kubelet, node-2    Started container nfs-client-provisioner
  Normal   Pulling    15m (x5 over 18m)     kubelet, node-2    Pulling image "quay.io/external_storage/nfs-client-provisioner:latest"
  Normal   Pulled     14m (x5 over 18m)     kubelet, node-2    Successfully pulled image "quay.io/external_storage/nfs-client-provisioner:latest"
  Warning  BackOff    3m19s (x52 over 17m)  kubelet, node-2    Back-off restarting failed container
ubuntu@k8s-master:~$
But even that doesn't tell us much. So what can we do? Well, we can troubleshoot :)
First, let's check the logs:
Check logs
ubuntu@k8s-master:~$ kubectl logs nfs-client-provisioner-98cdf7875-26nbg
Error from server (NotFound): the server could not find the requested resource ( pods/log nfs-client-provisioner-98cdf7875-26nbg)
ubuntu@k8s-master:~$
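Before changing anything, it is worth confirming the diagnosis (a quick check I would run; it queries the live cluster, so the exact output depends on your setup):

```shell
# Show which IP each node registered with. On a Vagrant cluster the
# INTERNAL-IP column often shows the NAT address 10.0.2.15 instead of the
# host-only IPs (192.168.50.x in this setup), and the API server then cannot
# reach the kubelet to fetch logs.
kubectl get nodes -o wide
```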
Well, that is an issue specific to the Vagrant configuration. The problem is that Vagrant uses 10.0.2.15 as the default IP on its NAT interface, and that isn't the IP you want the Kubernetes API server to use to reach the node. The solution is to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf on each node, adding the following:
Edit kubelet config
Environment="KUBELET_EXTRA_ARGS=--node-ip=VAGRANT_VM_EXTERNAL_IP_HERE"
After that, we have to restart the kubelet using one of the following commands (guess which one):
Kubelet commands
systemctl stop kubelet
systemctl start kubelet
systemctl restart kubelet
systemctl status kubelet
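For completeness, here is the full sequence I would run on each node (a sketch, assuming root access); a plain restart does not reread systemd unit files, so a daemon-reload has to come first:

```shell
# Pick up the edited 10-kubeadm.conf drop-in, then bounce the kubelet.
sudo systemctl daemon-reload
sudo systemctl restart kubelet   # "restart" is the one you were looking for
sudo systemctl status kubelet --no-pager
```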
But to be honest, it is safer to just restart the whole cluster :) After that, everything should be running under the correct IPs:
Check Kubernetes
ubuntu@k8s-master:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-77c5fc8d7f-djzvd   1/1     Running   0          6m15s   192.168.235.195   k8s-master   <none>           <none>
kube-system   calico-node-f6nvk                          1/1     Running   0          6m15s   192.168.50.10     k8s-master   <none>           <none>
kube-system   calico-node-km6rz                          1/1     Running   0          3m53s   192.168.50.11     node-1       <none>           <none>
kube-system   calico-node-wgq4v                          1/1     Running   0          2m49s   192.168.50.12     node-2       <none>           <none>
kube-system   coredns-66bff467f8-5mntv                   1/1     Running   0          9m34s   192.168.235.194   k8s-master   <none>           <none>
kube-system   coredns-66bff467f8-6ks2w                   1/1     Running   0          9m34s   192.168.235.193   k8s-master   <none>           <none>
kube-system   etcd-k8s-master                            1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-apiserver-k8s-master                  1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-controller-manager-k8s-master         1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-proxy-cbrt9                           1/1     Running   0          2m49s   192.168.50.12     node-2       <none>           <none>
kube-system   kube-proxy-lmn4d                           1/1     Running   0          3m53s   192.168.50.11     node-1       <none>           <none>
kube-system   kube-proxy-wfz74                           1/1     Running   0          9m34s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-scheduler-k8s-master                  1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
ubuntu@k8s-master:~$
ubuntu@k8s-master:~$ kubectl logs calico-node-km6rz --namespace kube-system
2020-05-25 07:49:24.728 [INFO][9] startup.go 256: Early log level set to info
2020-05-25 07:49:24.728 [INFO][9] startup.go 272: Using NODENAME environment for node name
2020-05-25 07:49:24.728 [INFO][9] startup.go 284: Determined node name: node-1
2020-05-25 07:49:24.729 [INFO][9] k8s.go 228: Using Calico IPAM
2020-05-25 07:49:24.729 [INFO][9] startup.go 316: Checking datastore connection
2020-05-25 07:49:24.737 [INFO][9] startup.go 340: Datastore connection verified
2020-05-25 07:49:24.737 [INFO][9] startup.go 95: Datastore is ready
2020-05-25 07:49:24.744 [INFO][9] startup.go 382: Initialize BGP data
2020-05-25 07:49:24.745 [INFO][9] startup.go 584: Using autodetected IPv4 address on interface enp0s8: 192.168.50.11/24
Let's delve into more specific problems:
If you have a wrong API version in the YAML file, you will receive the following error:
Wrong API Version
ubuntu@k8s-master:~/external-storage/nfs/deploy/kubernetes$ kubectl create -f nginx.yml
error: unable to recognize "nginx.yml": no matches for kind "Deployment" in version "extensions/v1beta1"
To fix that, find the API group to which that resource kind has been moved; for newer versions of Kubernetes, Deployment most probably lives under apps/v1:
Correct API Version for nginx
ubuntu@k8s-master:~/external-storage/nfs/deploy/kubernetes$ cat nginx.yml
apiVersion: apps/v1
Bear in mind that this can of course change in future versions.
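As an illustration, a minimal Deployment under the current API group might look like this (a hypothetical manifest; the name, labels, and image are placeholders, not taken from the repository above). Note that apps/v1 also makes spec.selector mandatory, which extensions/v1beta1 did not:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:            # required in apps/v1
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx     # must match the selector above
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
```

You can check which group a kind currently lives in with `kubectl api-resources` or `kubectl explain deployment`.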
In case you use NFS dynamic provisioning and you didn't install the NFS client on ALL nodes of the cluster, you will get the following error when you try to create a Pod that is to be placed there:
Error with NFS client
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/64003b39-2316-4254-9e62-cb6d09f3fd6e/volumes/kubernetes.io~nfs/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd --scope -- mount -t nfs -o vers=4.1 10.111.172.167:/export/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd /var/lib/kubelet/pods/64003b39-2316-4254-9e62-cb6d09f3fd6e/volumes/kubernetes.io~nfs/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd
Output: Running scope as unit run-r94910a4067104f079e9aae302b111ab2.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.172.167:/export/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
       need a /sbin/mount.<type> helper program)
       In some cases useful info is found in syslog - try
       dmesg | tail or so.
  Warning  FailedMount  42s  kubelet, node-1  MountVolume.SetUp failed for volume "pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd" : mount failed: exit status 32
To fix that, just install the NFS client on all the nodes:
Install NFS client on Ubuntu
root@node-1:~# apt-get install nfs-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  keyutils libnfsidmap2 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libtirpc1 python python-minimal python2.7 python2.7-minimal rpcbind
Suggested packages:
  watchdog python-doc python-tk python2.7-doc binfmt-support
The following NEW packages will be installed:
  keyutils libnfsidmap2 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libtirpc1 nfs-common python python-minimal python2.7 python2.7-minimal rpcbind
0 upgraded, 12 newly installed, 0 to remove and 40 not upgraded.
Need to get 4,258 kB of archives.
After this operation, 18.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Install NFS client on CentOS
root@node-1:~# yum install -y nfs-utils
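After installing, a tiny pre-flight check (my own sketch, not an official tool) can confirm on each node that the mount helper the kubelet shells out to is actually present:

```shell
#!/bin/sh
# check_nfs_client: the kubelet mounts NFS volumes via mount.nfs, which is
# shipped by nfs-common (Debian/Ubuntu) or nfs-utils (CentOS/RHEL).
check_nfs_client() {
  if command -v mount.nfs >/dev/null 2>&1; then
    echo "NFS client present"
  else
    echo "NFS client missing"
  fi
}

# Usage: run check_nfs_client on every worker node.
```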