=====Overview=====
Even though Kubernetes is fairly easy to configure, it is prone to problems, especially if you run it on Vagrant. So let's check which tools we can use to monitor it.

Firstly, we can retrieve a component using the "get" command:

<Code:bash|Get a component>
ubuntu@k8s-master:~$ kubectl get pods
NAME                                     READY   STATUS             RESTARTS   AGE
busybox                                  0/1     Pending            0          13m
first-pod                                1/1     Running            7          21h
hello-deploy-7f44bd8b96-2xz8j            1/1     Running            2          20d
hello-deploy-7f44bd8b96-4c76j            1/1     Running            2          20d
hello-deploy-7f44bd8b96-7tvcs            1/1     Running            2          20d
hello-deploy-7f44bd8b96-9lnrm            1/1     Running            2          20d
hello-deploy-7f44bd8b96-dckq2            1/1     Running            2          20d
hello-deploy-7f44bd8b96-gnvwr            1/1     Running            2          20d
hello-deploy-7f44bd8b96-p66g8            1/1     Running            2          20d
hello-deploy-7f44bd8b96-qtxgk            1/1     Running            2          20d
hello-deploy-7f44bd8b96-qz6cr            1/1     Running            2          20d
hello-deploy-7f44bd8b96-r7g4q            1/1     Running            2          20d
nfs-client-provisioner-98cdf7875-26nbg   0/1     CrashLoopBackOff            18m
nfs-client-provisioner-98cdf7875-kdcvv   0/1     CrashLoopBackOff            18m
</Code>
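With many pods, it helps to filter the listing down to the unhealthy ones. A minimal sketch (the sample output below is embedded so the pipeline can be demonstrated without a live cluster; in practice you would pipe `kubectl get pods` straight into awk):

<Code:bash|Filter failing pods>
# Sample `kubectl get pods` output, embedded for illustration only
# (the restart count is a made-up value). On a real cluster:
#   kubectl get pods | awk 'NR>1 && $3 != "Running" {print $1, $3}'
pods='NAME READY STATUS RESTARTS AGE
busybox 0/1 Pending 0 13m
first-pod 1/1 Running 7 21h
nfs-client-provisioner-98cdf7875-26nbg 0/1 CrashLoopBackOff 8 18m'

# Skip the header row, print name and status of every non-Running pod
printf '%s\n' "$pods" | awk 'NR>1 && $3 != "Running" {print $1, $3}'
</Code>

kubectl can also do server-side filtering with `--field-selector=status.phase!=Running`, but the awk pipeline works on any captured output.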

As you can see, we have two failing pods, but we don't know why. This is where the "describe" command comes into play:

<Code:bash|Describe a resource>
ubuntu@k8s-master:~$ kubectl describe pod nfs-client-provisioner-98cdf7875-26nbg
Name:         nfs-client-provisioner-98cdf7875-26nbg
Namespace:    default
Priority:     0
Node:         node-2/10.0.2.15
Start Time:   Sat, 23 May 2020 13:04:31 +0000
Labels:       app=nfs-client-provisioner
              pod-template-hash=98cdf7875
Annotations:  cni.projectcalico.org/podIP: 192.168.247.1/32
Status:       Running
IP:           192.168.247.1
IPs:
  IP:           192.168.247.1
Controlled By:  ReplicaSet/nfs-client-provisioner-98cdf7875
Containers:
  nfs-client-provisioner:
    Container ID:   docker://2e4c95d43caaef0bf2aae6400fe3eb349b8452501b04c8da494052843667e1d6
    Image:          quay.io/external_storage/nfs-client-provisioner:latest
    Image ID:       docker-pullable://quay.io/external_storage/nfs-client-provisioner@sha256:022ea0b0d69834b652a4c53655d78642ae23f0324309097be874fb58d09d2919
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Sat, 23 May 2020 13:19:21 +0000
      Finished:     Sat, 23 May 2020 13:19:51 +0000
    Ready:          False
    Restart Count:
    Environment:
      PROVISIONER_NAME:  example.com/nfs
      NFS_SERVER:        192.168.50.10
      NFS_PATH:          /srv/nfs/kubedata
    Mounts:
      /persistentvolumes from nfs-client-root (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfs-client-provisioner-token-ldqw7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  nfs-client-root:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    192.168.50.10
    Path:      /srv/nfs/kubedata
    ReadOnly:  false
  nfs-client-provisioner-token-ldqw7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfs-client-provisioner-token-ldqw7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/nfs-client-provisioner-98cdf7875-26nbg to node-2
  Normal   Created    16m (x4 over 18m)     kubelet, node-2    Created container nfs-client-provisioner
  Normal   Started    16m (x4 over 18m)     kubelet, node-2    Started container nfs-client-provisioner
  Normal   Pulling    15m (x5 over 18m)     kubelet, node-2    Pulling image "quay.io/external_storage/nfs-client-provisioner:latest"
  Normal   Pulled     14m (x5 over 18m)     kubelet, node-2    Successfully pulled image "quay.io/external_storage/nfs-client-provisioner:latest"
  Warning  BackOff    3m19s (x52 over 17m)  kubelet, node-2    Back-off restarting failed container
ubuntu@k8s-master:~$
</Code>

But even that doesn't tell us much. So what can we do? Well, we can troubleshoot :)

Firstly, let's check the logs:

<Code:bash|Check logs>
ubuntu@k8s-master:~$ kubectl logs nfs-client-provisioner-98cdf7875-26nbg
Error from server (NotFound): the server could not find the requested resource ( pods/log nfs-client-provisioner-98cdf7875-26nbg)
ubuntu@k8s-master:~$
</Code>

Well, that is an issue specific to the Vagrant configuration.
The problem is that Vagrant uses 10.0.2.15 as its default (NAT) IP, and that isn't the IP you want for your Kubernetes API. The solution is to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and add the following:
<Code:bash|Edit Kubelet config>
Environment="KUBELET_EXTRA_ARGS=--node-ip=VAGRANT_VM_EXTERNAL_IP_HERE"
</Code>

After that, we have to reload systemd with "systemctl daemon-reload" (we changed a unit file) and then restart the kubelet using one of the following commands (guess which one):
<Code:bash|Kubelet Commands>
systemctl stop kubelet
systemctl start kubelet
systemctl restart kubelet
systemctl status kubelet
</Code>

But to be honest, you might as well restart the whole Kubernetes cluster :)
After that, everything should be running under the correct IPs:

<Code:bash|Check Kubernetes>
ubuntu@k8s-master:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-77c5fc8d7f-djzvd   1/1     Running            6m15s   192.168.235.195   k8s-master   <none>           <none>
kube-system   calico-node-f6nvk                          1/1     Running            6m15s   192.168.50.10     k8s-master   <none>           <none>
kube-system   calico-node-km6rz                          1/1     Running            3m53s   192.168.50.11     node-1       <none>           <none>
kube-system   calico-node-wgq4v                          1/1     Running            2m49s   192.168.50.12     node-2       <none>           <none>
kube-system   coredns-66bff467f8-5mntv                   1/1     Running            9m34s   192.168.235.194   k8s-master   <none>           <none>
kube-system   coredns-66bff467f8-6ks2w                   1/1     Running            9m34s   192.168.235.193   k8s-master   <none>           <none>
kube-system   etcd-k8s-master                            1/1     Running            9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-apiserver-k8s-master                  1/1     Running            9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-controller-manager-k8s-master         1/1     Running            9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-proxy-cbrt9                           1/1     Running            2m49s   192.168.50.12     node-2       <none>           <none>
kube-system   kube-proxy-lmn4d                           1/1     Running            3m53s   192.168.50.11     node-1       <none>           <none>
kube-system   kube-proxy-wfz74                           1/1     Running            9m34s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-scheduler-k8s-master                  1/1     Running            9m44s   192.168.50.10     k8s-master   <none>           <none>
ubuntu@k8s-master:~$
ubuntu@k8s-master:~$ kubectl logs calico-node-km6rz --namespace kube-system
2020-05-25 07:49:24.728 [INFO][9] startup.go 256: Early log level set to info
2020-05-25 07:49:24.728 [INFO][9] startup.go 272: Using NODENAME environment for node name
2020-05-25 07:49:24.728 [INFO][9] startup.go 284: Determined node name: node-1
2020-05-25 07:49:24.729 [INFO][9] k8s.go 228: Using Calico IPAM
2020-05-25 07:49:24.729 [INFO][9] startup.go 316: Checking datastore connection
2020-05-25 07:49:24.737 [INFO][9] startup.go 340: Datastore connection verified
2020-05-25 07:49:24.737 [INFO][9] startup.go 95: Datastore is ready
2020-05-25 07:49:24.744 [INFO][9] startup.go 382: Initialize BGP data
2020-05-25 07:49:24.745 [INFO][9] startup.go 584: Using autodetected IPv4 address on interface enp0s8: 192.168.50.11/24
</Code>
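The same check can be done at node level: `kubectl get nodes -o wide` prints each node's INTERNAL-IP, which should now be the host-only address (192.168.50.x here) rather than 10.0.2.15. A small sketch of that check; the sample output is embedded (ages and the version string are made-up values) so it can be shown without a cluster:

<Code:bash|Check node internal IPs>
# Sample `kubectl get nodes -o wide` output (hypothetical values);
# on a real cluster pipe the command straight into the awk below.
nodes='NAME STATUS ROLES AGE VERSION INTERNAL-IP
k8s-master Ready master 10m v1.18.2 192.168.50.10
node-1 Ready <none> 9m v1.18.2 192.168.50.11
node-2 Ready <none> 8m v1.18.2 192.168.50.12'

# Flag any node whose INTERNAL-IP (column 6) is still the Vagrant NAT IP
printf '%s\n' "$nodes" | awk 'NR>1 { if ($6 == "10.0.2.15") { bad=1; print $1, "is still on the NAT IP" } } END { if (!bad) print "all node IPs OK" }'
</Code>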

=====Specific Problems=====
Let's delve into more specific problems:

====Wrong API Version====
If you have a wrong API version in the YML file, you will receive the following error:

<Code:bash|Wrong API Version>
ubuntu@k8s-master:~/external-storage/nfs/deploy/kubernetes$ kubectl create -f nginx.yml
error: unable to recognize "nginx.yml": no matches for kind "Deployment" in version "extensions/v1beta1"
</Code>
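Once you know the new API group, the fix is a simple search-and-replace in the manifest. This sketch creates a stand-in nginx.yml under /tmp (a hypothetical path for the demo; in practice you would edit your real manifest) and rewrites the deprecated apiVersion:

<Code:bash|Rewrite deprecated apiVersion>
# Stand-in manifest for the demo; edit your real nginx.yml in practice.
cat > /tmp/nginx.yml <<'EOF'
apiVersion: extensions/v1beta1
kind: Deployment
EOF

# Deployment was removed from extensions/v1beta1 (Kubernetes 1.16+)
# and is served from apps/v1 instead.
sed -i 's|^apiVersion: extensions/v1beta1|apiVersion: apps/v1|' /tmp/nginx.yml

head -1 /tmp/nginx.yml
</Code>

You can also ask the cluster itself which group serves a kind with `kubectl api-resources`.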

To fix that, find the API group to which the resource kind has been moved; for newer versions of Kubernetes, Deployment lives in apps/v1:

<Code:bash|Correct API Version for nginx>
ubuntu@k8s-master:~/external-storage/nfs/deploy/kubernetes$ cat nginx.yml
apiVersion: apps/v1
</Code>

Bear in mind that this can, of course, change again.

====Not installed NFS Client (NFS Dynamic Provisioning)====
In case you use NFS dynamic provisioning and didn't install the NFS client on ALL nodes of the cluster, you will get the following error when you try to create a pod that is to be placed there.

<Code:bash|Error with NFS client>
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/64003b39-2316-4254-9e62-cb6d09f3fd6e/volumes/kubernetes.io~nfs/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd --scope -- mount -t nfs -o vers=4.1 10.111.172.167:/export/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd /var/lib/kubelet/pods/64003b39-2316-4254-9e62-cb6d09f3fd6e/volumes/kubernetes.io~nfs/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd
Output: Running scope as unit run-r94910a4067104f079e9aae302b111ab2.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.172.167:/export/pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
       need a /sbin/mount.<type> helper program)

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
  Warning  FailedMount  42s  kubelet, node-1  MountVolume.SetUp failed for volume "pvc-9a8aa090-7c73-4e64-94eb-dcc7805828dd" : mount failed: exit status 32
</Code>
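The "wrong fs type ... /sbin/mount.<type> helper program" wording is the giveaway: mount exits with status 32 because the node has no mount.nfs helper. A quick check you can run on each node before installing anything (the output depends on the machine, so none is shown here):

<Code:bash|Check for the NFS mount helper>
# mount -t nfs delegates to the mount.nfs helper; if neither PATH nor
# /sbin has it, the NFS client package is missing on this node.
if command -v mount.nfs >/dev/null 2>&1 || [ -x /sbin/mount.nfs ]; then
  echo "NFS client present"
else
  echo "NFS client missing: install nfs-common (Debian/Ubuntu) or nfs-utils (RHEL/CentOS)"
fi
</Code>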

To fix that, just install the NFS client on all the nodes:

<Code:bash|Install NFS Client on Ubuntu>
root@node-1:~# apt-get install nfs-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  keyutils libnfsidmap2 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libtirpc1 python python-minimal python2.7 python2.7-minimal rpcbind
Suggested packages:
  watchdog python-doc python-tk python2.7-doc binfmt-support
The following NEW packages will be installed:
  keyutils libnfsidmap2 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libtirpc1 nfs-common python python-minimal python2.7 python2.7-minimal rpcbind
0 upgraded, 12 newly installed, 0 to remove and 40 not upgraded.
Need to get 4,258 kB of archives.
After this operation, 18.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
</Code>

<Code:bash|Install NFS Client on CentOS>
root@node-1:~# yum install -y nfs-utils
</Code>
  • Last modified: 2020/05/22 14:42
  • by andonovj