docker_advanced_k8s_troubleshooting [My wiki]


Even though Kubernetes is fairly easy to configure, it is prone to problems, especially if you run it on Vagrant. So let's check which tools we can use to troubleshoot it:

First, we can list resources using the get command:

Get a component

ubuntu@k8s-master:~$ kubectl get pods
NAME                                     READY   STATUS             RESTARTS   AGE
busybox                                  0/1     Pending            0          13m
first-pod                                1/1     Running            7          21h
hello-deploy-7f44bd8b96-2xz8j            1/1     Running            2          20d
hello-deploy-7f44bd8b96-4c76j            1/1     Running            2          20d
hello-deploy-7f44bd8b96-7tvcs            1/1     Running            2          20d
hello-deploy-7f44bd8b96-9lnrm            1/1     Running            2          20d
hello-deploy-7f44bd8b96-dckq2            1/1     Running            2          20d
hello-deploy-7f44bd8b96-gnvwr            1/1     Running            2          20d
hello-deploy-7f44bd8b96-p66g8            1/1     Running            2          20d
hello-deploy-7f44bd8b96-qtxgk            1/1     Running            2          20d
hello-deploy-7f44bd8b96-qz6cr            1/1     Running            2          20d
hello-deploy-7f44bd8b96-r7g4q            1/1     Running            2          20d
nfs-client-provisioner-98cdf7875-26nbg   0/1     CrashLoopBackOff   7          18m
nfs-client-provisioner-98cdf7875-kdcvv   0/1     CrashLoopBackOff   7          18m
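When the list is long, it helps to filter it down to the pods that are not fully ready. Below is a minimal sketch of a helper for that; `not_ready_pods` is a hypothetical name of our own, not a kubectl feature, and it assumes the default column layout shown above (no NAMESPACE column).

```shell
#!/usr/bin/env bash
# not_ready_pods (hypothetical helper): read `kubectl get pods --no-headers`
# output on stdin (columns NAME READY STATUS RESTARTS AGE, no NAMESPACE
# column) and print "NAME STATUS" for every pod whose READY count is not n/n.
# Pods of finished Jobs (STATUS "Completed") are skipped on purpose.
not_ready_pods() {
  awk '{
    split($2, ready, "/")   # e.g. "0/1" -> ready[1]="0", ready[2]="1"
    if (ready[1] != ready[2] && $3 != "Completed") print $1, $3
  }'
}

# Usage against a live cluster:
#   kubectl get pods --no-headers | not_ready_pods
```

On the listing above this would print busybox (Pending) and the two nfs-client-provisioner pods (CrashLoopBackOff). Parsing the human-readable table is fragile across kubectl versions, so for anything beyond quick checks `-o json` or `-o custom-columns` is the sturdier choice.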

As you can see, the two nfs-client-provisioner pods are failing (CrashLoopBackOff) and busybox is stuck in Pending, but we don't know why. This is where the “describe” command comes into play:

Describe a resource

ubuntu@k8s-master:~$ kubectl describe pod nfs-client-provisioner-98cdf7875-26nbg
Name:         nfs-client-provisioner-98cdf7875-26nbg
Namespace:    default
Priority:     0
Node:         node-2/10.0.2.15
Start Time:   Sat, 23 May 2020 13:04:31 +0000
Labels:       app=nfs-client-provisioner
              pod-template-hash=98cdf7875
Annotations:  cni.projectcalico.org/podIP: 192.168.247.1/32
Status:       Running
IP:           192.168.247.1
IPs:
  IP:           192.168.247.1
Controlled By:  ReplicaSet/nfs-client-provisioner-98cdf7875
Containers:
  nfs-client-provisioner:
    Container ID:   docker://2e4c95d43caaef0bf2aae6400fe3eb349b8452501b04c8da494052843667e1d6
    Image:          quay.io/external_storage/nfs-client-provisioner:latest
    Image ID:       docker-pullable://quay.io/external_storage/nfs-client-provisioner@sha256:022ea0b0d69834b652a4c53655d78642ae23f0324309097be874fb58d09d2919
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Sat, 23 May 2020 13:19:21 +0000
      Finished:     Sat, 23 May 2020 13:19:51 +0000
    Ready:          False
    Restart Count:  7
    Environment:
      PROVISIONER_NAME:  example.com/nfs
      NFS_SERVER:        192.168.50.10
      NFS_PATH:          /srv/nfs/kubedata
    Mounts:
      /persistentvolumes from nfs-client-root (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfs-client-provisioner-token-ldqw7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  nfs-client-root:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    192.168.50.10
    Path:      /srv/nfs/kubedata
    ReadOnly:  false
  nfs-client-provisioner-token-ldqw7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfs-client-provisioner-token-ldqw7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  18m                   default-scheduler  Successfully assigned default/nfs-client-provisioner-98cdf7875-26nbg to node-2
  Normal   Created    16m (x4 over 18m)     kubelet, node-2    Created container nfs-client-provisioner
  Normal   Started    16m (x4 over 18m)     kubelet, node-2    Started container nfs-client-provisioner
  Normal   Pulling    15m (x5 over 18m)     kubelet, node-2    Pulling image "quay.io/external_storage/nfs-client-provisioner:latest"
  Normal   Pulled     14m (x5 over 18m)     kubelet, node-2    Successfully pulled image "quay.io/external_storage/nfs-client-provisioner:latest"
  Warning  BackOff    3m19s (x52 over 17m)  kubelet, node-2    Back-off restarting failed container
ubuntu@k8s-master:~$
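The describe output is long, and for a crash loop the interesting parts are usually the container's last termination state and the events at the bottom. The sketch below keeps only those; `describe_summary` is a hypothetical helper and assumes the space-indented layout shown above.

```shell
#!/usr/bin/env bash
# describe_summary (hypothetical helper): read `kubectl describe pod` output
# on stdin and keep only each container's "Last State" block (reason, exit
# code, start/finish times) and the Events table at the end.
describe_summary() {
  awk '
    /^ *Last State:/ { in_last = 1 }           # start of the last-run block
    in_last && /^ *Ready:/ { in_last = 0 }     # "Ready:" follows that block
    /^Events:/ { in_events = 1 }               # events run to the end
    in_last || in_events { print }
  '
}

# Usage against a live cluster:
#   kubectl describe pod nfs-client-provisioner-98cdf7875-26nbg | describe_summary
```

For the pod above this would leave the Terminated/Error block with exit code 255 plus the scheduling, pull and BackOff events. If you only need the events, `kubectl get events --field-selector involvedObject.name=<pod-name>` lists them directly.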

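But even that doesn't tell us much: we only learn that the container exits with code 255 and that the kubelet keeps backing off restarts. So what can we do? Well, we can troubleshoot further :)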

First, let's check the logs:

Check logs

ubuntu@k8s-master:~$ kubectl logs nfs-client-provisioner-98cdf7875-26nbg
Error from server (NotFound): the server could not find the requested resource ( pods/log nfs-client-provisioner-98cdf7875-26nbg)
ubuntu@k8s-master:~$
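One way to confirm that node addresses are the culprit is to compare the INTERNAL-IP column of `kubectl get nodes -o wide`. The sketch below flags any address shared by several nodes; `duplicate_internal_ips` is a hypothetical helper and assumes INTERNAL-IP is the sixth column of the default wide output.

```shell
#!/usr/bin/env bash
# duplicate_internal_ips (hypothetical helper): read
# `kubectl get nodes -o wide --no-headers` on stdin (INTERNAL-IP is assumed to
# be the 6th column) and print every IP shared by more than one node,
# followed by the names of those nodes.
duplicate_internal_ips() {
  awk '{
    count[$6]++
    nodes[$6] = nodes[$6] " " $1
  }
  END {
    for (ip in count) if (count[ip] > 1) print ip ":" nodes[ip]
  }'
}

# Usage against a live cluster:
#   kubectl get nodes -o wide --no-headers | duplicate_internal_ips
```

On a Vagrant cluster that still has this problem it would typically print 10.0.2.15 followed by every node name; once each node has its own address, it prints nothing.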

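Well, that is an issue specific to the Vagrant setup. Vagrant gives every VM the same NAT address, 10.0.2.15, on its first (default) interface, and the kubelet registers the node with that address (note "Node: node-2/10.0.2.15" in the describe output above). That isn't the IP you want Kubernetes to use: requests that the API server forwards to a kubelet, such as kubectl logs, don't reach the right node. The solution is to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and add the following line to its [Service] section, replacing the placeholder with the VM's private-network IP (for example 192.168.50.11 on node-1):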

Edit Kubelet config

Environment="KUBELET_EXTRA_ARGS=--node-ip=VAGRANT_VM_EXTERNAL_IP_HERE"
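Rather than typing the address by hand on every VM, you can derive it from the private-network interface. This is a sketch under two assumptions: the private network is on `enp0s8` (the interface Calico reports in its log at the end of this page), and `ip` from iproute2 is installed. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# first_ipv4 (hypothetical helper): read `ip -4 -o addr show dev <iface>`
# output on stdin and print the first IPv4 address without its prefix
# length, e.g. 192.168.50.11/24 -> 192.168.50.11.
first_ipv4() {
  awk '$3 == "inet" { split($4, addr, "/"); print addr[1]; exit }'
}

# kubelet_node_ip_line (hypothetical helper): print the drop-in line for the
# given node IP, ready to paste under [Service] in 10-kubeadm.conf.
kubelet_node_ip_line() {
  printf 'Environment="KUBELET_EXTRA_ARGS=--node-ip=%s"\n' "$1"
}

# Usage on each VM (enp0s8 is an assumption; check `ip -4 addr` first):
#   kubelet_node_ip_line "$(ip -4 -o addr show dev enp0s8 | first_ipv4)"
```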

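Do the same on every node, each with its own IP. Note that on Ubuntu the same drop-in also reads /etc/default/kubelet; if that file sets KUBELET_EXTRA_ARGS (even to an empty value), it overrides the Environment= line, so put the flag there instead. After that, systemd has to reload the edited drop-in before the kubelet is restarted:

Kubelet Commands

sudo systemctl daemon-reload
sudo systemctl restart kubelet
systemctl status kubelet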

To be honest, though, it's better to restart the whole cluster :) After that, everything should be running on the correct IPs:

Check Kubernetes

ubuntu@k8s-master:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE         NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-77c5fc8d7f-djzvd   1/1     Running   0          6m15s   192.168.235.195   k8s-master   <none>           <none>
kube-system   calico-node-f6nvk                          1/1     Running   0          6m15s   192.168.50.10     k8s-master   <none>           <none>
kube-system   calico-node-km6rz                          1/1     Running   0          3m53s   192.168.50.11     node-1       <none>           <none>
kube-system   calico-node-wgq4v                          1/1     Running   0          2m49s   192.168.50.12     node-2       <none>           <none>
kube-system   coredns-66bff467f8-5mntv                   1/1     Running   0          9m34s   192.168.235.194   k8s-master   <none>           <none>
kube-system   coredns-66bff467f8-6ks2w                   1/1     Running   0          9m34s   192.168.235.193   k8s-master   <none>           <none>
kube-system   etcd-k8s-master                            1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-apiserver-k8s-master                  1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-controller-manager-k8s-master         1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-proxy-cbrt9                           1/1     Running   0          2m49s   192.168.50.12     node-2       <none>           <none>
kube-system   kube-proxy-lmn4d                           1/1     Running   0          3m53s   192.168.50.11     node-1       <none>           <none>
kube-system   kube-proxy-wfz74                           1/1     Running   0          9m34s   192.168.50.10     k8s-master   <none>           <none>
kube-system   kube-scheduler-k8s-master                  1/1     Running   0          9m44s   192.168.50.10     k8s-master   <none>           <none>
ubuntu@k8s-master:~$
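In the listing above, the host-network pods (etcd, kube-apiserver, kube-proxy, calico-node) now report 192.168.50.x addresses instead of the NAT address. To check that no pod is left on 10.0.2.15, the sketch below filters a `custom-columns` listing, which avoids depending on the default table layout; `pods_on_ip` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# pods_on_ip (hypothetical helper): read "NAMESPACE NAME IP" rows on stdin and
# print NAMESPACE/NAME for every pod whose IP equals $1 (default 10.0.2.15,
# the Vagrant NAT address). Pods without an IP show "<none>" and never match.
pods_on_ip() {
  awk -v ip="${1:-10.0.2.15}" '$3 == ip { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,IP:.status.podIP \
#     | pods_on_ip
```

Empty output means no pod still uses the NAT address.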
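With the node IPs fixed, kubectl logs works again. For example, the Calico pod on node-1 reports that it autodetected the private-network interface, enp0s8:

ubuntu@k8s-master:~$ kubectl logs calico-node-km6rz --namespace kube-system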
2020-05-25 07:49:24.728 [INFO][9] startup.go 256: Early log level set to info
2020-05-25 07:49:24.728 [INFO][9] startup.go 272: Using NODENAME environment for node name
2020-05-25 07:49:24.728 [INFO][9] startup.go 284: Determined node name: node-1
2020-05-25 07:49:24.729 [INFO][9] k8s.go 228: Using Calico IPAM
2020-05-25 07:49:24.729 [INFO][9] startup.go 316: Checking datastore connection
2020-05-25 07:49:24.737 [INFO][9] startup.go 340: Datastore connection verified
2020-05-25 07:49:24.737 [INFO][9] startup.go 95: Datastore is ready
2020-05-25 07:49:24.744 [INFO][9] startup.go 382: Initialize BGP data
2020-05-25 07:49:24.745 [INFO][9] startup.go 584: Using autodetected IPv4 address on interface enp0s8: 192.168.50.11/24
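On multi-NIC Vagrant VMs it is worth checking which interface every calico-node picked, not just one. The sketch below pulls the interface and address out of the startup line shown above. `calico_autodetected` is a hypothetical helper, and the `k8s-app=calico-node` label in the usage comment is the one used by the standard Calico manifest, so adjust it if your install differs.

```shell
#!/usr/bin/env bash
# calico_autodetected (hypothetical helper): read calico-node logs on stdin
# and print "<interface> <address>" from the startup line
# "Using autodetected IPv4 address on interface <iface>: <addr>".
calico_autodetected() {
  sed -n 's/.*Using autodetected IPv4 address on interface \([^:]*\): \(.*\)$/\1 \2/p'
}

# Usage against a live cluster (label assumed from the standard manifest):
#   for pod in $(kubectl get pods -n kube-system -l k8s-app=calico-node -o name); do
#     printf '%s: ' "$pod"; kubectl logs -n kube-system "$pod" | calico_autodetected
#   done
```

Every node should report its 192.168.50.x address on the private interface; a 10.0.2.15 entry would mean Calico picked the NAT interface on that node.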

Overview