How to debug CrashLoopBackOff in Kubernetes when a pod is not starting

Here is what I learned while debugging a CrashLoopBackOff in Kubernetes when a pod wasn’t starting.

I wanted to deploy the Jenkins Docker image in the cluster. As suggested in the Jenkins Docker repo, I wanted to mount an external drive, which in my case is an AWS EBS volume. Here is my deployment yaml.

After starting the deployment the pod never came up, and this was the output of kubectl get pod:

NAME                       READY   STATUS             RESTARTS   AGE
jenkins-3317895845-x84u3   0/1     CrashLoopBackOff   10         27m
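
If you have many pods and want to pick out the crash-looping ones from that output in a script, here is a minimal Python sketch. It assumes the whitespace-separated table that kubectl get pod prints, as shown above. The function name crash_looping_pods is my own invention and not part of any Kubernetes tooling.

```python
def crash_looping_pods(get_pod_output):
    """Return (name, restarts) for each pod whose STATUS is CrashLoopBackOff.

    Assumption: the input is the plain-text table from `kubectl get pod`,
    with a header row containing NAME, STATUS and RESTARTS.
    """
    lines = [line for line in get_pod_output.splitlines() if line.strip()]
    if not lines:
        return []

    # Locate the columns by header name instead of hard-coding positions.
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    restarts_col = header.index("RESTARTS")

    result = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(name_col, status_col, restarts_col):
            continue  # skip rows that do not have all the expected columns
        if fields[status_col] == "CrashLoopBackOff":
            result.append((fields[name_col], int(fields[restarts_col])))
    return result


sample = """\
NAME                       READY   STATUS             RESTARTS   AGE
jenkins-3317895845-x84u3   0/1     CrashLoopBackOff   10         27m
"""
print(crash_looping_pods(sample))  # [('jenkins-3317895845-x84u3', 10)]
```

A high restart count next to CrashLoopBackOff means the container keeps exiting shortly after it starts, which is the cue to look at the pod's events and logs.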

The next step was to run kubectl describe pod jenkins-3317895845-x84u3

From the describe output I could make out that the image was being pulled correctly but the container was failing on StartContainer. Based on this information, the next step was to get the Docker logs. But the Docker logs aren’t accessible from my box, because the container runs on a cluster node managed by Kubernetes. The only way I could see to get the Docker logs was to actually SSH into that node.

But which box should I SSH into? The describe output has the node information, which is Node: ip-172-20-0-29.us-west-2.compute.internal/172.20.0.29. This is the private IP of the box in AWS, but with that information you should be able to figure out the public IP to SSH into.
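
The Node line has the form hostname/private-ip, so it is easy to split in a script. Below is a small Python sketch that does this. The helper name node_from_describe is hypothetical, and the only input I feed it is the Node line quoted above, not a full describe output.

```python
def node_from_describe(describe_output):
    """Return (hostname, private_ip) from the 'Node:' line, or None if absent.

    Assumption: the line looks like 'Node: <hostname>/<ip>', as in the
    `kubectl describe pod` output quoted in this post.
    """
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Node:"):
            value = stripped[len("Node:"):].strip()
            hostname, _, ip = value.partition("/")
            return hostname, ip
    return None


node_line = "Node: ip-172-20-0-29.us-west-2.compute.internal/172.20.0.29"
hostname, private_ip = node_from_describe(node_line)
print(hostname)    # ip-172-20-0-29.us-west-2.compute.internal
print(private_ip)  # 172.20.0.29
```

With the private IP in hand, you can find the matching EC2 instance in the AWS console and read its public IP from there.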

Now I know I have to SSH, but where is the key for this server? If you used kube-up.sh, the key is stored at ~/.ssh/kube_aws_rsa, so the command is ssh -i ~/.ssh/kube_aws_rsa admin@public-ip-oftheabovenode.

After SSHing into the box I ran sudo docker ps -a | grep naveen. I used -a because the container could already have stopped, and I grepped for naveen because that was my container name. This gave me the ID of a stopped container with exit status 1.

And this was the output of the docker logs command:

admin@ip-172-20-0-29:~$ sudo docker logs c98a338268a1
touch: cannot touch ‘/var/jenkins_home/copy_reference_file.log’: Permission denied
Can not write to /var/jenkins_home/copy_reference_file.log. Wrong volume permissions?

This showed that /var/jenkins_home, which was mounted from the AWS EBS volume, was not writable by the jenkins user. See https://github.com/kubernetes/kubernetes/issues/2630.

And after doing all of this I realized I could have run kubectl logs jenkins-3317895845-x84u3, which would have given the same output without having to SSH into the box. But knowing the longer route is handy, because when things go wrong we really need to debug the root cause.
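
If you want to grab those logs from a script, here is a minimal Python sketch that wraps kubectl logs. It assumes kubectl is on your PATH and already configured for the cluster. The function names are my own. The --previous flag is a real kubectl logs option that asks for the logs of the previous container instance, which can help when the current one has only just restarted. I did not use it in the session above, so treat it as a suggestion.

```python
import subprocess


def kubectl_logs_command(pod_name, previous=False):
    """Build the argument list for `kubectl logs <pod>`."""
    cmd = ["kubectl", "logs", pod_name]
    if previous:
        # Ask for the logs of the previous (crashed) container instance.
        cmd.append("--previous")
    return cmd


def fetch_logs(pod_name, previous=False):
    """Run kubectl and return its stdout and stderr as one string.

    Assumption: kubectl is installed and pointed at the right cluster.
    """
    completed = subprocess.run(
        kubectl_logs_command(pod_name, previous),
        capture_output=True,
        text=True,
        check=False,  # a failing pod should not raise here; just return the text
    )
    return completed.stdout + completed.stderr


# Only print the command here; fetch_logs needs a live cluster to be useful.
print(" ".join(kubectl_logs_command("jenkins-3317895845-x84u3", previous=True)))
# kubectl logs jenkins-3317895845-x84u3 --previous
```

Calling fetch_logs("jenkins-3317895845-x84u3") against the cluster should return the same permission-denied message I found through docker logs.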