Overview
This morning I received a report from a developer: he couldn't update the container image for our application, even though the CI/CD pipeline reported success. After checking the cluster, I saw that the deployment already referenced the desired image version, but the pods were still running the old one.
Troubleshooting Process
In Kubernetes, the component that decides which node each pod runs on is the scheduler. When I checked its logs, I found that the certificate it used to connect to the API server had expired.
k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSINode: failed to list *v1.CSINode: Get certificate has expired or not yet valid
This error came from kube-scheduler. I then checked the certificate expiration for the Kubernetes components with this command:
kubeadm certs check-expiration
The output showed that the certificates for most of the Kubernetes components had expired, so I renewed them:
sudo kubeadm certs renew all
After that, it is important to restart all control plane components so that they pick up the new certificates:
# move the static pod manifests away so the kubelet stops the control plane pods
sudo mv /etc/kubernetes/manifests/*.yaml /tmp/
sleep 20
sudo mv /tmp/*.yaml /etc/kubernetes/manifests/
When the control plane started again, I checked that all of its components were running:
kubectl get pods -n kube-system
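As an extra check after the renewal, the certificate files on disk can be inspected directly. The sketch below is not part of the original procedure: `check_certs` is a hypothetical helper name, and it assumes the default kubeadm layout where certificates live in /etc/kubernetes/pki (with the etcd ones in /etc/kubernetes/pki/etcd) and that `openssl` is installed on the control plane node.

```shell
# check_certs DIR [SECONDS]
# Flags every *.crt file in DIR that is expired, will expire within SECONDS
# (default: 30 days), or cannot be read as a certificate.
# Returns 1 if at least one file was flagged, 0 otherwise.
check_certs() {
  cc_dir="$1"
  cc_seconds="${2:-2592000}"
  cc_status=0
  for cc_crt in "$cc_dir"/*.crt; do
    # Skip the unexpanded glob when the directory has no .crt files.
    [ -e "$cc_crt" ] || continue
    # -checkend exits 0 only if the certificate is still valid SECONDS from now.
    if openssl x509 -in "$cc_crt" -noout -checkend "$cc_seconds" >/dev/null 2>&1; then
      echo "OK     $cc_crt"
    else
      echo "CHECK  $cc_crt"
      cc_status=1
    fi
  done
  return "$cc_status"
}
```

For example, `sudo sh -c '. ./check_certs.sh; check_certs /etc/kubernetes/pki'` would list each certificate with its status, assuming the function was saved in a file called check_certs.sh.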
The next thing I checked was the nodes. Unfortunately, some of them were in NotReady status, so I cordoned those nodes and had them rejoin the cluster.
kubectl get nodes
kubectl cordon kube-node04 kube-node05 kube-node08
Generate a new join token so the cordoned nodes can rejoin the cluster:
kubeadm token create --print-join-command
Log in to each affected server over SSH and reset the node:
kubeadm reset
# join the node to the cluster again
kubeadm join 1.2.3.4:6443 --token ry7kio.i7k2 --discovery-token-ca-cert-hash sha256:250fd
Once the nodes are running again, uncordon them so that pods can be scheduled onto them:
kubectl uncordon kube-node04 kube-node05 kube-node08
Create a test deployment to make sure the scheduler and the nodes are working:
kubectl create deployment mynginx --image=nginx --replicas=3 --namespace=develop
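To confirm the result rather than eyeballing it, something like the following could be run afterwards. This is only a sketch: the function names `not_ready_nodes` and `verify_test_deployment` are made up for illustration, and the 120-second timeout is an arbitrary assumption. The deployment name and namespace are the ones from the command above.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Cordoned nodes show "Ready,SchedulingDisabled", so they are listed too.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}

# Wait for the test deployment to finish rolling out, then show which
# node each pod landed on.
verify_test_deployment() {
  kubectl rollout status deployment/mynginx --namespace=develop --timeout=120s &&
    kubectl get pods --namespace=develop -o wide
}
```

If `not_ready_nodes` prints nothing and `verify_test_deployment` shows pods spread across the rejoined nodes, the scheduler and nodes are healthy again. The test deployment can then be removed with `kubectl delete deployment mynginx --namespace=develop`.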