k8s-Cluster Maintenance
OS Upgrades
For example, you have a cluster with a few nodes and pods serving applications. If one of these nodes goes down, the pods on that node become inaccessible.
If the node comes back online immediately, the kubelet process starts and the pods come back up. However, if the node is down for more than 5 minutes, the pods are terminated (evicted) from that node.
Pod eviction timeout:
The time the controller manager waits for a node to come back online before evicting the pods on it.
The pod eviction timeout is set on the kube-controller-manager:
kube-controller-manager --pod-eviction-timeout=5m0s
When the node comes back online after the pod eviction timeout, it comes up blank, without any pods scheduled on it.
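In a kubeadm-provisioned cluster the controller manager usually runs as a static pod, so this flag can be set in its manifest (the path below is the kubeadm default; this is a sketch, not the only way to do it):
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
# add the flag to the container command list, e.g.:
#   - --pod-eviction-timeout=5m0s
# the kubelet detects the manifest change and restarts the controller manager automatically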
Maintaining a node
kubectl drain node-1
:When you drain (empty out) the node, the pods on it are gracefully terminated and recreated on another node.
The node is also cordoned, or marked as unschedulable, meaning no pods can be scheduled on it until you specifically remove the restriction.
kubectl uncordon node-1
:Uncordon the node so that pods can be scheduled on it again.
kubectl cordon node-2
:Cordon simply marks a node as unschedulable. Unlike drain, it does not terminate or move the pods already running on the node; it only makes sure that no new pods are scheduled on it.
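A quick way to verify the effect of cordon or drain (standard kubectl commands; node-1 is just an example name):
kubectl get nodes
# a cordoned or drained node shows "Ready,SchedulingDisabled" in the STATUS column
# after kubectl uncordon node-1 the STATUS returns to "Ready"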
Practice
On which nodes are the applications hosted? Run the command
kubectl get pods -o wide
and get the list of nodes the pods are placed on.
We need to take node01 out for maintenance. Empty the node of all applications and mark it unschedulable.
kubectl drain node01 --ignore-daemonsets
The maintenance tasks have been completed. Configure the node to be schedulable again.
kubectl uncordon node01
Why are there no pods placed on the master node? Use the command
kubectl describe node master/controlplane
The Taints section of the output shows a NoSchedule taint on the master/controlplane node, which is why no pods are scheduled there.
node03 has our critical applications. We do not want to schedule any more apps on node03. Mark node03 as unschedulable but do not remove any apps currently running on it.
kubectl cordon node03
Kubernetes Releases
alpha release: The features are disabled by default and may be buggy.
beta release: The code is well tested.
The ETCD and CoreDNS servers have their own versions as they are separate projects.
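To check which versions of ETCD and CoreDNS a cluster is running (assuming a kubeadm-built cluster, where etcd runs as a static pod labelled component=etcd and CoreDNS runs as a Deployment in kube-system):
kubectl -n kube-system get pods -l component=etcd -o jsonpath='{.items[0].spec.containers[0].image}'
kubectl -n kube-system get deployment coredns -o jsonpath='{.spec.template.spec.containers[0].image}'
# the image tags show the etcd and CoreDNS versions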
Cluster Upgrade Process
Component Version
The kube-apiserver is the primary component.
None of the other components should ever be at a version higher than the kube-apiserver.
So if the kube-apiserver is at version X, the controller-manager and kube-scheduler can be up to one minor version behind (X-1), and the kubelet and kube-proxy components can be up to two minor versions behind (X-2).
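A quick way to see which versions the components are actually running (standard commands; output formats vary slightly by release):
kubectl version --short
# shows the kubectl client version and the kube-apiserver (server) version
kubectl get nodes
# the VERSION column shows the kubelet version on each node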
Upgrade Step
We cannot upgrade directly from v1.10 to v1.13. We should upgrade one minor version at a time (v1.10 to v1.11 to v1.12 to v1.13).
kubeadm upgrade
upgrade master node
kubeadm upgrade plan
:Check the latest stable version available for upgrade
- After applying the control-plane upgrade, kubectl get nodes may still show the master node at the old version (v1.11.3). This is because that output shows the version of the kubelet registered with the API server on each node, not the version of the kube-apiserver itself.
- Then upgrade the kubelet on the master node.
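A sketch of the full control-plane upgrade sequence on the master node (the package versions are examples matching the worker-node steps below; pick the version reported by kubeadm upgrade plan):
apt-get install -y kubeadm=1.12.0-00
# upgrade the kubeadm tool first
kubeadm upgrade apply v1.12.0
# upgrades kube-apiserver, controller-manager, scheduler and the other control-plane components
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet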
upgrade worker node
upgrade node01
kubectl drain node01 --ignore-daemonsets
ssh node01
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade node
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet
exit
kubectl uncordon node01
- node02: same as the upgrade of node01
- node03: same as the upgrade of node01
Practice
What is the latest stable version available for upgrade? Use the kubeadm tool.
kubeadm upgrade plan
We will be upgrading the master node first. Drain the master node of workloads and mark it unschedulable.
kubectl drain master/controlplane --ignore-daemonsets
Upgrade the master/controlplane components to the exact version v1.19.0.
Upgrade the kubeadm tool (if not already), then the master components, and finally the kubelet. Practice referring to the Kubernetes documentation page. Note: while upgrading the kubelet, if you hit a dependency issue when running the apt-get upgrade kubelet command, use the apt install kubelet=1.19.0-00 command instead.
apt update
apt install -y kubeadm=1.19.0-00
kubeadm upgrade apply v1.19.0
apt install -y kubelet=1.19.0-00
systemctl restart kubelet
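After the control-plane upgrade, a quick verification and re-enabling scheduling on the node drained above (the node name depends on the environment, e.g. master or controlplane):
kubectl uncordon master/controlplane
kubectl get nodes
# the VERSION column for the upgraded node should now show v1.19.0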
Backup and Restore Methods
Query kube-apiserver
A better approach to backing up resource configuration is to query the kube-apiserver.
Use kubectl to back up resources:
kubectl get all --all-namespaces -o yaml > all-deploy-services.yaml
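To restore from that file later (a minimal sketch; note that kubectl get all only covers the common workload and service resource types, so anything else has to be backed up separately):
kubectl apply -f all-deploy-services.yaml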
Backup - ETCD
Take snapshot of etcd database (named snapshot.db)
ETCDCTL_API=3 etcdctl snapshot save snapshot.db
View the status of the backup
ETCDCTL_API=3 etcdctl snapshot status snapshot.db
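In a kubeadm cluster etcd is secured with TLS, so the snapshot commands typically need endpoint and certificate flags (the paths below are the kubeadm defaults; adjust for your setup):
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key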
Restore - ETCD
When ETCD is restored from a backup, it initializes a new cluster configuration and configures the members of ETCD as new members of a new cluster.
Use the etcdctl snapshot restore command and set:
--data-dir
:A new data directory to write the restored data to.
--initial-cluster-token
:This is to prevent a new member from accidentally joining an existing cluster.
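A sketch of a full restore (the data directory and token values are examples; after restoring, etcd has to be pointed at the new data directory, e.g. by editing the static pod manifest in a kubeadm cluster):
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir=/var/lib/etcd-from-backup \
  --initial-cluster-token=etcd-cluster-1
# then update /etc/kubernetes/manifests/etcd.yaml so the etcd data hostPath volume
# points at /var/lib/etcd-from-backup and add --initial-cluster-token=etcd-cluster-1
# to the etcd command; the kubelet recreates the etcd pod with the restored data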