====== 15 Kubernetes Errors ======
===== Stuck in ContainerCreating =====
kubectl describe pods shows the following error:
Warning FailedCreatePodSandBox 90s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "49a5d016c6aacfbea51d08b00f0edef8575396ba4843294ee176269bdc2d4132": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.6.1/24
==== Fix ====
Rebooting the node once fixes it.
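If a full reboot is not convenient, deleting the stale bridge by hand may also work. A minimal sketch, assuming the node runs Flannel and uses the standard "cni0" device name:

```shell
# Remove the stale CNI bridge so the CNI plugin can recreate it
# with the expected 10.244.x.1/24 address (run on the affected node).
sudo ip link set cni0 down
sudo ip link delete cni0
# Restart the kubelet so pod sandboxes are recreated.
sudo systemctl restart kubelet
```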
===== Unable to connect to the server: x509 =====
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
==== Fix ====
Run the following:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
unset KUBECONFIG
export KUBECONFIG=/etc/kubernetes/admin.conf
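Afterwards, any read-only kubectl call serves as a quick check that the credentials work:

```shell
# Should answer without the x509 error once the kubeconfig is in place.
kubectl get nodes
kubectl cluster-info
```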
{{tag>Kubernetes}}
===== rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService =====
root@g-master:~# kubeadm init --pod-network-cidr=10.224.0.0/16
[init] Using Kubernetes version: v1.23.5
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2022-04-19T02:14:43Z" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
==== Fix ====
rm /etc/containerd/config.toml
systemctl restart containerd
kubeadm reset
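Background: containerd packages often ship a config.toml that lists the CRI plugin under "disabled_plugins", which is why deleting the file helps. An alternative sketch is to regenerate a full default config instead (the SystemdCgroup edit assumes the host uses systemd):

```shell
# Regenerate a complete default config (CRI plugin enabled).
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
# Optional: switch to the systemd cgroup driver, matching kubelet on systemd hosts.
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
```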
===== Warning FailedScheduling that the pod didn't tolerate =====
Warning FailedScheduling 63s default-scheduler 0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
==== Fix 1 ====
The node may have a taint on it.
This taint prevents pods from being scheduled on the master.
Appending a hyphen ("-") to the taint removes it (untaint).
# kubectl describe node g-master | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule
# kubectl taint node g-master node-role.kubernetes.io/master:NoSchedule-
# kubectl describe node g-master | grep -i taint
Taints: node.kubernetes.io/not-ready:NoSchedule
==== Fix 2 ====
The node STATUS is NotReady; pods will not be scheduled onto a NotReady node.
# kubectl get node
NAME       STATUS     ROLES                  AGE    VERSION
g-master   NotReady   control-plane,master   147m   v1.23.5
g-work01   NotReady   <none>                 146m   v1.23.5
g-work02   NotReady   <none>                 146m   v1.23.5
g-work03   NotReady   <none>                 52m    v1.23.5
Flannel may not have been applied yet.
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
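After applying, the nodes should move to Ready once the Flannel pods come up. A sketch of how to check (the namespace and label match the old coreos manifest; newer Flannel manifests deploy into the kube-flannel namespace instead):

```shell
# Watch the flannel DaemonSet pods come up on every node.
kubectl -n kube-system get pods -l app=flannel -o wide
# Nodes should transition from NotReady to Ready.
kubectl get nodes
```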
===== Initial timeout of 40s passed. =====
The kubelet is not running.
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
==== Fix 1 ====
systemctl start kubelet
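If it will not stay running, enabling it at boot and reading the recent logs may reveal the real failure (a generic sketch):

```shell
# Start now and enable at boot.
sudo systemctl enable --now kubelet
# Inspect the most recent kubelet log entries.
sudo journalctl -u kubelet --no-pager -n 50
```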
==== Fix 2 ====
If logs like the ones below are appearing:
this error shows up when "--control-plane-endpoint" was specified and the VIP is not up yet.
Without the VIP, the master host cannot be reached via the specified control-plane endpoint, which apparently produces the "not found" error.
Try attaching the VIP first, for example with:
ip addr add [VIP] dev [Eth]
Note: it should be fine to just bring up the VIP this way first and set it up properly later with ipvsadm or similar.
# systemctl status kubelet
Apr 19 22:58:16 g-master02 kubelet[20595]: E0419 22:58:16.331432 20595 kubelet.go:2422] "Error getting node" err="node \"g-master02\" not found"
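A concrete sketch of attaching the VIP by hand (the address and device name below are examples; substitute your own):

```shell
# Hypothetical values; replace with your control-plane endpoint VIP and NIC.
VIP=192.168.10.100/24
DEV=eth0
sudo ip addr add "$VIP" dev "$DEV"
# Confirm the VIP is now attached.
ip addr show "$DEV"
```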
===== failure loading certificate for CA: couldn't load the certificate file =====
W0421 22:57:17.689558 6442 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
[failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory, failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory, failure loading certificate for front-proxy CA: couldn't load the certificate file /etc/kubernetes/pki/front-proxy-ca.crt: open /etc/kubernetes/pki/front-proxy-ca.crt: no such file or directory, failure loading certificate for etcd CA: couldn't load the certificate file /etc/kubernetes/pki/etcd/ca.crt: open /etc/kubernetes/pki/etcd/ca.crt: no such file or directory]
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
To see the stack trace of this error execute with --v=5 or higher
==== Fix ====
This error comes up when joining an additional Kubernetes control-plane (controller) node.
Re-create the master join token as described in [[06_virtualization:05_container:14_kubernetes_master_cluster#Master Join token再作成|Master Join token re-creation]], and the join will succeed.
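As an alternative sketch using kubeadm itself: the missing files under /etc/kubernetes/pki can be re-distributed by re-uploading the certificates from an existing control-plane node and joining with the printed certificate key (the endpoint, token, and hash below are placeholders):

```shell
# On an existing control-plane node: re-upload the shared certificates
# and print a new certificate key (valid for about two hours).
sudo kubeadm init phase upload-certs --upload-certs

# Print a fresh join command (token + CA cert hash).
sudo kubeadm token create --print-join-command

# On the joining node, combine both outputs (placeholders shown):
# sudo kubeadm join <endpoint>:6443 --token <token> \
#     --discovery-token-ca-cert-hash sha256:<hash> \
#     --control-plane --certificate-key <certificate-key>
```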