  rm /etc/containerd/config.toml
  systemctl restart containerd
  kubeadm reset


===== Warning  FailedScheduling that the pod didn't tolerate =====

<code>
  Warning  FailedScheduling  63s   default-scheduler  0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
</code>

==== Fix 1 ====

A taint may be set on the node.

The taint below means the master will not be scheduled.
Appending a hyphen (''-'') after the taint removes it (untaint).
<code>
# kubectl describe node g-master | grep -i taint
Taints:             node-role.kubernetes.io/master:NoSchedule

# kubectl taint node g-master node-role.kubernetes.io/master:NoSchedule-

# kubectl describe node g-master | grep -i taint
Taints:             node.kubernetes.io/not-ready:NoSchedule
</code>

==== Fix 2 ====

If a node's STATUS is NotReady, pods will not be scheduled on it.

<code>
# kubectl get node
NAME       STATUS     ROLES                  AGE    VERSION
g-master   NotReady   control-plane,master   147m   v1.23.5
g-work01   NotReady   <none>                 146m   v1.23.5
g-work02   NotReady   <none>                 146m   v1.23.5
g-work03   NotReady   <none>                 52m    v1.23.5
</code>

Flannel may not have been applied yet.

  # kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
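As a quick check after applying — a sketch; the namespace is ''kube-system'' or ''kube-flannel'' depending on the manifest version — confirm the flannel pods are running and watch the nodes become Ready:

<code>
# Find the flannel DaemonSet pods (one per node);
# the namespace varies between manifest versions
kubectl get pods -A -o wide | grep -i flannel

# Nodes should flip from NotReady to Ready once the CNI is up
kubectl get node
</code>

If the flannel pods are stuck in CrashLoopBackOff, inspect them with ''kubectl logs'' before looking elsewhere.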


===== Initial timeout of 40s passed. =====

The kubelet is not running.

<code>
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

 Unfortunately, an error has occurred:
 timed out waiting for the condition

 This error is likely caused by:
 - The kubelet is not running
 - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

 If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
 - 'systemctl status kubelet'
 - 'journalctl -xeu kubelet'

 Additionally, a control plane component may have crashed or exited when started by the container runtime.
 To troubleshoot, list all containers using your preferred container runtimes CLI.

 Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
 - 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
 Once you have found the failing container, you can inspect its logs with:
 - 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
</code>

==== Fix 1 ====

  systemctl start kubelet
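If the service is simply stopped, starting it (and enabling it at boot) is usually enough; a minimal sketch:

<code>
# Start kubelet now and enable it on every boot
systemctl enable --now kubelet

# Confirm it stays running; if it crash-loops, read the journal
systemctl status kubelet
journalctl -xeu kubelet
</code>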

==== Fix 2 ====

When logs like the ones below appear.

This error occurs when ''--control-plane-endpoint'' was specified, and it suggests the VIP is not up.

Without the VIP, the master host cannot be reached at the given control-plane endpoint, which appears to be why the "not found" error is logged.

Try assigning the VIP manually first, for example:
  ip addr add [VIP] dev [Eth]

Note: it should be fine to assign just the VIP up front, then set it up properly later with ipvsadm or similar.

<code>
# systemctl status kubelet
Apr 19 22:58:16 g-master02 kubelet[20595]: E0419 22:58:16.331432   20595 kubelet.go:2422] "Error getting node" err="node \"g-master02\" not found"
</code>
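As a concrete sketch of the workaround above — the address ''192.168.10.100/24'' and interface ''eth0'' are placeholders, substitute your own VIP and NIC:

<code>
# Temporarily assign the VIP so kubeadm can reach the control-plane endpoint
ip addr add 192.168.10.100/24 dev eth0

# Verify the address is attached
ip addr show dev eth0

# Remove it again once a proper VIP manager (ipvsadm, keepalived, etc.) takes over
ip addr del 192.168.10.100/24 dev eth0
</code>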

===== failure loading certificate for CA: couldn't load the certificate file =====

<code>
W0421 22:57:17.689558    6442 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
error execution phase preflight: 
One or more conditions for hosting a new control plane instance is not satisfied.

[failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory, failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory, failure loading certificate for front-proxy CA: couldn't load the certificate file /etc/kubernetes/pki/front-proxy-ca.crt: open /etc/kubernetes/pki/front-proxy-ca.crt: no such file or directory, failure loading certificate for etcd CA: couldn't load the certificate file /etc/kubernetes/pki/etcd/ca.crt: open /etc/kubernetes/pki/etcd/ca.crt: no such file or directory]

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher
</code>

==== Fix ====

This error appears when joining a Kubernetes controller (control-plane node).

If you [[06_virtualization:05_container:14_kubernetes_master_cluster#Master Join token再作成|re-create the Master Join token]], the join will succeed.
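Alternatively, kubeadm itself can distribute the shared certificates: upload them from an existing master and pass the printed certificate key to the joining one. A sketch — the endpoint, token, hash, and key below are placeholders:

<code>
# On an existing master: re-upload the control-plane certificates
# to the cluster and print a fresh certificate key
kubeadm init phase upload-certs --upload-certs

# On the joining master: --certificate-key makes kubeadm download
# the certs into /etc/kubernetes/pki before joining as a control plane
kubeadm join [VIP]:6443 --token [token] \
    --discovery-token-ca-cert-hash sha256:[hash] \
    --control-plane --certificate-key [key]
</code>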
  
  
06_virtualization/05_container/15_kubernetes_error.1650334713.txt.gz · Last updated: 2022/04/19 02:18 by matsui