I have worked through the Kubernetes The Hard Way guide and its AWS adaptation, Kubernetes The Hard Way - AWS.
Everything works as described there, including the DNS add-on and even the dashboard.
However, if I create a LoadBalancer service, it does not work, because the cloud-controller-manager is not deployed (neither as a master component nor as a DaemonSet).
I read https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/ for information on how to deploy it, but it still fails after I apply the required change (on the kubelet: --cloud-provider=external) and deploy this DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: cloud-controller-manager
  template:
    metadata:
      labels:
        k8s-app: cloud-controller-manager
    spec:
      serviceAccountName: cloud-controller-manager
      containers:
      - name: cloud-controller-manager
        image: k8s.gcr.io/cloud-controller-manager:v1.8.0
        command:
        - /usr/local/bin/cloud-controller-manager
        - --cloud-provider=aws
        - --leader-elect=true
        - --use-service-account-credentials
        - --allocate-node-cidrs=true
        - --configure-cloud-routes=true
        - --cluster-cidr=${CLUSTERCIRD}
      tolerations:
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      nodeSelector:
        node-role.kubernetes.io/master: ""
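The manifest references a cloud-controller-manager service account, so that has to exist too. A minimal RBAC sketch for it (binding to the built-in cluster-admin role for brevity; a real setup would use a narrower ClusterRole):

```yaml
# ServiceAccount referenced by the DaemonSet's serviceAccountName
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
# Grant it cluster-admin (illustrative only; scope this down in production)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloud-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: kube-system
```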
The instances (controllers and workers) all have the proper IAM roles.
I can't even create a pod anymore — the status stays "Pending"...
Do you know how to deploy the cloud-controller-manager as a DaemonSet or as a master component on an AWS cluster (without using kops, kubeadm, ...)?
Do you know of a guide that could help me?
Could you give an example of a cloud-controller-manager DaemonSet configuration?
Thanks in advance
UPDATE
When executing kubectl get nodes, I get No resources found.
When describing a launched pod, I only get one event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s (x2 over 28s) default-scheduler no nodes available to schedule pods
So the question should now be: how do I get the nodes ready for a cloud-controller-manager deployed for AWS?
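For reference, the kubelet change mentioned above was applied in the kubelet's systemd unit, roughly like this (paths follow the Hard Way guide's layout and are illustrative):

```ini
# /etc/systemd/system/kubelet.service (excerpt, illustrative paths)
[Service]
ExecStart=/usr/local/bin/kubelet \
  --config=/var/lib/kubelet/kubelet-config.yaml \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --cloud-provider=external \
  --register-node=true \
  --v=2
```

After editing the unit, the change only takes effect after `systemctl daemon-reload` and `systemctl restart kubelet`.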
As samhain1138 mentioned, your cluster looks unhealthy, and nothing can be installed on it. In simple cases it can be fixed, but sometimes it is better to reinstall everything.
Let's try to investigate the problem.
First of all, check the state of the master node. Usually, this means the kubelet service should be running.
Check the kubelet logs for errors:
$ journalctl -u kubelet
Next, check the status of the static pods. You can find a list of them in the /etc/kubernetes/manifests directory:
$ ls /etc/kubernetes/manifests
etcd.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5cbdc1c13c25 8a7739f672b4 "/sidecar --v=2 --..." 2 weeks ago Up 2 weeks k8s_sidecar_kube-dns-86c47599bd-l7d6m_kube-system_...
bd96ffafdfa6 6816817d9dce "/dnsmasq-nanny -v..." 2 weeks ago Up 2 weeks k8s_dnsmasq_kube-dns-86c47599bd-l7d6m_kube-system_...
69931b5b4cf9 55ffe31ac578 "/kube-dns --domai..." 2 weeks ago Up 2 weeks k8s_kubedns_kube-dns-86c47599bd-l7d6m_kube-system_...
60885aeffc05 k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-dns-86c47599bd-l7d6m_kube-system_...
93144593660c 9f355e076ea7 "/install-cni.sh" 2 weeks ago Up 2 weeks k8s_install-cni_calico-node-nxljq_kube-system_...
b55f57529671 7eca10056c8e "start_runit" 2 weeks ago Up 2 weeks k8s_calico-node_calico-node-nxljq_kube-system_...
d8767b9c07c8 46a3cd725628 "/usr/local/bin/ku..." 2 weeks ago Up 2 weeks k8s_kube-proxy_kube-proxy-lf8gd_kube-system_...
f924cefb953f k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_calico-node-nxljq_kube-system_...
09ceddabdeb9 k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-proxy-lf8gd_kube-system_...
9fc90839bb6f 821507941e9c "kube-apiserver --..." 2 weeks ago Up 2 weeks k8s_kube-apiserver_kube-apiserver-kube-master_kube-system_...
8ea410ce00a6 b8df3b177be2 "etcd --advertise-..." 2 weeks ago Up 2 weeks k8s_etcd_etcd-kube-master_kube-system_...
dd7f9b381e4f 38521457c799 "kube-controller-m..." 2 weeks ago Up 2 weeks k8s_kube-controller-manager_kube-controller-manager-kube-master_kube-system_...
f6681365bea8 37a1403e6c1a "kube-scheduler --..." 2 weeks ago Up 2 weeks k8s_kube-scheduler_kube-scheduler-kube-master_kube-system_...
0638e47ec57e k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_etcd-kube-master_kube-system_...
5bbe35abb3a3 k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-controller-manager-kube-master_kube-system_...
2dc6ee716bb4 k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-scheduler-kube-master_kube-system_...
b15dfc9f089a k8s.gcr.io/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_kube-apiserver-kube-master_kube-system_...
You can see a detailed description of any pod's container using:
$ docker inspect <container_id>
or check its logs:
$ docker logs <container_id>
This should be enough to understand what to do next: either try to fix the cluster, or tear everything down and start from scratch.
To simplify the process of provisioning a Kubernetes cluster, you can use kubeadm as follows:
# This instruction is for ubuntu VMs, if you use CentOS, the commands will be
# slightly different.
### These steps are the same for the master and the worker nodes
# become root
$ sudo su
# add repository and keys
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
# install components
$ apt-get update
$ apt-get -y install ebtables ethtool docker.io apt-transport-https kubelet kubeadm kubectl
# adjust sysctl settings
$ cat <<EOF >>/etc/ufw/sysctl.conf
net/ipv4/ip_forward = 1
net/bridge/bridge-nf-call-ip6tables = 1
net/bridge/bridge-nf-call-iptables = 1
net/bridge/bridge-nf-call-arptables = 1
EOF
$ sysctl --system
### Next steps are for the master node only.
# Create Kubernetes cluster
$ kubeadm init --pod-network-cidr=192.168.0.0/16
# or, if you want to use the older KubeDNS instead of CoreDNS:
$ kubeadm init --pod-network-cidr=192.168.0.0/16 --feature-gates=CoreDNS=false
# Configure kubectl
$ mkdir -p $HOME/.kube
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ chown $(id -u):$(id -g) $HOME/.kube/config
# install Calico network
$ kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml
# or install Flannel (not both)
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Untaint the master and/or join other nodes:
$ kubectl taint nodes --all node-role.kubernetes.io/master-
# run on master if you forgot the join command:
$ kubeadm token create --print-join-command
# run command printed on the previous step on the worker node to join it to the existing cluster.
# At this point you should have a ready-to-use Kubernetes cluster.
$ kubectl get nodes -o wide
$ kubectl get pods,svc,deployments,daemonsets --all-namespaces
After you have restored the cluster, could you try installing cloud-controller-manager again and share the results?
I ran into the same issue trying to set up cloud-provider with GCE. I solved it by adding the following flags to kube-apiserver.service, kubelet.service, and kube-controller-manager.service:
--cloud-provider=gce
--cloud-config=/var/lib/gce.conf
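Concretely, that means the ExecStart of each unit gains those two flags. A sketch of the controller-manager unit (other flags and paths are illustrative and depend on your setup):

```ini
# kube-controller-manager.service (excerpt, illustrative)
[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --kubeconfig=/var/lib/kubernetes/kube-controller-manager.kubeconfig \
  --cloud-provider=gce \
  --cloud-config=/var/lib/gce.conf \
  --leader-elect=true
```

Remember to run `systemctl daemon-reload` and restart each service after editing its unit.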
The gce.conf file is based on the JSON key file generated for a Google IAM service account, but in Gcfg format. I believe AWS has something similar. The format looks like this:
[Global]
type = xxx
project-id = xxx
private-key-id = xxx
private-key = xxx
client-email = xxx
client-id = xxx
auth-uri = xxx
token-uri = xxx
auth-provider-x509-cert-url = xxx
client-x509-cert-url = xxx
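For AWS, the --cloud-config file uses the same Gcfg/INI [Global] layout but different keys. A sketch using key names from the legacy in-tree AWS provider (all values are placeholders — verify the keys against your Kubernetes version):

```ini
[Global]
Zone = us-east-1a
VPC = vpc-xxxxxxxx
SubnetID = subnet-xxxxxxxx
KubernetesClusterTag = my-cluster
```

On AWS, much of this can also be auto-discovered from instance metadata and resource tags, so the file is often smaller than its GCE counterpart.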
For more information, see the K8s documentation on cloud providers.
Forget the cloud-controller-manager — you don't appear to have a functioning Kubernetes cluster to run it on!!
Kubernetes is telling you exactly that, but you're ignoring it...
No offense, but if you have no experience with Kubernetes, maybe instead of following a guide called Kubernetes The Hard Way (which failed for you, and you haven't provided any information that would let me point out exactly why/how), you should use kops or kubeadm?