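As part of a POC, I am trying to back up and restore volumes provisioned by the GKE CSI driver within the same GKE cluster. However, the restore fails and there are no logs to debug with.
Steps:
Create the volume snapshot class: kubectl create -f vsc.yaml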
# vsc.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-gce-vsc
  labels:
    "velero.io/csi-volumesnapshot-class": "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
Create the storage class: kubectl create -f sc.yaml
# sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-example
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-standard
Create the namespace: kubectl create namespace csi-app
Create the persistent volume claim: kubectl create -f pvc.yaml
# pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
  namespace: csi-app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pd-example
  resources:
    requests:
      storage: 6Gi
Create a pod to consume the PVC: kubectl create -f pod.yaml
# pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: web-server
  namespace: csi-app
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - mountPath: /var/lib/www/html
          name: mypvc
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: podpvc
        readOnly: false
Once the PVC was bound, I created a Velero backup.
velero backup create test --include-resources=pvc,pv --include-namespaces=csi-app --wait
Output:
Backup request "test" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe test` and `velero backup logs test`.
velero describe backup test
Name: test
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.21.5-gke.1302
  velero.io/source-cluster-k8s-major-version=1
  velero.io/source-cluster-k8s-minor-version=21
Phase: Completed
Errors: 0
Warnings: 1
Namespaces:
  Included: csi-app
  Excluded: <none>
Resources:
  Included: pvc, pv
  Excluded: <none>
  Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2021-12-22 15:40:08 +0300 +03
Completed: 2021-12-22 15:40:10 +0300 +03
Expiration: 2022-01-21 15:40:08 +0300 +03
Total items to be backed up: 2
Items backed up: 2
Velero-Native Snapshots: <none included>
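After creating the backup, I verified that it had been created and was available in my GCS bucket.
I then deleted all existing resources to test the restore.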
kubectl delete -f pod.yaml
kubectl delete -f pvc.yaml
kubectl delete -f sc.yaml
kubectl delete namespace csi-app
Run the restore command:
velero restore create --from-backup test --wait
Output:
Restore request "test-20211222154302" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
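Neither the velero restore describe nor the velero restore logs command returns any description or logs.
What did you expect to happen: I expected the pv, pvc, and namespace to be restored.
The following information will help us better understand what's going on: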
velero debug --backup test --restore test-20211222154302
The command hung for more than 10 minutes and I was unable to generate the support bundle. Output:
2021/12/22 15:45:16 Collecting velero resources in namespace: velero
2021/12/22 15:45:24 Collecting velero deployment logs in namespace: velero
2021/12/22 15:45:28 Collecting log and information for backup: test
Environment:
Velero version (use velero version):
Client:
Version: v1.7.1
Git commit: -
Server:
Version: v1.7.1
Velero features (use velero client config get features):
features:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
GKE 1.21.5-gke.1302
Cloud provider or hardware configuration:
GCP
OS (e.g. from /etc/os-release):
GCP Container-Optimized OS (COS)
You should be able to check the logs mentioned in the output above:
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
velero describe or velero logs command doesn't return any description/logs.
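The latter becomes available once the restore has completed. Check it for errors; it should show what went wrong.
Since you are using CSI for the PV/PVC backup, you should configure Velero to support it: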
https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html
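The Environment section of the question shows an empty features list, which suggests the EnableCSI feature flag may not be set. Below is a minimal sketch of how you could check this on the client side. The helper name has_csi_feature is hypothetical, and the sketch assumes that velero client config get features prints a single "features: ..." line, as it does in the question. As far as I know, the server side also needs to run with --features=EnableCSI and have velero-plugin-for-csi installed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the given "features: ..." line lists EnableCSI.
has_csi_feature() {
  case "$1" in
    *EnableCSI*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only talk to Velero if the CLI is actually installed.
if command -v velero >/dev/null 2>&1; then
  if has_csi_feature "$(velero client config get features)"; then
    echo "EnableCSI is set on the client"
  else
    # Print the fix instead of applying it, so the script has no side effects.
    echo "EnableCSI is not set; run: velero client config set features=EnableCSI"
  fi
fi
```

If the flag turns out to be missing, create a fresh backup after enabling it, because a backup taken without CSI support will not contain the volume snapshots.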
Depending on the plugin you are using, it could also be a bug, such as:
https://github.com/vmware-tanzu/velero-plugin-for-csi/pull/122
This should be fixed in the latest release, 0.3.2:
https://github.com/vmware-tanzu/velero-plugin-for-csi/releases/tag/v0.3.2
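To see which plugin version you are actually running, you could inspect the init containers of the Velero deployment, since that is how Velero plugins are installed. This is only a sketch: the helper name csi_plugin_tag is hypothetical, and it assumes the deployment is called velero and lives in the velero namespace (the namespace matches your backup output, the deployment name is the usual default).

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the tag of the first velero-plugin-for-csi image
# among the arguments; fail if no such image is present.
csi_plugin_tag() {
  for image in "$@"; do
    case "$image" in
      *velero-plugin-for-csi:*)
        echo "${image##*:}"   # strip everything up to the last colon
        return 0
        ;;
    esac
  done
  return 1
}

# Only query the cluster if kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  images=$(kubectl -n velero get deployment velero \
    -o jsonpath='{.spec.template.spec.initContainers[*].image}')
  # $images is intentionally unquoted so each image becomes its own argument.
  csi_plugin_tag $images || echo "velero-plugin-for-csi is not installed"
fi
```

If the printed tag is older than v0.3.2, upgrading the plugin is worth trying before digging further.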
So start with
velero restore logs test-20211222154302
and go from there. Update the question with your findings, and if this resolves the issue, a thank-you would be appreciated.
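Restore logs can be long, so it may help to narrow them down to warnings and errors first. A rough sketch, assuming the logs use Velero's usual level=... key/value format; the function name restore_problems is hypothetical, and the restore name is the one from your output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only warning and error lines from logs on stdin.
restore_problems() {
  grep -E 'level=(warning|error)'
}

# Only run against Velero if the CLI is installed.
if command -v velero >/dev/null 2>&1; then
  velero restore logs test-20211222154302 | restore_problems
  # The describe output lists warnings and errors per resource as well.
  velero restore describe test-20211222154302 --details
fi
```

If both commands still return nothing, the Velero server pod logs (kubectl -n velero logs deployment/velero) are the next place to look.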