Velero: restore partially fails for volumes provisioned with a CSI driver



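As part of a POC, I am trying to back up and restore volumes provisioned by the GKE CSI driver, within the same GKE cluster. However, the restore fails and there are no logs to debug.

Steps:

Create the volume snapshot class: kubectl create -f vsc.yaml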

# vsc.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-gce-vsc
  labels:
    "velero.io/csi-volumesnapshot-class": "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

Create the storage class: kubectl create -f sc.yaml

# sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-example
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-standard

Create the namespace: kubectl create namespace csi-app

Create the persistent volume claim: kubectl create -f pvc.yaml

# pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
  namespace: csi-app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pd-example
  resources:
    requests:
      storage: 6Gi

Create a pod to consume the PVC: kubectl create -f pod.yaml

# pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: web-server
  namespace: csi-app
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - mountPath: /var/lib/www/html
          name: mypvc
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: podpvc
        readOnly: false
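
Because the storage class uses volumeBindingMode: WaitForFirstConsumer, the claim stays Pending until the pod is scheduled. A minimal sketch for waiting until the claim reports Bound before taking a backup follows. The helper name wait_for_pvc_bound, the 2-second poll interval, and the default timeout are illustrative assumptions, not part of the original steps.

```shell
# wait_for_pvc_bound NAMESPACE PVC [TIMEOUT_SECONDS]
# Hypothetical helper: polls the claim's status.phase until it is "Bound".
wait_for_pvc_bound() {
  ns=$1
  pvc=$2
  timeout=${3:-120}
  elapsed=0
  phase=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # Read only the phase field; treat a failed lookup as "not bound yet".
    phase=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    if [ "$phase" = "Bound" ]; then
      echo "PVC $ns/$pvc is Bound"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out waiting for PVC $ns/$pvc (last phase: ${phase:-unknown})" >&2
  return 1
}
```

With the names used in this question, the call would be: wait_for_pvc_bound csi-app podpvc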

Once the PVC was bound, I created the Velero backup.

velero backup create test --include-resources=pvc,pv --include-namespaces=csi-app --wait

Output:

Backup request "test" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe test` and `velero backup logs test`.
velero describe backup test
Name:         test
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.5-gke.1302
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21

Phase:  Completed

Errors:    0
Warnings:  1

Namespaces:
  Included:  csi-app
  Excluded:  <none>

Resources:
  Included:        pvc, pv
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2021-12-22 15:40:08 +0300 +03
Completed:  2021-12-22 15:40:10 +0300 +03

Expiration:  2022-01-21 15:40:08 +0300 +03

Total items to be backed up:  2
Items backed up:              2

Velero-Native Snapshots: <none included>
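
The summary reports two items and no Velero-native snapshots, which on its own does not show whether a CSI snapshot was taken. A rough check, assuming the CSI snapshot CRDs are installed in the cluster, is to look for VolumeSnapshot objects and at the per-resource listing of the backup. The helper name check_csi_snapshots is hypothetical.

```shell
# check_csi_snapshots BACKUP NAMESPACE
# Hypothetical helper: lists snapshot objects and the detailed backup contents.
check_csi_snapshots() {
  backup=$1
  ns=$2
  # Namespaced snapshot objects created for the PVCs, if any exist.
  kubectl -n "$ns" get volumesnapshot
  # Cluster-scoped counterparts that reference the underlying disk snapshots.
  kubectl get volumesnapshotcontent
  # --details adds the per-resource list of what the backup contains.
  velero backup describe "$backup" --details
}
```

For this backup the call would be: check_csi_snapshots test csi-app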

After creating the backup, I verified that it exists and is available in my GCS bucket.

Delete all existing resources to test the restore.

kubectl delete -f pod.yaml
kubectl delete -f pvc.yaml
kubectl delete -f sc.yaml
kubectl delete namespace csi-app

Run the restore command:

velero restore create --from-backup test --wait

Output:

Restore request "test-20211222154302" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
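Neither the velero describe nor the velero logs command returns any description or logs.

What did you expect to happen: I expected the PV, PVC, and namespace to be restored.

The following information will help us better understand what's going on:

The velero debug --backup test --restore test-20211222154302 command hung for more than 10 minutes, so I could not generate the support bundle. Output: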

2021/12/22 15:45:16 Collecting velero resources in namespace: velero
2021/12/22 15:45:24 Collecting velero deployment logs in namespace: velero
2021/12/22 15:45:28 Collecting log and information for backup: test
Environment:
Velero version (use velero version):
Client:
Version: v1.7.1
Git commit: -
Server:
Version: v1.7.1
Velero features (use velero client config get features):
features:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
GKE 1.21.5-gke.1302
Cloud provider or hardware configuration:
GCP
OS (e.g. from /etc/os-release):
GCP Container-Optimized OS (COS)

You should be able to check the logs, as the output says:

Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
velero describe or velero logs command doesn't return any description/logs.

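The latter becomes available once the restore operation has finished. Check it for errors; it should show what went wrong.

Also, since you are using CSI for the PV/PVC backup, Velero should be set up to support it: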

https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html
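
The features: line in the environment output is empty, which suggests the EnableCSI feature flag is not set on this installation. A minimal sketch of turning CSI support on for an existing install follows. It assumes the plugin image velero/velero-plugin-for-csi, a deployment named velero in namespace velero whose container already has an args list, and a plugin tag you choose to match your Velero release; the function name and the CSI_PLUGIN_VERSION variable are placeholders, not values from the question.

```shell
# enable_csi_support PLUGIN_VERSION
# Hypothetical helper: adds the CSI plugin and enables the EnableCSI feature flag.
enable_csi_support() {
  version=$1
  # Add the CSI plugin image to the Velero deployment.
  velero plugin add "velero/velero-plugin-for-csi:${version}"
  # Append the feature flag to the server arguments (assumes the
  # deployment "velero" in namespace "velero" already defines args).
  kubectl -n velero patch deployment velero --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--features=EnableCSI"}]'
  # Enable the same feature flag for the velero CLI.
  velero client config set features=EnableCSI
}
```

Usage would look like: enable_csi_support "$CSI_PLUGIN_VERSION". A backup taken before the flag was enabled likely contains no CSI snapshot data, so a new backup would be needed before retrying the restore.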

Depending on the plugin you use, this could also be a bug such as:

https://github.com/vmware-tanzu/velero-plugin-for-csi/pull/122

This should be fixed in the latest release, 0.3.2:

https://github.com/vmware-tanzu/velero-plugin-for-csi/releases/tag/v0.3.2

So start with

velero restore logs test-20211222154302

and go from there. Please update the question with your findings, or with the fix if you resolve the issue.
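
The diagnostic steps can be collected into one small helper. This is a sketch, assuming Velero runs as the deployment velero in namespace velero (the namespace shown in the backup description); the function name inspect_restore is hypothetical.

```shell
# inspect_restore RESTORE_NAME
# Hypothetical helper: gathers the information available for a restore.
inspect_restore() {
  restore=$1
  # Per-resource warnings and errors recorded for the restore.
  velero restore describe "$restore" --details
  # Restore log fetched from the backup storage location.
  velero restore logs "$restore"
  # Server-side log, useful when the two commands above return nothing.
  kubectl -n velero logs deployment/velero
}
```

For the restore in the question: inspect_restore test-20211222154302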
