我正在尝试为 GCS 中的 flink 作业配置检查点。 如果我在本地运行测试作业(没有 docker 和任何集群设置(,一切正常,但如果我使用 docker-compose 或集群设置运行它并在 flink 仪表板中使用作业部署胖罐,它会失败并显示错误。
有什么想法吗? 谢谢!
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:441)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createCheckpointStorage(RocksDBStateBackend.java:379)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:247)
... 33 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:64)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
环境配置是这样的:
StreamExecutionEnvironment env = applicationContext.getBean(StreamExecutionEnvironment.class);
CheckpointConfig checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setFailOnCheckpointingErrors(false);
checkpointConfig.setCheckpointInterval(10000);
checkpointConfig.setMinPauseBetweenCheckpoints(5000);
checkpointConfig.setMaxConcurrentCheckpoints(1);
checkpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
RocksDBStateBackend rocksDBStateBackend = new RocksDBStateBackend(
String.format("gs://checkpoints/%s", jobClass.getSimpleName()), true);
env.setStateBackend((StateBackend) rocksDBStateBackend);
这是我core-site.xml
文件:
<configuration>
<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
</property>
<property>
<name>google.cloud.auth.service.account.json.keyfile</name>
<value>${user.dir}/key.json</value>
</property>
<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
<description>The FileSystem for gs: (GCS) uris.</description>
</property>
<property>
<name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
<description>The AbstractFileSystem for gs: (GCS) uris.</description>
</property>
<property>
<name>fs.gs.application.name.suffix</name>
<value>-kube-flink</value>
<description>
Appended to the user-agent header for API requests to GCS to help identify
the traffic as coming from Dataproc.
</description>
</property>
对 gcs 连接器的依赖关系:
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>1.9.4-hadoop2</version>
</dependency>
更新:
经过对依赖项的一些操作,我已经能够编写检查点。我目前的设置是:
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>hadoop2-1.9.5</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-statebackend-rocksdb_${scala.version}</artifactId>
<version>1.5.1</version>
</dependency>
我也将 flink 图像切换到版本flink:1.5.2-hadoop28
不幸的是,我仍然无法读取检查点数据,因为我的工作总是无法恢复状态并显示错误:
java.lang.NoClassDefFoundError: com/google/cloud/hadoop/gcsio/GoogleCloudStorageImpl$6
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.open(GoogleCloudStorageImpl.java:666)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.open(GoogleCloudStorageFileSystem.java:323)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFSInputStream.<init>(GoogleHadoopFSInputStream.java:136)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.open(GoogleHadoopFileSystemBase.java:1102)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:119)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:36)
at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:80)
at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.copyStateDataHandleData(RocksDBKeyedStateBackend.java:1005)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllDataFromStateHandles(RocksDBKeyedStateBackend.java:988)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllStateDataToDirectory(RocksDBKeyedStateBackend.java:974)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:758)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:732)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:443)
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:149)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:276)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:132)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:227)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:730)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:295)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:748)
我相信这将是最后一个错误...
最后我在这里找到了解决方案
您必须创建自己的映像并将 gcs-connector 放入 lib 目录中。否则,您将始终遇到类加载问题(用户代码和系统类加载器(。
为了创建自定义 Docker 镜像,我们创建以下 Dockerfile:
FROM registry.platform.data-artisans.net/trial/v1.0/flink:1.4.2-dap1-scala_2.11 RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar && wget http://ftp.fau.de/apache/flink/flink-1.4.2/flink-1.4.2-bin-hadoop28-scala_2.11.tgz && tar xf flink-1.4.2-bin-hadoop28-scala_2.11.tgz && mv flink-1.4.2/lib/flink-shaded-hadoop2* lib/ && rm -r flink-1.4.2* RUN mkdir etc-hadoop COPY <name of key file>.json etc-hadoop/ COPY core-site.xml etc-hadoop/ ENTRYPOINT ["/docker-entrypoint.sh"] EXPOSE 6123 8081 CMD ["jobmanager"]
Docker 镜像将基于我们提供的 Flink 镜像 dA 平台试用的一部分。我们正在添加谷歌云存储 连接器,Flink的Hadoop软件包和带有配置的密钥 文件。
若要生成自定义映像,以下文件应位于 当前目录:core-site.xml、Dockerfile 和 key-file (.json(。
为了最终触发自定义映像的构建,我们运行以下命令 命令:
$ docker build -t flink-1.4.2-gs .
构建图像后,我们会将图像上传到谷歌的 容器注册表。 要将 Docker 配置为正确访问 注册表中,运行以下命令一次:
$ gcloud auth configure-docker
接下来,我们将标记并上传容器:
$ docker tag flink-1.4.2-gs:latest eu.gcr.io/<your project id>/flink-1.4.2-gs $ docker push eu.gcr.io/<your project id>/flink-1.4.2-gs
上传完成后,我们需要为 应用程序管理器部署。 已发送以下补丁请求:
PATCH /api/v1/deployments/<your AppMgr deployment id> spec: template: spec: flinkConfiguration: fs.hdfs.hadoopconf: /opt/flink/etc-hadoop/ artifact: flinkImageRegistry: eu.gcr.io flinkImageRepository: <your project id>/flink-1.4.2-gs flinkImageTag: latest
或者,使用以下 curl 命令:
$ curl -X PATCH --header 'Content-Type: application/yaml' --header 'Accept: application/yaml' -d ' spec: template: spec: flinkConfiguration: fs.hdfs.hadoopconf: /opt/flink/etc-hadoop/ artifact: flinkImageRegistry: eu.gcr.io flinkImageRepository: <your project id>/flink-1.4.2-gs flinkImageTag: latest' 'http://localhost:8080/api/v1/deployments/<your AppMgr deployment id>‘
实施此更改后,您将能够检查点到 Google 的 云存储。指定目录时使用以下模式 gs:///checkpoints。对于保存点,设置 state.savepoints.dir Flink 配置选项。
问题是方案 gs://的实现。这是连接到GCS的协议。如果添加以下依赖项,Java 程序应该能够运行:
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
<version>1.35.0</version>
</dependency>
在此链接中,您将了解如何为任何其他编程语言添加此依赖项。
适用于这个 docker 文件,重点是采用正确的依赖项版本。
我基于flink k8s运算符的这个解决方案
- 码头工人文件
ARG FLINK_VERSION=1.13.1
ARG SCALA_VERSION=2.12
FROM flink:${FLINK_VERSION}-scala_${SCALA_VERSION}-java8
ARG FLINK_HADOOP_VERSION=2.8.3-10.0
ARG GCS_CONNECTOR_VERSION=latest-hadoop2
ARG GCS_CONNECTOR_NAME=gcs-connector-${GCS_CONNECTOR_VERSION}.jar
ARG GCS_CONNECTOR_URI=https://storage.googleapis.com/hadoop-lib/gcs/${GCS_CONNECTOR_NAME}
ARG FLINK_HADOOP_JAR_NAME=flink-shaded-hadoop-2-uber-${FLINK_HADOOP_VERSION}.jar
ARG FLINK_HADOOP_JAR_URI=https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/${FLINK_HADOOP_VERSION}/${FLINK_HADOOP_JAR_NAME}
#COPY target/lib /opt/flink/lib
RUN echo "Downloading ${GCS_CONNECTOR_URI}" &&
wget -q -O /opt/flink/lib/${GCS_CONNECTOR_NAME} ${GCS_CONNECTOR_URI}
RUN echo "Downloading ${FLINK_HADOOP_JAR_URI}" &&
wget -q -O /opt/flink/lib/${FLINK_HADOOP_JAR_NAME} ${FLINK_HADOOP_JAR_URI}
COPY target/play-flink-1.0-SNAPSHOT.jar /opt/flink/usrlib/play-flink-1.0-SNAPSHOT.jar
- 创建核心.xml文件:
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
<description>The AbstractFileSystem for gs: uris.</description>
</property>
<property>
<name>fs.gs.project.id</name>
<value>projectName</value>
<description>
Optional. Google Cloud Project ID with access to GCS buckets.
Required only for list buckets and create bucket operations.
</description>
</property>
<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
<description>
Whether to use a service account for GCS authorization.
</description>
</property>
</configuration>
create core.xml as configmap
kubectl create configmap hadoop-configmap --from-file core-site.xml
为服务帐户创建密钥
kubectl create secret generic gcp-secret --from-file=key.json=${SERVICE_ACCOUNT_FILE}
在 job.yaml 中加载配置映射和服务帐户机密
volumeMounts:
- name: hadoop-configmap-volume
mountPath: /etc/hadoop/conf
- name: google-cloud-key
mountPath: /etc/gcp/keys
....
volumes:
- name: google-cloud-key
secret:
secretName: gcp-secret
- name: flink-config-volume
configMap:
name: flink-config
items:
- key: flink-conf.yaml
path: flink-conf.yaml
- key: log4j-console.properties
path: log4j-console.properties
- name: hadoop-configmap-volume
configMap:
name: hadoop-configmap
items:
- key: core-site.xml
path: core-site.xml
- 在 flink 中配置卷挂载路径
fs.hdfs.hadoopconf: /etc/hadoop/conf