Flink 写入 Hdfs 加密区域



我在当前环境中有一个加密区域hdfs。 我正在使用 Flink 1.2.0 集群使用 Apache Parquet 编写器 1.9.0 版本将 parquet 文件写入 hdfs 加密区域。 我写入非加密区域没有问题,但是当我写入加密区域的那一刻,我遇到了以下错误:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException): No crypto protocol versions provided by the client are supported. Client provided: [] NameNode supports: [CryptoProtocolVersion{description='Unknown', version=1, unknownValue=null}, CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2538)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2682)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2599)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:595)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:112)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:395)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:245)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy12.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1425)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:239)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:273)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:222)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:188)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:158)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:124)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:97)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:71)
at org.apache.parquet.avro.AvroParquetWriter.<init>(AvroParquetWriter.java:54)
at com.dataplatform.integration.flink.sink.BucketParquetWriter.write(BucketParquetWriter.java:160)
at com.dataplatform.integration.flink.sink.BucketParquetSink.invoke(BucketParquetSink.java:347)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:38)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:261)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:656)
at java.lang.Thread.run(Thread.java:745)

我已经阅读了Hadoop版本,并且知道Hadoop TDE(透明数据加密(版本在2.6.0更高版本中发布。 但是 Flink 使用 Hadoop 版本 2.3.0 进行编译。 就 TDE 而言,这会导致任何不兼容问题吗?

你可以下载预构建不同Hadoop版本的Flink,例如2.6.0,或者你可以通过传递-Dhadoop.version=2.6.0来自己构建Flink,选择Hadoop版本给Maven。

最新更新