我可以毫无问题地从 IntelliJ 运行我的 Flink 作业,但是当我尝试在 flink 独立上运行它时......
wget ... flink-1.4.0-bin-hadoop27-scala_2.11.tgz
tar xf flink-1.4.0-bin-hadoop27-scala_2.11.tgz
./flink-1.4.0/bin/start-local.sh
./flink-1.4.0/bin/flink run ../mypath/target/messagehub-to-s3-1.0-SNAPSHOT.jar ...
我收到错误消息:
Caused by: java.lang.ClassCastException: org.apache.hadoop.fs.s3a.S3AFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:112)
我正在使用 maven shade 插件(基于 dataArtisans 示例)来创建我的 jar 文件:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<!-- This list contains all dependencies of flink-dist
Everything else will be packaged into the fat-jar
-->
<exclude>org.apache.flink:flink-annotations</exclude>
<exclude>org.apache.flink:flink-shaded-curator-recipes</exclude>
<exclude>org.apache.flink:flink-core</exclude>
<exclude>org.apache.flink:flink-java</exclude>
<exclude>org.apache.flink:flink-scala_2.11</exclude>
<exclude>org.apache.flink:flink-runtime_2.11</exclude>
<exclude>org.apache.flink:flink-optimizer_2.11</exclude>
<exclude>org.apache.flink:flink-clients_2.11</exclude>
<exclude>org.apache.flink:flink-avro_2.11</exclude>
<exclude>org.apache.flink:flink-examples-batch_2.11</exclude>
<exclude>org.apache.flink:flink-examples-streaming_2.11</exclude>
<exclude>org.apache.flink:flink-streaming-java_2.11</exclude>
<exclude>org.apache.flink:flink-streaming-scala_2.11</exclude>
<exclude>org.apache.flink:flink-scala-shell_2.11</exclude>
<exclude>org.apache.flink:flink-python</exclude>
<exclude>org.apache.flink:flink-metrics-core</exclude>
<exclude>org.apache.flink:flink-metrics-jmx</exclude>
<exclude>org.apache.flink:flink-statebackend-rocksdb_2.11</exclude>
<!-- Also exclude very big transitive dependencies of Flink
WARNING: You have to remove these excludes if your code relies on other
versions of these dependencies.
-->
<exclude>log4j:log4j</exclude>
<exclude>org.scala-lang:scala-library</exclude>
<exclude>org.scala-lang:scala-compiler</exclude>
<exclude>org.scala-lang:scala-reflect</exclude>
<exclude>com.data-artisans:flakka-actor_*</exclude>
<exclude>com.data-artisans:flakka-remote_*</exclude>
<exclude>com.data-artisans:flakka-slf4j_*</exclude>
<exclude>io.netty:netty-all</exclude>
<exclude>io.netty:netty</exclude>
<exclude>commons-fileupload:commons-fileupload</exclude>
<exclude>org.apache.avro:avro</exclude>
<exclude>commons-collections:commons-collections</exclude>
<exclude>org.codehaus.jackson:jackson-core-asl</exclude>
<exclude>org.codehaus.jackson:jackson-mapper-asl</exclude>
<exclude>com.thoughtworks.paranamer:paranamer</exclude>
<exclude>org.xerial.snappy:snappy-java</exclude>
<exclude>org.apache.commons:commons-compress</exclude>
<exclude>org.tukaani:xz</exclude>
<exclude>com.esotericsoftware.kryo:kryo</exclude>
<exclude>com.esotericsoftware.minlog:minlog</exclude>
<exclude>org.objenesis:objenesis</exclude>
<exclude>com.twitter:chill_*</exclude>
<exclude>com.twitter:chill-java</exclude>
<exclude>commons-lang:commons-lang</exclude>
<exclude>junit:junit</exclude>
<exclude>org.apache.commons:commons-lang3</exclude>
<exclude>org.slf4j:slf4j-api</exclude>
<exclude>org.slf4j:slf4j-log4j12</exclude>
<exclude>log4j:log4j</exclude>
<exclude>org.apache.commons:commons-math</exclude>
<exclude>org.apache.sling:org.apache.sling.commons.json</exclude>
<exclude>commons-logging:commons-logging</exclude>
<exclude>commons-codec:commons-codec</exclude>
<exclude>com.fasterxml.jackson.core:jackson-core</exclude>
<exclude>com.fasterxml.jackson.core:jackson-databind</exclude>
<exclude>com.fasterxml.jackson.core:jackson-annotations</exclude>
<exclude>stax:stax-api</exclude>
<exclude>com.typesafe:config</exclude>
<exclude>org.uncommons.maths:uncommons-maths</exclude>
<exclude>com.github.scopt:scopt_*</exclude>
<exclude>commons-io:commons-io</exclude>
<exclude>commons-cli:commons-cli</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<artifact>org.apache.flink:*</artifact>
<excludes>
<!-- exclude shaded google but include shaded curator -->
<exclude>org/apache/flink/shaded/com/**</exclude>
<exclude>web-docs/**</exclude>
</excludes>
</filter>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.ibm.cloud.flink.StreamingJob</mainClass>
</transformer>
</transformers>
<createDependencyReducedPom>false</createDependencyReducedPom>
</configuration>
</execution>
</executions>
</plugin>
我正在使用 S3 hadoop Flink fs 库:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-s3-fs-hadoop</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.2.5</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.2.5</version>
</dependency>
我的项目在github上:https://github.com/ibm-cloud-streaming-retail-demo/flink-on-iae-messagehub-to-s3
我认为另一组罐子正在独立运行。 以前有人见过这个问题吗?
首先,您可以使用以下命令生成 Flink 快速入门项目:
mvn archetype:generate
-DarchetypeGroupId=org.apache.flink
-DarchetypeArtifactId=flink-quickstart-java
-DarchetypeVersion=1.4.0
这比从现有项目中复制pom.xml
文件更干净、更安全。
据我在Github上看到的,你的工作使用BucketingSink。截至Flink时 1.4.0,这个接收器直接依赖于Hadoop的文件系统类[1]。 但是,您将flink-s3-fs-hadoop
作为依赖项包含在内;这是一个 Flink 具有着色的 Hadoop 依赖项的文件系统,不适用于当前的 BucketingSink 实现。
您看到的异常提示了类加载问题 [2]。我怀疑 你的用户jar中有Hadoop类,与那些冲突 来自弗林克。您可以尝试的是删除所有Hadoop依赖项 pom.xml,或者在flink-conf.yaml
中您可以尝试设置classloader.resolve-order: parent-first
[3]。
[1] https://github.com/apache/flink/blob/release-1.4/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L1118
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#x-cannot-be-cast-to-x-exceptions
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#configuring-classloader-resolution-order