I am trying to create a simple Spark Structured Streaming application that reads a stream from Kafka. However, when I run the following code:
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "mytopic") \
    .load()
I get the following error:
AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".
So, following the Structured Streaming + Kafka Integration Guide, I ran this command:
./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ...
which gives me the following error that I don't understand:
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/home/myname/spark-3.1.2-bin-hadoop3.2/... does not exist' Please specify one with --class.
Note: I am in my spark-3.1.2-bin-hadoop3.2 folder when I execute this command.
"Following the Structured Streaming + Kafka Integration Guide, I ran this command:
./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ..."

The ... in that command is not literal. You need to supply the rest of the command yourself: the application to run, which for a Scala/Java JAR includes a --class argument (and for a PySpark application is the path to your .py script). See:
https://spark.apache.org/docs/latest/submitting-applications.html
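As a sketch, assuming the PySpark code from the question is saved in a file (the name myapp.py here is hypothetical, not from the question), the complete command would look something like:

```shell
# Hypothetical example: "myapp.py" stands in for wherever you saved
# the readStream code from the question. --packages pulls in the
# Kafka connector, and the final argument is the application itself,
# which is the part "..." was eliding.
./bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  myapp.py
```

For a Python application no --class is needed; the script path itself tells spark-submit what to run. Make sure the Scala version in the package coordinate (2.12 here) matches your Spark build.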