SparkSQL MissingRequirementError when registering a table



I am new to Scala and Apache Spark and I am trying to use Spark SQL. After cloning the repository I started the Spark shell by typing bin/spark-shell and ran the following commands:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
val pathUsers = "users.txt"
case class User(uid: String, name: String, surname: String)
val users = sc.textFile(pathUsers).map(_.split(" ")).map(u => User(u(0), u(1), u(2)))
users.registerTempTable("users")
val res = sqlContext.sql("SELECT * FROM users")
res.collect().foreach(println)

Everything works as expected. The users.txt file looks like this:

uid-1 name1 surname1
uid-2 name2 surname2
...

After that I tried to create a standalone project and built it with sbt. The dependencies listed in build.sbt are the following:

"org.apache.spark" % "spark-streaming_2.10" % "1.2.0",
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.2.0",
"org.apache.spark" % "spark-sql_2.10" % "1.2.0",
"org.apache.spark" % "spark-catalyst_2.10" % "1.2.0"

If I run the same instructions, it crashes at this line:

users.registerTempTable("users")

with this error:

scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with java.net.URLClassLoader@56352b57 of type class java.net.URLClassLoader with classpath [file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/jansi.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/jline.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/scala-compiler.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/scala-library.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/scala-reflect.jar] and parent being xsbt.boot.BootFilteredLoader@599e80b1 of type class xsbt.boot.BootFilteredLoader with classpath [<unknown>] and parent being sun.misc.Launcher$AppClassLoader@76d4d81 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/usr/local/Cellar/sbt/0.13.5/libexec/sbt-launch.jar] and parent being sun.misc.Launcher$ExtClassLoader@18fb53f6 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/dnsns.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/localedata.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/sunec.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/zipfs.jar,file:/System/Library/Java/Extensions/MRJToolkit.jar] and parent being primordial classloader with boot classpath [/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/JObjC.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/classes] not found.
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
at scala.reflect.api.Universe.typeOf(Universe.scala:59)
at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
at org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94)
at org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33)
at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111)
at .<init>(<console>:20)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:904)
at xsbt.ConsoleInterface.run(ConsoleInterface.scala:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:77)
at sbt.Console.sbt$Console$$console0$1(Console.scala:23)
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:24)
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:24)
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:24)
at sbt.Logger$$anon$4.apply(Logger.scala:90)
at sbt.TrapExit$App.run(TrapExit.scala:244)
at java.lang.Thread.run(Thread.java:744)

Where is the problem?

UPDATE:

Ok, I don't think the problem is in Spark SQL but in Spark itself, since I can't even execute users.collect(). When the same line is run in the spark-shell, instead, the result is:

res5: Array[User] = Array(User(uid-1,name1,surname1), User(uid-2,name2,surname2))

as expected. The error is the following:

15/01/08 09:47:02 INFO FileInputFormat: Total input paths to process : 1
15/01/08 09:47:02 INFO SparkContext: Starting job: collect at <console>:19
15/01/08 09:47:02 INFO DAGScheduler: Got job 0 (collect at <console>:19) with 2 output partitions (allowLocal=false)
15/01/08 09:47:02 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:19)
15/01/08 09:47:02 INFO DAGScheduler: Parents of final stage: List()
15/01/08 09:47:02 INFO DAGScheduler: Missing parents: List()
15/01/08 09:47:02 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at <console>:17), which has no missing parents
15/01/08 09:47:02 INFO MemoryStore: ensureFreeSpace(2840) called with curMem=157187, maxMem=556038881
15/01/08 09:47:02 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 530.1 MB)
15/01/08 09:47:02 INFO MemoryStore: ensureFreeSpace(2002) called with curMem=160027, maxMem=556038881
15/01/08 09:47:02 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2002.0 B, free 530.1 MB)
15/01/08 09:47:02 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.195:63917 (size: 2002.0 B, free: 530.3 MB)
15/01/08 09:47:02 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/08 09:47:02 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/08 09:47:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[3] at map at <console>:17)
15/01/08 09:47:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/08 09:47:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 09:47:02 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 09:47:02 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 192.168.100.195): java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

I also ran into this java.io.EOFException on a Spark EC2 cluster when submitting jobs programmatically, but I don't know which version of hadoop-client might be required.

This can be solved by adding fork := true to the sbt project settings (see the sketch below).

See: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-1-2-0-MissingRequirementError-td10123.html
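
For illustration, a minimal sketch of applying that fix in build.sbt; the javaOptions line is an optional extra with example values, not part of the fix itself:

// Run the application (and the sbt console) in a forked JVM so that classes
// are loaded from a flat classpath instead of sbt's layered class loaders,
// which is what the JavaMirror in the stack trace above fails on.
fork := true

// Optional: pass JVM options to the forked process (example values).
javaOptions ++= Seq("-Xmx2G", "-XX:MaxPermSize=256m")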

Other useful settings can be found in the project file referenced there:

https://github.com/deanwampler/spark-workshop/blob/master/project/Build.scala

Try adding "org.apache.spark" % "spark-catalyst_2.10" % "1.2.0" (although I would have thought it gets pulled in as a transitive dependency).
