I am using CDH 5.7.0 with PySpark. When I run an action such as RDD.count(), it fails with the error: Did not find registered driver with class com.mysql.jdbc.Driver
Here are the steps to reproduce:
pyspark --driver-class-path /usr/share/java/mysql-connector-java.jar (the jar exists at /usr/share/java/mysql-connector-java.jar on every node)
>>> url = "jdbc:mysql://host/spark?user=root&password=test"
>>> stock_data = sqlContext.read.format("jdbc").option("url", url).option("dbtable", "StockPrices").load()
>>> stock_data.printSchema()
root
|-- date: string (nullable = true)
|-- open: double (nullable = true)
|-- high: double (nullable = true)
|-- low: double (nullable = true)
|-- close: double (nullable = true)
|-- volume: long (nullable = true)
|-- adjclose: double (nullable = true)
>>> stock_data.count()
......
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
**Caused by: java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver**
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2$$anonfun$3.apply(JdbcUtils.scala:58)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:339)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
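As an aside, Spark's JDBC source also accepts an explicit driver option instead of inferring the class from the URL. A minimal sketch, assuming the same url and dbtable as above (note this alone is not enough: the jar still has to be reachable on the executors' classpath):

>>> # Explicitly name the JDBC driver class so Spark registers it up front
>>> stock_data = sqlContext.read.format("jdbc") \
...     .option("url", url) \
...     .option("driver", "com.mysql.jdbc.Driver") \
...     .option("dbtable", "StockPrices") \
...     .load()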
Finally found it... In Spark's conf directory there is a spark-defaults.conf file; add spark.executor.extraClassPath with the path to the MySQL connector jar there, so that the executors can find the driver. --driver-class-path only puts the jar on the driver's classpath, which is why printSchema() succeeds (the schema is fetched on the driver) while count() fails (each executor opens its own JDBC connection).
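For example, a minimal sketch of the spark-defaults.conf entry, assuming the jar lives at /usr/share/java/mysql-connector-java.jar on every node as in the steps above:

spark.executor.extraClassPath /usr/share/java/mysql-connector-java.jar

The same setting can also be passed at launch with --conf instead of editing the file:

pyspark --driver-class-path /usr/share/java/mysql-connector-java.jar --conf spark.executor.extraClassPath=/usr/share/java/mysql-connector-java.jar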
This failed for me when I used the generic jar name:
--driver-class-path /usr/share/java/mysql-connector-java.jar
but it worked when I used the full, versioned file name, for example:
--driver-class-path /usr/share/java/mysql-connector-java-5.1.28.jar
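On many systems the generic name is just a symlink to the versioned jar, so if it is not being picked up, it is worth resolving it first (illustrative):

readlink -f /usr/share/java/mysql-connector-java.jar

This prints the real versioned path, which can then be passed to --driver-class-path.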