Using the code below, I am trying to connect to HANA from the spark shell and fetch data from a specific table:
spark-submit --properties-file /users/xxx/spark-defaults.conf
./spark-shell --properties-file /users/xxx/spark-defaults.conf
val sparksqlContext = new org.apache.spark.sql.SQLContext(sc)
val driver = "com.sap.db.jdbc.Driver"
val url = "jdbc:sap://yyyyyy:12345"
val database = "STAGING"
val username = "uuuuu"
val password = "zzzzzz"
val table_view = "STAGING.Tablename"
val jdbcDF = sparksqlContext.read
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("databaseName", database)
  .option("user", username)
  .option("password", password)
  .option("dbtable", table_view)
  .option("partitionColumn", "INSTANCE_ID")
  .option("lowerBound", "7418403")
  .option("upperBound", "987026473")
  .option("numPartitions", "5")
  .load()
jdbcDF.cache
jdbcDF.createOrReplaceTempView("TESTING_hanaCopy")
val results = sparksqlContext.sql("select * from TESTING_hanaCopy")
val resultsCounts = sparksqlContext.sql("select count(*) from TESTING_hanaCopy")
val countsval = results.count()
resultsCounts.show()
The error is as follows:
scala> resultsCounts.show()
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host
Serialization stack:
    - object not serializable (class: com.sap.db.jdbc.topology.Host, value: yyyyy:12345)
    - writeObject data (class: java.util.ArrayList)
    - object (class java.util.ArrayList, [yyyyy:12345])
    - writeObject data (class: java.util.Hashtable)
    - object (class java.util.Properties, {databasename=STAGING, dburl=jdbc:sap://yyyyyy:12345, user=uuuuu, password=zzzzzz, hostlist=[yyyyy:12345]})
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions, name: asConnectionProperties, type: class java.util.Properties)
    - object (class org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions, org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions@7cd755a1)
    - field (class: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1, name: options$1, type: class org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions)
I tried to understand the solutions provided here and here, but could not figure out what to change in the code above.
The "Note" section of this blog post addresses the problem:

Note: I have tested Spark against an SPS12 system using the recent SPS12 version of the HANA JDBC driver (ngdbc.jar), and both seem to work fine. Older versions of the driver produced the following error in Spark: "org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: com.sap.db.jdbc.topology.Host"
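The serialization stack shows why the driver version matters: the old ngdbc.jar puts non-serializable com.sap.db.jdbc.topology.Host objects into the JDBC connection Properties (the hostlist=[yyyy:12345] entry above), and Spark must serialize those properties when it ships the connection factory to the executors, so the job aborts. The fix is therefore to put a newer ngdbc.jar on both the driver and executor classpaths. A minimal sketch of such a launch, assuming the updated jar lives at /path/to/ngdbc.jar (a hypothetical location; adjust to your environment):

# Launch the shell with the newer SPS12 ngdbc.jar visible to the driver
# (--driver-class-path) and shipped to the executors (--jars).
./spark-shell \
  --properties-file /users/xxx/spark-defaults.conf \
  --driver-class-path /path/to/ngdbc.jar \
  --jars /path/to/ngdbc.jar

Equivalently, spark.driver.extraClassPath and spark.executor.extraClassPath can be set in spark-defaults.conf so the jar is picked up without extra command-line flags.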