GeoSpark: using Maven-installed UDFs on Databricks running on Azure



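I'm trying Databricks on Azure with a Spark cluster: Spark 3.0.0, Scala 2.12. On the cluster I installed (!) geospark:1.3.1 and geospark-sql_2.3:1.3.1, inspired by https://databricks.com/notebooks/geospark-notebook.html. I like SQL and would like to run GeoSpark queries.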

I run this (from a notebook):

%scala
import com.vividsolutions.jts.geom.{Coordinate, Geometry, GeometryFactory}
import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader
import org.datasyslab.geospark.spatialRDD.SpatialRDD
import org.datasyslab.geosparksql.utils.{Adapter, GeoSparkSQLRegistrator}
GeoSparkSQLRegistrator.registerAll(sqlContext)

When I run this check:

%scala 
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark SQL UDF scalar example")
.getOrCreate()

spark.catalog.listFunctions().filter("name like 'ST%P%' ").show(false)

/** spark.catalog.listTables().show() 
spark.sql("SELECT ST_Point(0,0) FROM ( VALUES (42) ) AS t(a); ").show() */

the output is:

|name                       |database|description|className                                                                |isTemporary|
|ST_NPoints                 |null    |null       |org.apache.spark.sql.geosparksql.expressions.ST_NPoints$                 |true       |
|ST_Point                   |null    |null       |org.apache.spark.sql.geosparksql.expressions.ST_Point$                   |true       |
...

But this

%sql
SELECT t.a, ST_Point(0,0) as p
FROM (VALUES (42)) AS t(a);

fails with:

Error in SQL statement: NoClassDefFoundError: org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback$class

What am I doing wrong?

P.S. I have also tried:

CREATE FUNCTION ST_Point AS 'org.apache.spark.sql.geosparksql.expressions.ST_Point$';

with and without the trailing dollar sign. The CREATE FUNCTION statement returns OK; however, running a SELECT that includes ST_Point then returns:

Error in SQL statement: AnalysisException: No handler for UDF/UDAF/UDTF 'org.apache.spark.sql.geosparksql.expressions.ST_Point$'; line 1 pos 12

geospark 1.3.1 appears to be built for Spark 2.x, see [1]. If you need to use Spark 3.x, try upgrading to geospark 1.3.2; otherwise, try downgrading to Spark 2.x.

[1] http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/
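
One quick way to check for this kind of mismatch from the notebook is to print the runtime's Scala version and probe for the class named in the error. Scala 2.11 compiled trait method bodies into helper classes named `<Trait>$class`, and Scala 2.12 no longer generates them, so a missing `CodegenFallback$class` usually points to a library built for a Scala 2.11 / Spark 2.x runtime. The following is only a minimal sketch to run in a `%scala` cell; `classPresent` is a hypothetical helper introduced here, not part of GeoSpark or Spark.

```scala
import scala.util.Try

// Hypothetical helper: true if a class with this fully qualified name can be
// found on the current classpath (the class is located but not initialized).
def classPresent(name: String): Boolean =
  Try(Class.forName(name, false, Thread.currentThread().getContextClassLoader)).isSuccess

// Scala version of the running JVM; the cluster in the question reports 2.12.x.
println(s"Scala: ${scala.util.Properties.versionNumberString}")

// In a Databricks notebook the predefined `spark` session also gives the Spark version:
// println(s"Spark: ${spark.version}")

// Trait helper class that only exists in Spark builds compiled with Scala 2.11.
val helper = "org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback$class"
println(s"$helper present: ${classPresent(helper)}")
```

If the helper class is reported as absent while the installed library still references it, the library and the cluster runtime were built for different Scala/Spark versions, which matches the NoClassDefFoundError in the question.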
