I am trying to create a UDF in PySpark that converts UTM coordinates to latitude and longitude.
Error:
Caused by: net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
I have tried different return data types, but without success.
PySpark code:
import pyspark.sql.functions as F
from pyspark.sql.types import *
import utm
df2 = spark.createDataFrame([(531086, 6224626), (531086, 6224626)], ["C1", "C2"])
df2.printSchema()
utm_udf_x = F.udf(lambda x,y: utm.to_latlon(x,y, 32, 'U')[0], ArrayType(FloatType()))
utm_udf_y = F.udf(lambda x,y: utm.to_latlon(x,y, 32, 'U')[1], ArrayType(FloatType()))
df2 = df2.withColumn('lat',utm_udf_x(F.col('C1'), F.col('C2')))
df2 = df2.withColumn('lon',utm_udf_y(F.col('C1'), F.col('C2')))
display(df2)
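Thanks.
The main problem was that utm.to_latlon returns NumPy dtype values, which have to be converted to plain Python floats before Spark can handle them. Each UDF also returns a single value, so the return type has to be FloatType() rather than ArrayType(FloatType()).
This works: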
import pyspark.sql.functions as F
from pyspark.sql.types import *
import utm
df2 = spark.createDataFrame([(340000.0, 5710000.0), (573014.00000135, 6221529.99974406)], ["C1", "C2"])
df2.printSchema()
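# float(...) converts the NumPy value to a built-in Python float; each UDF returns one scalar, hence FloatType()
utm_udf_x = F.udf(lambda x,y: float(utm.to_latlon(x,y, 32, 'U')[0]), FloatType())
utm_udf_y = F.udf(lambda x,y: float(utm.to_latlon(x,y, 32, 'U')[1]), FloatType())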
df2 = df2.withColumn('lat',utm_udf_x(F.col('C1'), F.col('C2')))
df2 = df2.withColumn('lon',utm_udf_y(F.col('C1'), F.col('C2')))
display(df2)
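The two UDFs above each call utm.to_latlon once per row. As a possible variation, here is a minimal sketch that does the conversion once and returns both values as a struct. The names as_python_floats and make_latlon_udf are hypothetical helpers made up for this example, not part of pyspark or utm. Using DoubleType instead of FloatType is also my assumption, chosen because FloatType is only 32-bit and may drop decimal places from the coordinates.

```python
import functools


def as_python_floats(func):
    """Wrap func so each element of its returned sequence is a built-in float.

    Hypothetical helper: the conversion has to happen inside the UDF,
    before the value is handed back to Spark.
    """
    @functools.wraps(func)
    def wrapper(*args):
        return tuple(float(value) for value in func(*args))
    return wrapper


def make_latlon_udf(zone_number=32, zone_letter="U"):
    """Build a single UDF that returns a struct with lat and lon fields.

    Hypothetical helper. pyspark and utm are imported lazily here, so
    as_python_floats can be used and tested without them installed.
    """
    import pyspark.sql.functions as F
    from pyspark.sql.types import DoubleType, StructField, StructType
    import utm

    # Assumption: DoubleType (64-bit) rather than FloatType (32-bit).
    schema = StructType([
        StructField("lat", DoubleType()),
        StructField("lon", DoubleType()),
    ])

    @as_python_floats
    def to_latlon(easting, northing):
        # utm.to_latlon returns (latitude, longitude)
        return utm.to_latlon(easting, northing, zone_number, zone_letter)

    return F.udf(to_latlon, schema)
```

With the df2 from above, this could be used as latlon_udf = make_latlon_udf() followed by df2.withColumn('latlon', latlon_udf(F.col('C1'), F.col('C2'))).select('C1', 'C2', 'latlon.lat', 'latlon.lon'). I have not benchmarked it against the two separate UDFs.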