PySpark UDF for converting UTM: error "expected zero arguments for construction of ClassDict (for numpy.dtype)"



I'm trying to create a UDF in PySpark that converts UTM coordinates to latitude and longitude.

Error:

Caused by: net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)

I've tried different data types, without success.

PySpark code:

import pyspark.sql.functions as F
from pyspark.sql.types import *
import utm
df2 = spark.createDataFrame([(531086, 6224626), (531086, 6224626)], ["C1", "C2"])
df2.printSchema()
utm_udf_x = F.udf(lambda x,y: utm.to_latlon(x,y, 32, 'U')[0], ArrayType(FloatType()))
utm_udf_y = F.udf(lambda x,y: utm.to_latlon(x,y, 32, 'U')[1], ArrayType(FloatType()))
df2 = df2.withColumn('lat',utm_udf_x(F.col('C1'), F.col('C2')))
df2 = df2.withColumn('lon',utm_udf_y(F.col('C1'), F.col('C2')))
display(df2)

Thanks

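The main problem was that utm.to_latlon returns NumPy scalars (e.g. numpy.float64) rather than plain Python floats. Spark's JVM-side unpickler (net.razorvine.pickle) cannot reconstruct NumPy objects, which is what the ClassDict error is reporting. Wrapping each result in float() fixes it. The declared return type also has to match what the UDF returns: each UDF returns a single number, so it should be FloatType(), not ArrayType(FloatType()).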
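If you'd rather not deal with NumPy values at all, another option is to do the inverse transverse Mercator math with the standard library only, so the results are ordinary Python floats from the start. The sketch below follows the inverse series in Snyder's Map Projections: A Working Manual and assumes the WGS84 ellipsoid, zone letters N to X meaning the northern hemisphere, and no polar (UPS) zones. The function name utm_to_latlon is hypothetical and is not part of the utm package; treat it as an illustration, not a validated replacement.

```python
import math

# Assumed WGS84 ellipsoid and UTM constants.
A = 6378137.0             # semi-major axis (metres)
E2 = 0.00669438           # first eccentricity squared (rounded WGS84 value)
EP2 = E2 / (1.0 - E2)     # second eccentricity squared
K0 = 0.9996               # UTM scale factor on the central meridian

# Coefficients of the footpoint-latitude series.
_SQRT = math.sqrt(1.0 - E2)
E1 = (1.0 - _SQRT) / (1.0 + _SQRT)
M1 = 1.0 - E2 / 4.0 - 3.0 * E2**2 / 64.0 - 5.0 * E2**3 / 256.0
P2 = 3.0 * E1 / 2.0 - 27.0 * E1**3 / 32.0
P4 = 21.0 * E1**2 / 16.0 - 55.0 * E1**4 / 32.0
P6 = 151.0 * E1**3 / 96.0
P8 = 1097.0 * E1**4 / 512.0


def utm_to_latlon(easting, northing, zone_number, zone_letter):
    """Hypothetical helper: UTM -> (latitude, longitude) in degrees as Python floats."""
    northern = zone_letter.upper() >= "N"  # bands N..X lie north of the equator
    x = easting - 500000.0                 # remove the false easting
    y = northing if northern else northing - 10000000.0  # remove southern false northing

    # Footpoint latitude phi1 from the meridian arc length y / K0.
    mu = (y / K0) / (A * M1)
    phi1 = (mu + P2 * math.sin(2 * mu) + P4 * math.sin(4 * mu)
            + P6 * math.sin(6 * mu) + P8 * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    t1 = tan1 ** 2
    c1 = EP2 * cos1 ** 2
    w = 1.0 - E2 * sin1 ** 2
    n1 = A / math.sqrt(w)               # prime-vertical radius of curvature
    r1 = A * (1.0 - E2) / w ** 1.5      # meridional radius of curvature
    d = x / (n1 * K0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2) * d**6 / 720
    )
    lon = (
        d
        - (1 + 2 * t1 + c1) * d**3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2) * d**5 / 120
    ) / cos1

    central_meridian = zone_number * 6 - 183  # e.g. zone 32 -> 9 degrees east
    return math.degrees(lat), central_meridian + math.degrees(lon)
```

Because it only uses the math module, the returned values are already plain floats, so the function could be passed to F.udf (for example inside a lambda picking index 0 or 1) without the float() wrapping.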

This version works:

import pyspark.sql.functions as F
from pyspark.sql.types import *
import utm
df2 = spark.createDataFrame([(340000.0, 5710000.0), (573014.00000135, 6221529.99974406)], ["C1", "C2"])
df2.printSchema()
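# float() turns the NumPy scalar returned by utm.to_latlon into a plain Python float,
# and the return type is a scalar FloatType() because each UDF returns a single number
utm_udf_x = F.udf(lambda x, y: float(utm.to_latlon(x, y, 32, 'U')[0]), FloatType())
utm_udf_y = F.udf(lambda x, y: float(utm.to_latlon(x, y, 32, 'U')[1]), FloatType())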
df2 = df2.withColumn('lat',utm_udf_x(F.col('C1'), F.col('C2')))
df2 = df2.withColumn('lon',utm_udf_y(F.col('C1'), F.col('C2')))
display(df2)
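The two UDFs above call utm.to_latlon twice per row, and Spark passes None for null cells, which utm.to_latlon would not accept. A minimal sketch of a single UDF body that returns both values and skips nulls is below. make_latlon_converter is a hypothetical helper name; the converter (utm.to_latlon in practice) is passed in as a parameter so the logic can be exercised without Spark, and the zone defaults 32/'U' are taken from the question.

```python
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]


def make_latlon_converter(
    to_latlon: Callable, zone_number: int = 32, zone_letter: str = "U"
) -> Callable[[Optional[float], Optional[float]], Optional[LatLon]]:
    """Wrap a UTM converter so its output is safe to return from a Spark UDF."""

    def convert(easting, northing):
        if easting is None or northing is None:
            return None  # Spark stores this as a null struct
        lat, lon = to_latlon(easting, northing, zone_number, zone_letter)
        # Cast to built-in floats; NumPy scalars cannot be unpickled on the JVM side.
        return (float(lat), float(lon))

    return convert
```

In Spark this could be registered once with a struct schema, e.g. F.udf(make_latlon_converter(utm.to_latlon), StructType([StructField('lat', DoubleType()), StructField('lon', DoubleType())])), applied with withColumn('latlon', ...), and the fields read back with F.col('latlon.lat') and F.col('latlon.lon'). DoubleType keeps full double precision, whereas FloatType stores single-precision values.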
