I'm new to PySpark. Here is what I want to do. The table is below; the column types are ArrayType(DoubleType) and ArrayType(DecimalType).
a          | b
[1, 2]     | [2, 4]
[1, 2, 4]  | [1, 3, 3]
You can use a pandas_udf:
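from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ([1, 2], [2, 4]),
    ([1, 2, 4], [1, 3, 3]),
], 'a array<int>, b array<int>')
df.show(truncate=False)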
+---------+---------+
|a |b |
+---------+---------+
|[1, 2] |[2, 4] |
|[1, 2, 4]|[1, 3, 3]|
+---------+---------+
Create the pandas_udf:
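@F.pandas_udf("array<int>")
def func(a, b):
    # a and b are pandas Series whose cells are NumPy arrays;
    # `*` multiplies the two arrays in each row element-wise
    return a * b

df.withColumn('c', func('a', 'b')).show()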
+---------+---------+----------+
| a| b| c|
+---------+---------+----------+
| [1, 2]| [2, 4]| [2, 8]|
|[1, 2, 4]|[1, 3, 3]|[1, 6, 12]|
+---------+---------+----------+
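To see what the UDF body actually receives, here is a minimal pandas-only sketch that needs no Spark session. It assumes, as with Arrow-backed scalar pandas UDFs, that each argument arrives as an object-dtype pandas Series whose cells are NumPy arrays; `multiply_arrays` is a hypothetical stand-in for `func`. The second half targets the types from the question: assuming DecimalType elements reach pandas as `decimal.Decimal` objects, multiplying them by floats is not defined in Python, so the sketch converts them to float first.

```python
from decimal import Decimal

import numpy as np
import pandas as pd


def multiply_arrays(a: pd.Series, b: pd.Series) -> pd.Series:
    # Hypothetical stand-in for `func`: same body, pairwise product per row
    return a * b


# One simulated batch for columns a and b: object-dtype Series whose cells are NumPy arrays
a = pd.Series([np.array([1, 2]), np.array([1, 2, 4])], dtype=object)
b = pd.Series([np.array([2, 4]), np.array([1, 3, 3])], dtype=object)
c = multiply_arrays(a, b)
print([row.tolist() for row in c])  # [[2, 8], [1, 6, 12]]

# Mixed types as in the question: a double array and a Decimal array.
# float * Decimal is not defined in Python, so convert the Decimal cells to float first.
x = pd.Series([np.array([1.5, 2.0])], dtype=object)
y = pd.Series([np.array([Decimal("2"), Decimal("4")], dtype=object)], dtype=object)
z = multiply_arrays(x, y.map(lambda arr: arr.astype(float)))
print([row.tolist() for row in z])  # [[3.0, 8.0]]
```

Inside Spark, a comparable (untested here) option is to cast before calling the UDF, e.g. `func('a', F.col('b').cast('array<double>'))`, with the UDF's return type changed to "array<double>". Either way, `a * b` assumes the two arrays in each row have the same length; NumPy raises a ValueError for most mismatched lengths.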