I have a DataFrame named `df`, structured as follows:

    name   lv1    lv2
    name1  40.34  21.56
    name2  21.30  67.45
    name3  32.45  45.44

You may be looking for the `array` function:
from pyspark.sql import functions as F
df = spark.createDataFrame(
[('abb', 'name1', 40.34, 21.56),
('bab', 'name2', 21.30, 67.45),
('bba', 'name3', 32.45, 45.44)],
['ID', 'name', 'lv1', 'lv2'])
df = df.withColumn('new_col', F.array('lv1', 'lv2'))
df.show()
# +---+-----+-----+-----+--------------+
# | ID| name| lv1| lv2| new_col|
# +---+-----+-----+-----+--------------+
# |abb|name1|40.34|21.56|[40.34, 21.56]|
# |bab|name2| 21.3|67.45| [21.3, 67.45]|
# |bba|name3|32.45|45.44|[32.45, 45.44]|
# +---+-----+-----+-----+--------------+