PySpark - create new key



I have an RDD of records like:

{
"name": "adam",
"gender": "male",
"new_column": "white,black,yellow"
}

How can I create a new RDD in which the comma-separated value is split into one record per value, like:

{
"name": "adam",
"gender": "male",
"new_column": "white"
}
{
"name": "adam",
"gender": "male",
"new_column": "black"
}
{
"name": "adam",
"gender": "male",
"new_column": "yellow"
}

Can anyone point me in the right direction?

df.printSchema()
root
|-- name: string (nullable = true)
|-- gender: string (nullable = true)
|-- new_column: string (nullable = true)

Since Spark 1.5, you can use the split and explode functions as follows:

from pyspark.sql import functions as F
df.withColumn("new_column", F.explode(F.split("new_column", ",")))
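If you don't have a Spark session handy, the semantics of split + explode can be sketched in plain Python (this is an illustrative stand-in for the DataFrame transformation above, not Spark code; the helper name split_explode is made up for the example):

```python
def split_explode(rows, column, sep=","):
    """For each row, emit one copy of the row per value in the
    separator-delimited string stored in `column` -- the same effect
    as F.explode(F.split(column, sep)) on a DataFrame."""
    for row in rows:
        for value in row[column].split(sep):
            # copy the row, overwriting the target column with one value
            yield {**row, column: value}

rows = [{"name": "adam", "gender": "male", "new_column": "white,black,yellow"}]
result = list(split_explode(rows, "new_column"))
# result holds three rows: one each for "white", "black", and "yellow"
```

Each input row expands into as many output rows as there are values, with all other columns duplicated unchanged, which is exactly what explode does to the array produced by split.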

You can find all of the functions available in PySpark in the pyspark.sql.functions documentation.
