I have a list of RDD records like this:
{
"name": "adam",
"gender": "male",
"new_column": "white,black,yellow"
}
How can I create a new RDD in which each value of new_column becomes its own record:
{
"name": "adam",
"gender": "male",
"new_column": "white"
}
{
"name": "adam",
"gender": "male",
"new_column": "black"
}
{
"name": "adam",
"gender": "male",
"new_column": "yellow"
}
Can anyone point me in the right direction?
df.printSchema()
root
|-- name: string (nullable = true)
|-- gender: string (nullable = true)
|-- new_column: string (nullable = true)
Since Spark 1.5, you can use the split and explode functions as follows:
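from pyspark.sql import functions as F
# split turns the comma-separated string into an array; explode emits one row per element.
# withColumn returns a new DataFrame, so assign the result.
df = df.withColumn("new_column", F.explode(F.split("new_column", ",")))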
You can find all the functions available in PySpark in the pyspark functions documentation.
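
For intuition, the row-level effect of split followed by explode can be sketched in plain Python, with no Spark required. The helper name explode_record and its column and sep parameters are hypothetical, introduced only for illustration. If the data really is an RDD of dicts rather than a DataFrame, a function like this could probably be passed to rdd.flatMap to get the same result.

```python
def explode_record(record, column="new_column", sep=","):
    """Yield one copy of `record` per separator-delimited value in `column`.

    Hypothetical helper: it mirrors what split + explode do to a single row.
    """
    for value in record[column].split(sep):
        # Build a fresh dict so each output row is independent of the others.
        yield {**record, column: value}


record = {"name": "adam", "gender": "male", "new_column": "white,black,yellow"}
rows = list(explode_record(record))

for row in rows:
    print(row)
```

Each output row keeps name and gender unchanged and carries exactly one of the original comma-separated values in new_column.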