
How to convert a table with a column holding two values (separated by a comma) into two separate columns in PySpark?

Tags: apache-spark, pyspark, apache-spark-sql
Updated: 2023-09-10

df.printSchema()

root
 |-- range: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: long (nullable = true)
 |    |    |-- b: string (nullable = true)

df.show()

+------------+
|    range   |
+------------+
|[[3, Hello]]|
+------------+

My desired output:

+---+-----+
|  a|    b|
+---+-----+
|  3|Hello|
+---+-----+

How can I convert this table so that the column holding two values is split into two separate columns, one per value?

Here is a PySpark version of user deo's Scala answer:

import pyspark.sql.functions as F
j = '{"range":[{"a":3,"b":"Hello"}]}'
df = spark.read.json(sc.parallelize([j]))
# explode() turns the array column into a struct column, one row per array element
df = df.withColumn('exploded', F.explode('range'))
# the fields of a struct column can then easily be selected with *
df.select('exploded.*').show()

You can also do this with a one-liner:

df.select(F.explode('range')).select('col.*').show()
