How to parse JSON DataFrame columns into columns

  • Tags: dataframe, python, json, pyspark


I have a DataFrame in which two columns contain JSON data, and I want to parse that JSON into separate columns of the same DataFrame.

+------------+---------+--------------------+--------------------+
|   firstname| lastname|    travellerdetails|            bookjson|
+------------+---------+--------------------+--------------------+
|           K|    Gupta|[{FlierNumber:","...|[{origin:DEL","Et...|
|           K|    Gupta|[{FlierNumber:","...|[{origin:DEL","Et...|
|Jana Ranjani|Raghu Raj|[{BaggageTypeRetu...|[{origin:AMD","De...|
+------------+---------+--------------------+--------------------+

The travellerdetails and bookjson columns hold the JSON data, and these are the columns I want to parse.

The first row of travellerdetails is

""[{""""FlierNumber"""":""""""""","BaggageTypeReturn"""":""""""""","FirstName"""":""""K""""","Title"""":""""1""""","MiddleName"""":""""D""""","LastName"""":""""Gupta""""","MealTypeOnward"""":""""""""","DateOfBirth"""":""""""""","BaggageTypeOnward"""":""""""""","SeatTypeOnward"""":""""""""","MealTypeReturn"""":""""""""","FrequentAirline"""":null","Type"""":""""A""""","SeatTypeReturn"""":""""""""}","{""""FlierNumber"""":""""""""","BaggageTypeReturn"""":""""""""","FirstName"""":""""Sweety""""","Title"""":""""2""""","MiddleName"""":""""""""","LastName"""":""""Gupta""""","MealTypeOnward"""":""""""""","DateOfBirth"""":""""""""","BaggageTypeOnward"""":""""""""","SeatTypeOnward"""":""""""""","MealTypeReturn"""":""""""""","FrequentAirline"""":null","Type"""":""""A""""","SeatTypeReturn"""":""""""""}]""

The first row of bookjson is

""[{""""origin"""":""""DEL""""","EticketFlag"""":""""false""""","flightcode"""":""""251""""","farebasis"""":""""L0IP""""","spicestatus"""":""""Canceled""""","deptime"""":""""07:20""""","codeshare"""":""""""""","ibibopartner"""":""""indigonew""""","productclass"""":""""R""""","duration"""":""""2h 5m""""","ruleno"""":""""4910""""","qtype"""":""""fbs""""","tickettype"""":""""e""""","flightno"""":""""251""""","servicetype"""":""""""""","fareclass"""":""""L""""","faresequence"""":""""1""""","destination"""":""""GAU""""","carrierid"""":""""6E""""","stops"""":""""0""""","state"""":""""New""""","fare"""":{""""adultphf"""":50","adultttf"""":75","adultdf"""":115","totalsurcharge"""":0","indigonewgrossamount"""":10202","adulttotalfare"""":5101","totalcommission"""":0","adultbasefare"""":4150","totalpassengerhandlingfee"""":0","adultudf"""":562","adultpassengerservicefee"""":149","totalpassengerservicefee"""":0","totalothers"""":0","childtotalfare"""":0","totalbasefare"""":8300","totalfare"""":101...

How can I parse these columns?

What you are looking for is F.from_json().

You would use it like this:

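from pyspark.sql import functions as F, types as T

# from_json requires a schema; an array of string-to-string maps is an assumed placeholder, adjust it to the real fields
schema = T.ArrayType(T.MapType(T.StringType(), T.StringType()))
df = df.withColumn("travellerdetails", F.from_json(F.col("travellerdetails"), schema))
df = df.withColumn("bookjson", F.from_json(F.col("bookjson"), schema))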

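Note, however, that the JSON shown in your question is not valid, so parsing it will return null. Also note that the schema is passed to from_json as its second argument (PySpark requires one); it lets you specify the data type you want for each field.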
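As a rough sketch of both points, the snippet below first checks whether a string is well-formed JSON and then parses travellerdetails with an explicit schema, producing one row per traveller. The names `SAMPLE`, `is_valid_json` and `explode_travellers` are hypothetical, the sample string is a cleaned-up illustration rather than your actual data, and the schema covers only three of the traveller fields. Python's json module is not identical to Spark's parser, so treat the check as an approximation.

```python
import json

# Hypothetical, well-formed sample modelled on a few travellerdetails fields.
SAMPLE = '[{"FirstName": "K", "Title": "1", "LastName": "Gupta"}]'


def is_valid_json(text):
    """Rough pre-check: True if text parses as JSON with Python's json module."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):
        return False


def explode_travellers(df):
    """Parse travellerdetails and return one row per traveller (needs pyspark)."""
    # Imported here so the validity check above works without Spark installed.
    from pyspark.sql import functions as F, types as T

    # Assumed schema: an array of structs with a subset of the fields.
    schema = T.ArrayType(
        T.StructType(
            [
                T.StructField("FirstName", T.StringType()),
                T.StructField("Title", T.StringType()),
                T.StructField("LastName", T.StringType()),
            ]
        )
    )
    parsed = df.withColumn(
        "traveller", F.explode(F.from_json(F.col("travellerdetails"), schema))
    )
    # Aliases avoid a clash with the existing firstname/lastname columns,
    # since Spark resolves column names case-insensitively by default.
    return parsed.select(
        "firstname",
        "lastname",
        F.col("traveller.Title").alias("traveller_title"),
        F.col("traveller.FirstName").alias("traveller_first_name"),
        F.col("traveller.LastName").alias("traveller_last_name"),
    )
```

Rows whose JSON fails to parse come back as null from from_json and are dropped by F.explode; F.explode_outer would keep them with null traveller fields instead. The same pattern should apply to bookjson, with a nested struct for the fare object.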
