How to convert a jsonstring column in a PySpark DataFrame to a JSON object?



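I have a jsonstring column in my PySpark DataFrame and am trying to ingest it into Cosmos DB. Because the column is of string type, the jsonstring is ingested into Cosmos DB with "". What is the best way to convert this column's type from string to a JSON object?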

The jsonstring used -

[{"Date":"11/13/2020 3:23:21 PM","Ids":"[]","col2":"abc","col3":"","value3":"[]","currency":"","status":"Active","tag":"[]","Info":"[]"}]

After migrating to Cosmos DB, the value becomes

[{"Date":"11/13/2020 3:23:21 PM","Ids":"[]","col2":"abc","col3":"","value3":"[]","currency":"","status":"Active","tag":"[]","Info":"[]"}]

Thanks for your help!
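
A plain-Python sketch of what may be going on, assuming the symptom is double encoding: a value that is already a JSON string gets serialized a second time when the document is written, so its quotes end up escaped. The field name `payload` and the shortened sample value are hypothetical, and the standard `json` module stands in for whatever serializer the Cosmos DB writer actually uses.

```python
import json

# Shortened, hypothetical sample of the JSON string held in the column.
json_string = '[{"col2":"abc","status":"Active"}]'

# Stored as a string: the serializer escapes the inner quotes.
as_string = json.dumps({"payload": json_string})

# Parsed first: the value is nested as a real JSON array of objects.
as_object = json.dumps({"payload": json.loads(json_string)})

print(as_string)
print(as_object)
```

If this is the cause, the fix is to hand the writer a parsed structure rather than a string.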

If you want to create a JSON object, use the collect_list + create_map + to_json functions.

Here is an example for reference:

Create a JSON object:

from pyspark.sql.functions import collect_list, create_map, lit

# df is assumed to have the string columns "product" and "cost"
(df.agg(collect_list(create_map(lit("product"), "product", lit("cost"), "cost")).alias("stru"))
   .selectExpr("to_json(stru) as json")
   .show(10, False))
#+-------------------------------------------------------------------------------------------------------------------------------+
#|json |
#+-------------------------------------------------------------------------------------------------------------------------------+
#|[{"product":"pen","cost":"10"},{"product":"book","cost":"40"},{"product":"bottle","cost":"80"},{"product":"glass","cost":"55"}]|
#+-------------------------------------------------------------------------------------------------------------------------------+

# To write to HDFS, convert to an RDD and use .saveAsTextFile
(df.agg(collect_list(create_map(lit("product"), "product", lit("cost"), "cost")).alias("stru"))
   .selectExpr("to_json(stru) as json")
   .rdd.map(lambda x: x["json"])
   .saveAsTextFile("<path>"))
# cat part-00000
#[{"product":"pen","cost":"10"},{"product":"book","cost":"40"},{"product":"bottle","cost":"80"},{"product":"glass","cost":"55"}]

See this SO thread for reference.
