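I have a jsonstring column in my PySpark DataFrame and am trying to ingest it into Cosmos DB. Because the column is a string type, the jsonstring is stored in Cosmos DB wrapped in "" (as a quoted string). What is the best way to convert this column from a string to a JSON object?
The jsonstring used: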
[{"Date":"11/13/2020 3:23:21 PM","Ids":"[]","col2":"abc","col3":"","value3":"[]","currency":"","status":"Active","tag":"[]","Info":"[]"}]
After migrating to Cosmos DB, the value becomes:
[{"Date":"11/13/2020 3:23:21 PM","Ids":"[]","col2":"abc","col3":"","value3":"[]","currency":"","status":"Active","tag":"[]","Info":"[]"}]
Thanks for your help!
If you want to create a JSON object, use the collect_list + create_map + to_json functions.
Here is an example for reference:
Create JSON object:
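from pyspark.sql.functions import collect_list, create_map, lit
# Build one {"product", "cost"} map per row, collect them into an array, then serialize to JSON
(df.agg(collect_list(create_map(lit("product"), "product", lit("cost"), "cost")).alias("stru"))
   .selectExpr("to_json(stru) as json")
   .show(10, False))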
#+-------------------------------------------------------------------------------------------------------------------------------+
#|json |
#+-------------------------------------------------------------------------------------------------------------------------------+
#|[{"product":"pen","cost":"10"},{"product":"book","cost":"40"},{"product":"bottle","cost":"80"},{"product":"glass","cost":"55"}]|
#+-------------------------------------------------------------------------------------------------------------------------------+
# To write to HDFS, use .saveAsTextFile
df.agg(collect_list(create_map(lit("product"),"product",lit("cost"),"cost")).alias("stru")).selectExpr("to_json(stru) as json").rdd.map(lambda x:x['json']).saveAsTextFile("<path>")
#cat part-00000
#[{"product":"pen","cost":"10"},{"product":"book","cost":"40"},{"product":"bottle","cost":"80"},{"product":"glass","cost":"55"}]
This is based on a related SO thread, for reference.
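To illustrate why the value from the question ends up quoted, here is a minimal sketch in plain Python (no Spark), assuming the document is serialized to JSON before it is stored. The field name "payload" and the "id" value are hypothetical, and the sample string is a shortened version of the one in the question. A field that still holds a string is serialized as one quoted, escaped string, while a field that was parsed first is serialized as a nested array of objects.

```python
import json

# Shortened sample from the question, held as a plain string
# (the same situation as a StringType column).
json_string = '[{"Date":"11/13/2020 3:23:21 PM","Ids":"[]","col2":"abc","status":"Active"}]'

# Hypothetical document shape; "id" and "payload" are illustrative names only.
# Case 1: the field is still a string, so the serializer quotes and escapes it.
as_string = json.dumps({"id": "1", "payload": json_string})

# Case 2: the string is parsed first, so the field is a real list of dicts.
as_object = json.dumps({"id": "1", "payload": json.loads(json_string)})

print(as_string)  # payload is a single quoted string with escaped inner quotes
print(as_object)  # payload is a nested JSON array
```

In PySpark, the analogous step would be to parse the column with from_json and an explicit schema before writing, so that the column is an array/struct type rather than a string. Whether that fits depends on the connector and on the schema of your data.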