Convert text to JSON using a pandas DataFrame with a custom separator



I have a text file named sample.txt that contains data like this:

0: 480x640 2 persons, 1 tv, 1: 480x640 5 persons,  2 tvs, 1 oven, Done. (0.759s) Mon, 04 April 11:39:48 status : Low
0: 480x640 2 persons, 1 tv, 1: 480x640 4 persons, 3 chairs,  1 oven, Done. (0.763s) Mon, 04 April 11:39:50 status : High

Every line in the sample file follows this pattern.

I tried this code to convert the text file to JSON:

import pandas as pd

cam_details = pd.read_csv('sample.txt', sep=r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)',
header=None, engine='python', names=(None, 'a', 'b', 'date', 'status')).iloc[:, 1:]

cam_details.to_json('output.json', orient = "records", date_format = "epoch", double_precision = 10, 
force_ascii = True, date_unit = "ms", default_handler = None)

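I tried this code, but the JSON does not come out in the right format. How can I use a pandas DataFrame separator to convert the text into the JSON format shown below?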

I get output like this:

{
"a": " 2 persons, 1 tv, 1 laptop, 1 clock",
"b": " 4 persons, 1 car, 1 bottle, 3 chairs, 2 tvs, 1 oven",
"date": "Mon, 04 April 11:39:51 status : Low"
}

What I want is a JSON file like this:

[
{
"a": " 2 persons, 1 tv, 1 laptop, 1 clock",
"b": " 5 persons, 1 bottle, 3 chairs, 2 tvs, 1 cell phone, 1 oven",
"date": "Mon, 04 April 11:39:48" , 
"status": "Low"
},
{
"a": " 2 persons, 1 tv, 1 laptop, 2 clocks",
"b": " 4 persons, 1 car, 3 chairs, 2 tvs, 1 laptop, 1 oven",
"date": "Mon, 04 April 11:39:50",
"status": "Low"
} ]

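The problem seems to be that the status part is not split off by the separator, so it ends up in the date column. You can fix this with a little extra processing in pandas before writing the JSON: split the date column on the status keyword and strip the colon.
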
# Split the date column on the 'status' keyword (the status text is being dragged into the date column)
cam_details[['date', 'status']] = cam_details['date'].map(lambda x: x.split('status')).tolist()
# Strip the trailing whitespace that the split leaves on the date
cam_details['date'] = cam_details['date'].str.strip()
# Clean up the status column, which still has the colon and extra whitespace
cam_details['status'] = cam_details['status'].map(lambda x: x.replace(':', '').strip())
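
To check what the separator and the status split produce on each line, here is a minimal sketch that uses only the standard `re` and `json` modules instead of pandas. It assumes the separator regex from the question with its backslash escapes written out, and the helper name `parse_line` is made up for illustration. The two input lines are the ones from the question.

```python
import json
import re

# Separator from the question, with the regex escapes written out (assumption).
SEP = re.compile(r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)')


def parse_line(line):
    """Split one log line into the a / b / date / status fields."""
    # The first piece is the empty string before the leading "0: 480x640".
    _, a, b, tail = SEP.split(line.strip())
    # The tail still looks like "Mon, 04 April 11:39:48 status : Low".
    date, status = tail.split('status')
    return {
        'a': a,
        'b': b,
        'date': date.strip(),
        'status': status.replace(':', '').strip(),
    }


lines = [
    '0: 480x640 2 persons, 1 tv, 1: 480x640 5 persons,  2 tvs, 1 oven, '
    'Done. (0.759s) Mon, 04 April 11:39:48 status : Low',
    '0: 480x640 2 persons, 1 tv, 1: 480x640 4 persons, 3 chairs,  1 oven, '
    'Done. (0.763s) Mon, 04 April 11:39:50 status : High',
]

records = [parse_line(line) for line in lines]
print(json.dumps(records, indent=2))
```

Each record then has separate `date` and `status` keys, which is the same shape the pandas fix above should give once `to_json` is called with `orient="records"`.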
