Convert a text file containing partial JSON objects into a JSON file

I'm trying to convert a text file that looks like this:

14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}

There are many more log lines. I need to convert each one into a JSON object like this:

{"date_time": "2019-10-14 13:00:19", "url": "www.google.com","type":"click", "user":"root", "ip":"0.0.0.0"}

But I can't seem to find an obvious way to do this in Python. Any help is appreciated.

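You can use the datetime and json modules. Open the file and iterate over it line by line; you may need to adapt parts of the code to your data.

See the "strptime behavior" section of the datetime documentation for the format codes.

Working example: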

import datetime
import json

in_text = """14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}"""

item_list = []
for line in in_text.split("\n"):
    # maxsplit=2 so that a "|" inside the JSON part is left alone
    date, url, json_part = line.split("|", 2)
    parsed = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {
        # format as a string so the result can be serialized to JSON
        "date_time": parsed.strftime("%Y-%m-%d %H:%M:%S"),
        "url": url.strip(),
    }
    item.update(json.loads(json_part))
    item_list.append(item)
print(json.dumps(item_list))

Reading the lines from a file:

with open("your/file/path.txt") as fh:
    for line in fh:
        # Copy the code from the above example.
        ...
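Putting the two snippets above together, a minimal end-to-end sketch might look like the following. The function names `parse_log_line` and `convert_log_to_json` and both file paths are hypothetical, chosen only for illustration; the sketch assumes every non-empty line has the `date | url | {json}` layout shown in the question.

```python
import json
from datetime import datetime


def parse_log_line(line):
    """Turn one 'date | url | {json}' line into a flat dict (hypothetical helper)."""
    # maxsplit=2 keeps any "|" inside the JSON payload intact.
    date, url, json_part = line.split("|", 2)
    parsed = datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {
        "date_time": parsed.strftime("%Y-%m-%d %H:%M:%S"),
        "url": url.strip(),
    }
    item.update(json.loads(json_part))
    return item


def convert_log_to_json(in_path, out_path):
    """Read the log at in_path and write a JSON array to out_path."""
    with open(in_path, encoding="utf-8") as src:
        # Skip blank lines, which would otherwise fail to unpack.
        items = [parse_log_line(line) for line in src if line.strip()]
    with open(out_path, "w", encoding="utf-8") as dst:
        json.dump(items, dst, indent=2)
    return items
```

Calling `convert_log_to_json("your/file/path.txt", "out.json")` would then write one array holding an object per log line. Malformed lines raise an exception here rather than being skipped, which is a deliberate choice for a sketch; you may prefer to log and continue.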

import json
from ast import literal_eval

def transform_to_json(row):
    # literal_eval works here because this payload is also a valid Python literal
    d = literal_eval(row[2].strip())
    d["date_time"] = row[0].strip()
    d["url"] = row[1].strip()
    return d

with open('example.txt', 'r') as file:
    # maxsplit=2 so that a "|" inside the JSON part is left alone
    json_objs = [transform_to_json(row.split('|', 2)) for row in file]

single_json_result = json.dumps(json_objs)
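One caveat with the `literal_eval` approach: it parses Python literals, not JSON, so it only works while the payload happens to be valid in both. A small sketch of the difference, using a made-up payload that contains the JSON literal `true`:

```python
import json
from ast import literal_eval

# Hypothetical payload; the sample data in the question has no booleans or nulls.
payload = '{"type": "click", "ok": true}'

# json.loads understands JSON's true/false/null.
from_json = json.loads(payload)

# literal_eval sees `true` as an unknown name and refuses to evaluate it.
try:
    literal_eval(payload)
    literal_eval_failed = False
except ValueError:
    literal_eval_failed = True
```

If the log payloads may contain `true`, `false`, or `null`, swapping `literal_eval` for `json.loads` in `transform_to_json` is the safer choice.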

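Using `pandas`:

Given the data in a .txt file, as shown in the question.
.to_json has various parameters to customize the final look of the JSON file.
Having the data in a dataframe has the advantage of allowing further analysis.
The data has a number of issues, all of which are easy to fix:
- No column names
- The datetime format is not the desired one
- Whitespace around the URL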

import json
import pandas as pd

# read data; parse column 2 with json.loads rather than eval, which would execute arbitrary code
df = pd.read_csv('test.txt', sep='|', header=None, converters={2: json.loads})
# convert column 0 to a datetime format (the dates are day-first, so give the format explicitly)
df[0] = pd.to_datetime(df[0].str.strip(), format='%d/%m/%Y %H:%M:%S')
# your data has whitespace around the url; remove it
df[1] = df[1].str.strip()
# make column 2 a separate dataframe
df2 = pd.DataFrame.from_dict(df[2].to_list())
# merge the two dataframes on the index
df3 = df.merge(df2, left_index=True, right_index=True, how='outer')
# drop old column 2
df3.drop(columns=[2], inplace=True)
# name column 0 and 1
df3.rename(columns={0: 'date_time', 1: 'url'}, inplace=True)

# dataframe view:
#           date_time             url   type  user       ip
# 2019-10-14 13:00:19  www.google.com  click  root  0.0.0.0
# 2019-10-14 13:02:19  www.google.com  click  root  0.0.0.0
# 2019-10-14 13:05:19  www.google.com  click  root  0.0.0.0

# save to a JSON file
df3.to_json('test3.json', orient='records', date_format='iso')

Resulting JSON file:

[{
"date_time": "2019-10-14T13:00:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}, {
"date_time": "2019-10-14T13:02:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}, {
"date_time": "2019-10-14T13:05:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}
]
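As an illustration of how the .to_json parameters change the result, here is a small sketch on an in-memory dataframe (the data is a shortened copy of the sample above, not read from a file). It assumes you want the `YYYY-MM-DD HH:MM:SS` layout from the question rather than ISO 8601, so the timestamps are formatted as strings before export.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "date_time": pd.to_datetime(["2019-10-14 13:00:19", "2019-10-14 13:02:19"]),
    "url": ["www.google.com", "www.google.com"],
    "type": ["click", "click"],
})

# Format timestamps as plain strings so to_json emits them unchanged.
df["date_time"] = df["date_time"].dt.strftime("%Y-%m-%d %H:%M:%S")

# orient="records" gives a single JSON array of objects.
as_array = df.to_json(orient="records")

# Adding lines=True gives one JSON object per line (JSON Lines).
as_lines = df.to_json(orient="records", lines=True)
```

The JSON Lines form is often convenient for logs, since each line can be parsed on its own with `json.loads`.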
