Convert a text file containing partial JSON objects to a JSON file



I'm trying to convert a text file that looks like this:

14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}

There are many more log lines. I need to convert each line into a JSON object like this:

{"date_time": "2019-10-14 13:00:19", "url": "www.google.com","type":"click", "user":"root", "ip":"0.0.0.0"}

But I can't seem to find an obvious way to do this in Python. Any help is appreciated.

You can use the datetime and json modules. Open the file and iterate over it line by line; you may need to adjust some parts of the code.

Reference for the format codes: strftime() and strptime() Behavior (Python datetime documentation)
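The date conversion is the only non-obvious step: the log writes dates day-first (`%d/%m/%Y`), while the desired output is year-first. A minimal sketch of that step on its own, where `to_iso_like` is a hypothetical helper name and the input is assumed to always use the `14/10/2019 13:00:19` layout:

```python
from datetime import datetime


def to_iso_like(raw: str) -> str:
    """Hypothetical helper: '14/10/2019 13:00:19' -> '2019-10-14 13:00:19'."""
    # %d/%m/%Y parses day first, as in the log; strptime raises ValueError on a mismatch
    parsed = datetime.strptime(raw.strip(), "%d/%m/%Y %H:%M:%S")
    # strftime renders the parsed value in the year-first layout the question asks for
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(to_iso_like("14/10/2019 13:00:19"))  # 2019-10-14 13:00:19
```

Converting to a string here also matters for the final `json.dumps`, because a `datetime` object is not JSON serializable.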

Working example:

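import datetime
import json

in_text = """14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}"""

item_list = []
for line in in_text.splitlines():
    # maxsplit=2: only the first two "|" separate fields, so a "|" inside the JSON part is kept
    date, url, json_part = line.split("|", 2)
    parsed = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
    item = {
        # format as "YYYY-MM-DD HH:MM:SS"; a datetime object is not JSON serializable
        "date_time": parsed.strftime("%Y-%m-%d %H:%M:%S"),
        "url": url.strip(),
    }
    # merge the keys of the embedded JSON object into the item
    item.update(json.loads(json_part))
    item_list.append(item)

print(json.dumps(item_list))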

Reading the lines from a file:

with open("your/file/path.txt") as fh:
    for line in fh:
        # Copy the loop body from the above example.
        ...
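A minimal sketch of the complete file-to-file version of the approach above, assuming the log is UTF-8 and every non-blank line has the `date | url | {json}` shape. `convert_log_file` and the file names are hypothetical; a temporary directory stands in for the real log so the example runs on its own.

```python
import json
import os
import tempfile
from datetime import datetime


def convert_log_file(in_path: str, out_path: str) -> list:
    """Hypothetical helper: parse 'date | url | {json}' lines and write a JSON array."""
    records = []
    with open(in_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. an empty last line
            # maxsplit=2 so a "|" inside the JSON payload does not break the split
            date, url, payload = line.split("|", 2)
            ts = datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
            record = {
                "date_time": ts.strftime("%Y-%m-%d %H:%M:%S"),
                "url": url.strip(),
            }
            record.update(json.loads(payload))
            records.append(record)
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(records, out, indent=2)
    return records


# Demo: a temporary file stands in for the real log
sample = (
    '14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}\n'
    '14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}\n'
)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "log.txt")
    dst = os.path.join(tmp, "log.json")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write(sample)
    result = convert_log_file(src, dst)
    with open(dst, encoding="utf-8") as fh:
        written = json.load(fh)  # read back what was written to disk

print(written)
```

Writing one JSON array keeps the output loadable with a single `json.load`; if you would rather have one object per line, you could instead write `json.dumps(record)` followed by `"\n"` for each record.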
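Alternatively, with ast.literal_eval:

import json
from ast import literal_eval

def transform_to_json(row):
    # row[2] is the dict-like part; literal_eval parses it as a Python literal.
    # That works for this data, but it rejects JSON's true/false/null, so json.loads is safer in general.
    d = literal_eval(row[2].strip())
    # strip the spaces around the "|" separators
    d["date_time"] = row[0].strip()
    d["url"] = row[1].strip()
    return d

with open('example.txt', 'r') as file:
    # maxsplit=2 keeps any "|" inside the JSON part intact; blank lines are skipped
    json_objs = [transform_to_json(row.split('|', 2)) for row in file if row.strip()]
    single_json_result = json.dumps(json_objs)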
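To see why `json.loads` is usually the safer parser for the third field, here is a small comparison. The payload with `false` and `null` is a made-up example, not taken from the question's log: for plain strings both parsers agree, but `literal_eval` rejects JSON's lowercase literals.

```python
import json
from ast import literal_eval

text = '{"type":"click", "user":"root"}'
as_python = literal_eval(text)
as_json = json.loads(text)
print(as_python == as_json)  # True

# Hypothetical payload: JSON's false/null are not valid Python literals
payload = '{"type":"click", "admin": false, "referrer": null}'
print(json.loads(payload))  # {'type': 'click', 'admin': False, 'referrer': None}
try:
    literal_eval(payload)
    literal_eval_ok = True
except ValueError as exc:
    # literal_eval sees `false` as a bare name and raises "malformed node or string"
    literal_eval_ok = False
    print("literal_eval failed:", exc)
```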

Using pandas

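  • Given the data in a .txt file, as shown in the question
  • .to_json has various parameters for customizing how the final JSON file looks
  • Having the data in a dataframe has the advantage of allowing additional analysis
  • The data has a number of issues, all of which are easy to fix:
    • No column names
    • The date-time is not in the desired format
    • Whitespace around the URL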
import json
import pandas as pd
# read data; parse column 2 (the JSON part) with json.loads rather than eval, which would execute arbitrary code
df = pd.read_csv('test.txt', sep='|', header=None, converters={2: json.loads})
# convert column 0 to a datetime format (the log is day/month/year)
df[0] = pd.to_datetime(df[0].str.strip(), format='%d/%m/%Y %H:%M:%S')
# your data has whitespace around the url; remove it
df[1] = df[1].str.strip()
# make column 2 (a column of dicts) a separate dataframe
df2 = pd.DataFrame(df[2].to_list())
# merge the two dataframes on the index
df3 = df.merge(df2, left_index=True, right_index=True, how='outer')
# drop old column 2
df3.drop(columns=[2], inplace=True)
# name columns 0 and 1
df3.rename(columns={0: 'date_time', 1: 'url'}, inplace=True)
# dataframe view
print(df3)
#           date_time             url   type  user       ip
# 2019-10-14 13:00:19  www.google.com  click  root  0.0.0.0
# 2019-10-14 13:02:19  www.google.com  click  root  0.0.0.0
# 2019-10-14 13:05:19  www.google.com  click  root  0.0.0.0
# save to a JSON file
df3.to_json('test3.json', orient='records', date_format='iso')

JSON file:

[{
"date_time": "2019-10-14T13:00:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}, {
"date_time": "2019-10-14T13:02:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}, {
"date_time": "2019-10-14T13:05:19.000Z",
"url": "www.google.com",
"type": "click",
"user": "root",
"ip": "0.0.0.0"
}
]
