I am trying to convert a text file that looks like this:
14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
There are many more log lines. I need to convert each one into a JSON object like this:
{"date_time": "2019-10-14 13:00:19", "url": "www.google.com","type":"click", "user":"root", "ip":"0.0.0.0"}
But I can't find an obvious way to do this in Python. Any help is appreciated.
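You can use the datetime and json modules. Open the file and iterate over it line by line; you may need to adjust parts of the code to fit your data. The format codes are described in the "strptime behavior" section of the datetime documentation.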
Working example:
import datetime
import json

in_text = """14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}
14/10/2019 13:05:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}"""

item_list = []
for line in in_text.split("\n"):
    # each line has three "|"-separated fields: timestamp, URL, JSON payload
    date, url, json_part = line.split("|")
    item = {
        "date_time": datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S"),
        "url": url.strip(),
    }
    # merge the keys from the JSON payload into the same dict
    item.update(json.loads(json_part))
    item_list.append(item)

print(item_list)
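Note that the items built above hold datetime objects, which json.dumps cannot serialize on its own. One way around that is a `default` hook, sketched below under the assumption that you want the "2019-10-14 13:00:19" format from the question. The helper name encode_datetime is made up for this illustration.

```python
import datetime
import json

# One item shaped like those in item_list above
item = {
    "date_time": datetime.datetime(2019, 10, 14, 13, 0, 19),
    "url": "www.google.com",
    "type": "click",
}

def encode_datetime(obj):
    """Fallback that json.dumps calls for objects it cannot serialize itself."""
    if isinstance(obj, datetime.datetime):
        return obj.strftime("%Y-%m-%d %H:%M:%S")
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

json_text = json.dumps(item, default=encode_datetime)
print(json_text)
# {"date_time": "2019-10-14 13:00:19", "url": "www.google.com", "type": "click"}
```

Alternatively, call strftime when building each item so the dict only ever contains strings.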
Reading the lines from a file:
with open("your/file/path.txt") as fh:
    for line in fh:
        # Copy the code from the above example.
        ...
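Put together, the file-reading loop might look like the sketch below. The function name convert_log and the file names are placeholders, and it assumes every non-blank line has exactly the three "|"-separated fields shown in the question. The demo writes a throwaway input file so the snippet runs as-is.

```python
import datetime
import json
import os
import tempfile

def convert_log(in_path, out_path):
    """Read 'date | url | json' lines from in_path and write a JSON array to out_path."""
    records = []
    with open(in_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            # maxsplit=2 keeps any "|" inside the JSON payload intact
            date, url, json_part = line.split("|", 2)
            stamp = datetime.datetime.strptime(date.strip(), "%d/%m/%Y %H:%M:%S")
            record = {
                "date_time": stamp.strftime("%Y-%m-%d %H:%M:%S"),
                "url": url.strip(),
            }
            record.update(json.loads(json_part))
            records.append(record)
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(records, out, indent=2)
    return records

# Demo in a temporary directory (paths are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "log.txt")
    dst = os.path.join(tmp, "log.json")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write('14/10/2019 13:00:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}\n\n')
        fh.write('14/10/2019 13:02:19 | www.google.com | {"type":"click", "user":"root", "ip":"0.0.0.0"}\n')
    result = convert_log(src, dst)
    with open(dst, encoding="utf-8") as fh:
        saved = json.load(fh)
```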

import json
from ast import literal_eval

def transform_to_json(row):
    # row is [date_time, url, payload]; the payload is parsed as a Python literal
    d = literal_eval(row[2].strip())
    # strip the whitespace left around the "|" separators
    d["date_time"] = row[0].strip()
    d["url"] = row[1].strip()
    return d

with open('example.txt', 'r') as file:
    json_objs = [transform_to_json(row.split('|')) for row in file]

single_json_result = json.dumps(json_objs)
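One caveat with literal_eval: it only accepts Python literals, so it works for the payloads in the question but would reject JSON-only tokens such as null or true. The small sketch below shows the difference. The payload is invented for the illustration and does not come from the log in the question.

```python
import json
from ast import literal_eval

# Hypothetical payload using JSON-only literals
payload = '{"type": "click", "user": null, "secure": true}'

# json.loads understands JSON literals and maps them to None / True
parsed = json.loads(payload)

# literal_eval only accepts Python literals, so bare null/true are rejected
try:
    literal_eval(payload)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

print(parsed)           # {'type': 'click', 'user': None, 'secure': True}
print(literal_eval_ok)  # False
```

If the third field is always real JSON, json.loads is probably the safer default.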
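Using pandas:
- As noted above, the data is in a .txt file.
- .to_json has various parameters to customize the final look of the JSON file.
- Having the data in a dataframe has the advantage of allowing additional analysis.
- The data has a number of issues, all of which are easy to fix:
  - No column names
  - Improper datetime format
  - Whitespace around the URL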
import json
import pandas as pd

# read data; parse column 2 as JSON (json.loads avoids running eval on file contents)
df = pd.read_csv('test.txt', sep='|', header=None, converters={2: json.loads})
# convert column 0 to a datetime; the source format is day/month/year
df[0] = pd.to_datetime(df[0].str.strip(), format='%d/%m/%Y %H:%M:%S')
# your data has whitespace around the url; remove it
df[1] = df[1].str.strip()
# make column 2 a separate dataframe
df2 = pd.DataFrame.from_dict(df[2].to_list())
# merge the two dataframes on the index
df3 = df.merge(df2, left_index=True, right_index=True, how='outer')
# drop old column 2
df3.drop(columns=[2], inplace=True)
# name column 0 and 1
df3.rename(columns={0: 'date_time', 1: 'url'}, inplace=True)
# dataframe view
          date_time             url   type  user       ip
2019-10-14 13:00:19  www.google.com  click  root  0.0.0.0
2019-10-14 13:02:19  www.google.com  click  root  0.0.0.0
2019-10-14 13:05:19  www.google.com  click  root  0.0.0.0
# save to a JSON file
df3.to_json('test3.json', orient='records', date_format='iso')
JSON file:
[{
    "date_time": "2019-10-14T13:00:19.000Z",
    "url": "www.google.com",
    "type": "click",
    "user": "root",
    "ip": "0.0.0.0"
}, {
    "date_time": "2019-10-14T13:02:19.000Z",
    "url": "www.google.com",
    "type": "click",
    "user": "root",
    "ip": "0.0.0.0"
}, {
    "date_time": "2019-10-14T13:05:19.000Z",
    "url": "www.google.com",
    "type": "click",
    "user": "root",
    "ip": "0.0.0.0"
}
]
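To give a feel for how the .to_json parameters change the output, here is a small sketch on a cut-down dataframe. The frame is built inline for the illustration, with the date column left out to keep it short. When no path is given, to_json returns the JSON as a string.

```python
import json
import pandas as pd

# Small stand-in for df3 (illustrative data only)
df = pd.DataFrame(
    {
        "url": ["www.google.com", "www.google.com"],
        "type": ["click", "click"],
        "user": ["root", "root"],
    }
)

# orient="records": a single JSON array with one object per row
as_array = df.to_json(orient="records")

# orient="records" with lines=True: one JSON object per line (JSON Lines)
as_lines = df.to_json(orient="records", lines=True)

# orient="columns": {column -> {index -> value}}
as_columns = df.to_json(orient="columns")

print(as_array)
print(as_lines)
print(as_columns)
```

The lines=True variant is often the better fit for logs, since each line can be parsed on its own.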