Regex to get URLs from a log file and store them in a Python dictionary

import re
filename = "access.log"
path = ""
regex = r'(?:(GET|POST) )(\S+)'           # Captures the HTTP method and the URL that follows it
with open(path + filename, "r") as logfile:
    count = 0
    for line in logfile:                  # Loops through the log file
        url = re.findall(regex, line)     # List of (method, url) tuples found on this line
        if url:                           # Skips lines that contain no GET/POST request
            print(url[0][1])              # Prints the URL of the first match

Here is a sample from the log file access.log:

209.160.24.63 - - [01/Feb/2021:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 2550 "http://www.google.com/productid=12wdef" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 422

I get the URL in bold, but now I want to split it and store it in a Python dictionary.

Since you already have the bold string, you can split it at the first space that occurs in the string:

s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
s.split(" ", 1)

This should return:

['GET', '/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953']

You can then transform the data as needed.
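One way that transformation could look is sketched below. It is only an illustration, not part of the original answer: it assumes the standard library's urllib.parse is acceptable for splitting the query string, and request_to_dict is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qsl

def request_to_dict(request):
    """Turn 'METHOD /path?query' into a nested dictionary (hypothetical helper)."""
    method, url = request.split(" ", 1)    # split at the first space only
    parts = urlsplit(url)                  # separates the path from the query string
    return {
        "method": method,
        "resource": parts.path,
        "parameters": dict(parse_qsl(parts.query)),  # key=value pairs as a dict
    }

s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
result = request_to_dict(s)
print(result)
```

Note that dict() keeps only the last value if a parameter name is repeated; parse_qs could be used instead if repeated names matter.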

import re
filename = "access.log"
dictionary = {}
list_resources = []
count = 0
regex = r'(?:(GET|POST) )(\S+)'                   # Captures the HTTP method and the URL
with open(filename, "r") as logfile:
    for line in logfile:                          # Loops through the log file
        matches = re.findall(regex, line)         # List of (method, url) tuples on this line
        if not matches:                           # Skips lines that contain no GET/POST request
            continue
        url = matches[0][1]                       # URL of the first match
        list_resources.append(url)

        # Splits the URL into the resource and the query string at the first "?"
        resource, _, parameters = url.partition("?")
        param_dict = {}
        for pair in parameters.split("&"):        # Each pair looks like key=value
            if pair:
                key, _, value = pair.partition("=")
                param_dict[key] = value
        dictionary[count] = {'resource': resource, 'parameters': param_dict}
        count += 1
# print(list_resources)
print(dictionary)

I figured out what I wanted to do: split the URL and store the resource and parameters in a dictionary.
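The same idea can be wrapped in a function that takes any iterable of lines, which makes it easy to try without a log file. This is a minimal sketch under stated assumptions: parse_requests is a hypothetical helper name, and the sample line is the one from the question with its spacing assumed to follow the usual access-log layout.

```python
import re

# Named groups make the two captured parts easier to read than index lookups.
REQUEST_RE = re.compile(r'(?P<method>GET|POST) (?P<url>\S+)')

def parse_requests(lines):
    """Build {index: {'resource': ..., 'parameters': {...}}} (hypothetical helper)."""
    result = {}
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:                  # line has no GET/POST request
            continue
        resource, _, query = match.group("url").partition("?")
        params = {}
        for pair in query.split("&"):
            if pair:                       # ignore empty pieces such as a trailing "&"
                key, _, value = pair.partition("=")
                params[key] = value
        result[len(result)] = {"resource": resource, "parameters": params}
    return result

# Sample input (assumed spacing); the second line shows that non-matching lines are skipped.
sample = [
    '209.160.24.63 - - [01/Feb/2021:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 2550',
    'a line without any request',
]
parsed = parse_requests(sample)
print(parsed)
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.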
