import re

filename = "access.log"
path = ""
with open(path + filename, "r") as logfile:
    count = 0
    for line in logfile:  # Loop through the log file
        regex = r'(?:(GET|POST) )(\S+)'  # Method, then the URL up to the next whitespace
        url = re.findall(regex, line)  # List of (method, url) tuples found in the line
        if url:  # Skip lines that contain no GET/POST request
            print(url[0][1])  # Print the URL of the first match
Here is a sample line from the log file access.log:
209.160.24.63 - - [01/Feb/2021:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 2550 "http://www.google.com/productid=12wdef" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 422
I can extract the URL (the part right after GET in the request), but now I want to split it up and store it in a Python dictionary.
Since you have already extracted that string, you can split it at the first space it contains:
s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"
s.split(" ", 1)
This should return:
['GET', '/product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953']
You can then transform the data as needed.
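As a rough sketch of that follow-up transformation, the second element can be split again at the "?" and the query string broken into key/value pairs. The variable names (method, resource, parameters, record) are only illustrative, and the code assumes every parameter has the form key=value.

```python
s = "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953"

method, url = s.split(" ", 1)            # split at the first space
resource, _, query = url.partition("?")  # path before "?", query string after it

# Turn "k1=v1&k2=v2" into {"k1": "v1", "k2": "v2"}.
# Assumption: every pair contains "="; a bare key would raise ValueError here.
parameters = dict(pair.split("=", 1) for pair in query.split("&"))

record = {"method": method, "resource": resource, "parameters": parameters}
print(record)
```

str.partition is used for the "?" split because it always returns three parts, so a URL without a query string does not raise an IndexError at that step.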
import re

filename = "access.log"
dictionary = {}
list_resources = []
count = 0
regex = r'(?:(GET|POST) )(\S+)'  # Method, then the URL up to the next whitespace
with open(filename, "r") as logfile:
    for line in logfile:  # Loop through the log file
        matches = re.findall(regex, line)  # List of (method, url) tuples
        if not matches:  # Skip lines that contain no GET/POST request
            continue
        url = matches[0][1]
        list_resources.append(url)
        # "?" is a regex metacharacter, so re.split("?", url) raises re.error;
        # a plain string split avoids that.
        pieces = url.split("?", 1)
        resource = pieces[0]
        parameters = pieces[1] if len(pieces) > 1 else ""
        param_dict = {}
        for i in parameters.split("&"):
            if "=" not in i:  # Skip empty or malformed pairs
                continue
            key, value = i.split("=", 1)
            param_dict[key] = value
        dictionary[count] = {'resource': resource, 'parameters': param_dict}
        count += 1
# print(list_resources)
print(dictionary)
I figured out how to do what I wanted: split the URL and store the resource and its parameters in a dictionary.
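For comparison, here is a sketch of the same idea that leaves the URL splitting to the standard library's urllib.parse instead of splitting by hand. The function name parse_requests and the sample lines are hypothetical. The first sample line is a shortened copy of the log line above, with the spacing around the request assumed, and the other two are made up to exercise the edge cases.

```python
import re
from urllib.parse import urlsplit, parse_qsl

REQUEST_RE = re.compile(r'(GET|POST) (\S+)')  # method, then URL up to whitespace

def parse_requests(lines):
    """Map a running index to {'resource': path, 'parameters': {key: value}}."""
    parsed = {}
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:  # line contains no GET/POST request
            continue
        parts = urlsplit(match.group(2))
        parsed[len(parsed)] = {
            "resource": parts.path,
            # parse_qsl percent-decodes values and drops blank ones by default
            "parameters": dict(parse_qsl(parts.query)),
        }
    return parsed

# Hypothetical input: first line shortened from the sample, the rest invented.
sample_lines = [
    '209.160.24.63 - - [01/Feb/2021:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSIONID=SD0SL6FF7ADFF4953 HTTP 1.1" 200 2550',
    '209.160.24.63 - - [01/Feb/2021:18:22:18] "GET /index.html HTTP 1.1" 200 100',
    'a line without any request',
]
result = parse_requests(sample_lines)
print(result)
```

Because open() returns an iterable of lines, the same function can be called as parse_requests(logfile) inside a with open("access.log") block.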