How to make a regex-built dict sum the values instead of overwriting them



I'm new to Python. I have a log file with the following content:

[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700
[16:00:48]BMW X7/BH1155HH/service:2-phase complex+plastic /employee:Sasha/price:1400
[16:02:38]Jeep Renegade/BE6782IK/service:wash/employee:Serega/price:300
[16:03:19]MB C300/BT4500BT/service:complex/employee:Sasha/price:550
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000

I need to build a dict with each employee as a key and the sum of his prices as the value, like {"Oleg":1250}

I made a list of the employees with this code:


import re

with open("17082022.log", "r") as file:
    text = file.read()
    emp_list = set(re.findall(r'employee:(.*)/', text))

And this makes the list of prices:


output_pluses = re.findall(r"(?<=price:)[+-]?\d+", text)
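To see why a plain dict(zip(...)) overwrites the values and how to accumulate them instead, here is a minimal sketch that keeps the two-list approach from the question. It assumes both findall calls return their matches in the same line order, which holds only if every line has exactly one employee: field and one price: field. The names are kept as a list rather than a set, because a set loses the pairing with the prices. The two sample lines stand in for the real file.

```python
import re

# Two sample lines from the log above; the real code would read the file.
text = """[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700"""

# Keep the names as a list (not a set) so they stay aligned with the prices.
names = re.findall(r"employee:([^/]+)/", text)
prices = [int(p) for p in re.findall(r"(?<=price:)[+-]?\d+", text)]

# dict(zip(...)) keeps only the last price seen for each name.
overwritten = dict(zip(names, prices))

# Adding to the running total instead sums the prices per name.
totals = {}
for name, price in zip(names, prices):
    totals[name] = totals.get(name, 0) + price

print(overwritten)  # {'Oleg': 700}
print(totals)       # {'Oleg': 1250}
```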

You can use re.findall with capture groups to get the employee name and the price in one step. Then build the dictionary:

import re
log = """
[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700
[16:00:48]BMW X7/BH1155HH/service:2-phase complex+plastic /employee:Sasha/price:1400
[16:02:38]Jeep Renegade/BE6782IK/service:wash/employee:Serega/price:300
[16:03:19]MB C300/BT4500BT/service:complex/employee:Sasha/price:550
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000"""
out = {}
for employee, price in re.findall(r"employee:([^/]+)/price:(\d+)", log):
    out[employee] = out.get(employee, 0) + int(price)
print(out)

Prints:

{'Oleg': 1250, 'Serega': 750, 'Sasha': 2950}
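A common variation (not part of the answer above) replaces out.get(employee, 0) with collections.defaultdict(int), which creates any missing key with the value 0. The sketch also prints what findall returns when the pattern has two capture groups: a list of (employee, price) string tuples. Only the first three log lines are used as sample data.

```python
import re
from collections import defaultdict

log = """[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700"""

# With two capture groups, findall returns a list of (employee, price) tuples.
pairs = re.findall(r"employee:([^/]+)/price:(\d+)", log)

# defaultdict(int) inserts a missing key with value 0 before "+=" runs.
out = defaultdict(int)
for employee, price in pairs:
    out[employee] += int(price)

print(pairs)      # [('Oleg', '550'), ('Serega', '450'), ('Oleg', '700')]
print(dict(out))  # {'Oleg': 1250, 'Serega': 450}
```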

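Another option is to use the .split() method. The advantage is that you don't need to import the re module or have advanced knowledge of how to write regular expressions: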

log = """
[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700
[16:00:48]BMW X7/BH1155HH/service:2-phase complex+plastic /employee:Sasha/price:1400
[16:02:38]Jeep Renegade/BE6782IK/service:wash/employee:Serega/price:300
[16:03:19]MB C300/BT4500BT/service:complex/employee:Sasha/price:550
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000"""
dct = {}
# strip() drops the empty first line created by the newline after """
for line in log.strip().split('\n'):
    employee, price = line.split('/employee:')[1].split('/price:')
    dct[employee] = dct.get(employee, 0) + int(price)
print(dct) # gives {'Oleg': 1250, 'Serega': 750, 'Sasha': 2950}

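The "trick" in dct.get(employee, 0) is that when employee is not yet in the dictionary, 0 is returned instead of a price. It is equivalent to (dct[employee] if employee in dct else 0), which in turn is a shortened form of an if statement spanning several lines.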
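The three equivalent forms described above can be written side by side. This is a small illustration with made-up starting values (Oleg already has 550 and a new order of 700 arrives), not code from the answer:

```python
dct = {"Oleg": 550}
employee, price = "Oleg", 700

# 1. Multi-line if statement.
if employee in dct:
    current = dct[employee]
else:
    current = 0

# 2. Conditional expression on one line.
current_expr = dct[employee] if employee in dct else 0

# 3. dict.get with a default of 0.
current_get = dct.get(employee, 0)

assert current == current_expr == current_get == 550

# For a name not yet in the dict, get() returns the default instead of raising KeyError.
print(dct.get("Serega", 0))  # 0

dct[employee] = current_get + price
print(dct)  # {'Oleg': 1250}
```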

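Another advantage of .split() over the regex search is that a line with an unexpected format or content in the log file will most likely raise an error with a message, whereas the regex search will just silently pass along (wrong) results.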
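To illustrate that claim, here is a sketch with one deliberately malformed line (a hypothetical typo: employe: instead of employee:). The regex version skips that line and returns a total that is missing the order without any warning, while the .split() version raises an IndexError:

```python
import re

log = """[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:59:44]VW Amarok /BH138E/service:complex/employe:Oleg/price:700"""

# Regex approach: the malformed second line simply does not match.
regex_totals = {}
for employee, price in re.findall(r"employee:([^/]+)/price:(\d+)", log):
    regex_totals[employee] = regex_totals.get(employee, 0) + int(price)
print(regex_totals)  # {'Oleg': 550}, the 700 is lost silently

# split() approach: the malformed line has no '/employee:' part, so [1] fails.
split_error = None
split_totals = {}
try:
    for line in log.split("\n"):
        employee, price = line.split("/employee:")[1].split("/price:")
        split_totals[employee] = split_totals.get(employee, 0) + int(price)
except IndexError as exc:
    split_error = exc
    print("split() failed:", exc)  # split() failed: list index out of range
```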

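For very large log files the regex approach runs about 10% faster, but for small log files the time needed to load the re module makes it considerably slower than the .split() approach.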
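Timings depend on the Python version, the machine and the data, so the 10% figure is best checked on your own files. A minimal sketch using the standard timeit module, assuming a synthetic "large" log made by repeating three sample lines (the repeat count and the function names with_regex/with_split are arbitrary choices for illustration). It measures only the parsing loops, not the one-time cost of importing re:

```python
import re
import timeit

sample = """[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000"""

# Hypothetical large log: the three sample lines repeated 10,000 times.
big_log = "\n".join([sample] * 10_000)

PATTERN = re.compile(r"employee:([^/]+)/price:(\d+)")

def with_regex(log):
    out = {}
    for employee, price in PATTERN.findall(log):
        out[employee] = out.get(employee, 0) + int(price)
    return out

def with_split(log):
    out = {}
    for line in log.split("\n"):
        employee, price = line.split("/employee:")[1].split("/price:")
        out[employee] = out.get(employee, 0) + int(price)
    return out

# Both functions must agree before their speed is worth comparing.
assert with_regex(big_log) == with_split(big_log)

for func in (with_regex, with_split):
    seconds = timeit.timeit(lambda: func(big_log), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```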
