How can I improve my regex pattern for parsing?



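I want to change the pattern so that it matches not only strings that have both a unit and an amount, but also a unit on its own. For example, I want it to match "cubes" as well, even though no amount is given. Similarly, if a string has only a quantity and no unit, I want it to match just the quantity. Currently, the output returned is: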

['1.0', '0.07', '32.0', '0.12', '1.01', 'cubes', '2']

I would like the output to be:

['1.0', '0.07', '32.0', '0.12', '1.01', '1.0', '2.0']

The code is as follows:

import re
import numpy as np

# frac_to_dec_converter and liquid_units are defined elsewhere (not shown here).
list_of_texts = ["1oz", "2ml", "4cup", "1 wedge","2 slices", "cubes", "2"]
pattern = r"(^[\d -/]+)(oz|ml|cl|tsp|teaspoon|teaspoons|tea spoon|tbsp|tablespoon|tablespoons|table spoon|cup|cups|qt|quart|quarts|drop|drop|shot|shots|cube|cubes|dash|dashes|l|L|liters|Liters|wedge|wedges|pint|pints|slice|slices|twist of|top up|small bottle)"

new_list = []
for text in list_of_texts:
    re_result = re.search(pattern, text)
    if re_result:
        amount = re_result.group(1).strip()
        unit = re_result.group(2).strip()
        print(amount)
        print(unit)
        # A hyphen marks a range such as "1-2".
        if "-" in amount:
            ranged = True
        else:
            ranged = False
        # Drop the space before a slash, e.g. "1 /2" becomes "1/2".
        amount = re.sub(r"(\d) (/\d)", r"\1\2", amount)
        # Turn every separator into "+" so the parts can be split and summed.
        amount = amount.replace("-", "+").replace(" ", "+").strip()
        amount = re.sub(r"[+]+", "+", amount)
        amount_in_dec = frac_to_dec_converter(amount.split("+"))
        amount = np.sum(amount_in_dec)
        if ranged:
            # For a range, use the midpoint of the two values.
            to_oz = (amount * liquid_units[unit]) / 2
        else:
            to_oz = amount * liquid_units[unit]
        new_list.append(str(round(to_oz, 2)))
    else:
        # No match: keep the original text unchanged.
        new_list.append(text)

Note: I have a dictionary, liquid_units, that holds the conversion factor for each unit.

Use * instead of + to make the number optional. Then, if the first capture group is empty, treat it as 1.0.
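To see why the quantifier matters, here is a minimal sketch comparing the two on a unit-only string. The unit list is shortened purely for illustration (and lists "cubes" before "cube" so the longer word wins); the names `UNITS`, `plus_pattern`, and `star_pattern` are mine, not from the question.

```python
import re

# Shortened, illustrative unit list; the real pattern has many more units.
UNITS = r"(oz|ml|cup|cubes|cube)"

plus_pattern = r"(^[\d -/]+)" + UNITS   # "+" : at least one amount character required
star_pattern = r"(^[\d -/]*)" + UNITS   # "*" : amount may be empty

plus_match = re.search(plus_pattern, "cubes")
star_match = re.search(star_pattern, "cubes")

print(plus_match)            # None, because there is no amount to satisfy "+"
print(star_match.groups())   # ('', 'cubes'), so the empty amount can default to 1.0
```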

pattern = r"(^[\d -/]*)(oz|ml|cl|tsp|teaspoon|teaspoons|tea spoon|tbsp|tablespoon|tablespoons|table spoon|cup|cups|qt|quart|quarts|drop|drop|shot|shots|cube|cubes|dash|dashes|l|L|liters|Liters|wedge|wedges|pint|pints|slice|slices|twist of|top up|small bottle)"
for text in list_of_texts:
    re_result = re.search(pattern, text)
    if re_result:
        amount = re_result.group(1).strip()
        # A unit with no amount in front of it counts as one of that unit.
        if amount == '':
            amount = '1.0'
        unit = re_result.group(2).strip()
        print(amount)
        print(unit)
        # rest of your code
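The change above covers a unit without an amount, but a bare amount such as "2" still fails because the unit group is required. One possible way to handle both cases is to make the unit group optional as well and require the whole string to match. The following is only a sketch under assumptions: the unit list is shortened, and `UNIT_ALTERNATIVES`, `PATTERN`, and `split_amount_unit` are hypothetical names. How a missing unit should be converted (for example, with a factor of 1) is left to the caller.

```python
import re

# Hypothetical, shortened unit list for illustration.
UNIT_ALTERNATIVES = "oz|ml|cups|cup|cubes|cube|wedges|wedge|slices|slice"

# Both the amount and the unit are optional; fullmatch() makes sure
# nothing unexpected is left over in the string.
PATTERN = re.compile(rf"([\d ./-]*?)\s*({UNIT_ALTERNATIVES})?")

def split_amount_unit(text, default_amount="1.0"):
    """Return (amount, unit) as strings; unit is None when absent."""
    match = PATTERN.fullmatch(text.strip())
    if match is None or not match.group(0):
        return None  # unrecognised text, or nothing to parse
    amount = match.group(1).strip() or default_amount
    return amount, match.group(2)

for text in ["1oz", "2 slices", "cubes", "2"]:
    print(text, "->", split_amount_unit(text))
```

With this sketch, "cubes" yields an amount of "1.0" and "2" yields a unit of `None`, so the conversion step can decide what to do when no unit is present.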
