Modifying a regex to extract age variations


import re
s = '99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old 23 year old 29 years old but not 25-year-old and 91year old cousin is 99 now and 90-year-old or 102 year old'
reg = r'(?:9\d|1\d{2})(?:\s|-)?years?(?:\s|-)?old'
r1 = re.findall(reg,s)
r1
['99year old', '91year old', '90-year-old', '102 year old']

The code above works well; it is taken from the question "Extracting age variations using regular expressions".

My goal is to extract the elements listed in r1, plus any number of 90 or more that is followed by y.o. or yo. My desired output is

['99year old', '93yo', '100 yo', '97y.o.', '93 y.o.', '91year old', '90-year-old', '102 year old']

I tried changing reg as follows, but it doesn't quite work:

reg = r'(?:9\d|1\d{2})(?:\s|-)?years?(?:\s|-)?old(?:9\d|1\d{2})y.o.|(?:9\d|1\d{2})yo'

How can I change reg to get the desired output?

My guess is that an expression similar to

\b(?:9\d|1\d{2})\s*-?y(?:ears?)?\.?\s*-?o(?:ld)?\.?\b

might be worth looking into.

Test

import re
regex = r'\b(?:9\d|1\d{2})\s*-?y(?:ears?)?\.?\s*-?o(?:ld)?\.?\b'
string = '''
99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old 23 year old 29 years old but not 25-year-old and 91year old cousin is 99 now and 90-year-old or 102 year old
'''
print(re.findall(regex, string))

Output

['99year old', '93yo', '100 yo', '97y.o', '93 y.o', '91year old', '90-year-old', '102 year old']
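If the trailing period of "y.o." has to be part of the match, as in the desired output from the question, note that a final `\b` cannot succeed between a period and a following space, so a pattern ending in `\.?\b` will stop before that period. The following is only a sketch of one possible variant, not the pattern proposed above: it spells out the three suffixes as separate alternatives and applies `\b` only where the suffix ends in a letter. The names `age_re` and `matches` are made up for this illustration.

```python
import re

s = ('99year old 93yo 100 yo 97y.o. and his wife is 93 y.o. 20 y.o  90old '
     '23 year old 29 years old but not 25-year-old and 91year old cousin '
     'is 99 now and 90-year-old or 102 year old')

# Hypothetical variant: one alternative per suffix style.
age_re = re.compile(
    r'\b(?:9\d|1\d{2})'              # 90-99 or 100-199
    r'(?:[\s-]?years?[\s-]?old\b'    # "year old", "years old", "-year-old"
    r'|\s?yo\b'                      # "yo", " yo"
    r'|\s?y\.o\.)'                   # "y.o.", " y.o." (period kept)
)

matches = age_re.findall(s)
print(matches)
# ['99year old', '93yo', '100 yo', '97y.o.', '93 y.o.', '91year old', '90-year-old', '102 year old']
```

The trade-off is that this version is stricter: it accepts at most one space or hyphen between the parts and requires both periods in "y.o.".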


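If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.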
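For readers without the regex101 panel at hand, the same expression can be written with `re.VERBOSE` so that each piece carries its own comment. This is a sketch that assumes the escapes `\b`, `\d`, `\s` and `\.` in the pattern; the name `AGE_RE` and the short sample string are made up for this illustration.

```python
import re

# Annotated form of the expression; re.VERBOSE ignores the layout
# whitespace and the comments inside the pattern.
AGE_RE = re.compile(r"""
    \b                # start at a word boundary
    (?:9\d|1\d{2})    # 90-99 or 100-199
    \s*-?             # optional spaces, then an optional hyphen
    y(?:ears?)?       # "y", "year" or "years"
    \.?               # optional period, as in "y.o"
    \s*-?             # optional spaces, then an optional hyphen
    o(?:ld)?          # "o" or "old"
    \.?               # optional period
    \b                # end at a word boundary
""", re.VERBOSE)

sample = "93yo, 100 yo, 90-year-old, 23 year old"
print(AGE_RE.findall(sample))
# ['93yo', '100 yo', '90-year-old']
```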

