Find a string index, then reverse-find a regex match and delete from there



I have a problem similar to the one posted earlier in "Python Reverse Find in String".

Here is a sample of my very long string:

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019'''

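Updated: 1/02/2020

I am grouping the data into lists before putting it into a DataFrame. I do not need any of the data associated with 'incomplete n/a'. Do I need to delete part of the string, or is there a regex function that can locate 'incomplete n/a' and group around its position?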

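I would like two outputs:

ONE: this list, t1L = ['1281674 ', '1281640 ', '1276160 ']. Note that it does not include 1331626.

TWO: the string split or redefined so that it no longer contains 1331626, for example: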

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending'''

Thanks for any help.

I think this works for your problem (it requires import re):

new_str = t1[:t1.find(re.findall(r'\d{7}', t1[:t1.find('incomplete n/a')])[-1])]
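The one-liner above can be unpacked into steps, which also makes it easier to get both requested outputs. The following is only a sketch: the helper name cut_before_incomplete is made up for illustration, and it assumes every record starts with a 7-digit ID. It uses the match position from re.finditer instead of a second t1.find call, so an ID that also appears earlier in the string does not cause the cut to land in the wrong place.

```python
import re

t1 = (
    "1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending "
    "1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending "
    "1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending "
    "1331626 31/12/2019 - 31/01/2020 incomplete n/a "
    "1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019"
)

def cut_before_incomplete(text, marker="incomplete n/a"):
    """Hypothetical helper: drop everything from the ID of the first
    record that contains `marker` to the end of the string."""
    marker_pos = text.find(marker)
    if marker_pos == -1:
        return text  # no marker, nothing to remove
    # all 7-digit IDs that appear before the marker
    ids = list(re.finditer(r"\b\d{7}\b", text[:marker_pos]))
    if not ids:
        return text
    # the last ID before the marker belongs to the incomplete record
    return text[:ids[-1].start()]

new_str = cut_before_incomplete(t1)          # output TWO
t1L = re.findall(r"\b\d{7}\b", new_str)      # output ONE
print(new_str)
print(t1L)
```

The IDs in t1L come back without the trailing space shown in the question; append one if the later code depends on it.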

You need 2 regexes to get the 2 results:

import re

t1 = '''1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019'''
# remove everything from the 7-digit ID that directly precedes 'incomplete n/a' to the end of the string
clean = re.sub(r'\b\d{7}\b(?=(?:(?!\b\d{7}\b).)*incomplete n/a).*?$', '', t1)
print(clean)
# collect the 7-digit IDs that remain
res = re.findall(r'(\b\d{7}\b)', clean)
print(res)

Output:

1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 
['1281674', '1281640', '1276160']

Demo and explanation
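For readers without access to the linked demo, the same pattern can be read piece by piece. The sketch below rewrites it with re.VERBOSE so each part carries a comment. The sample string is made up for illustration and is not the question's data. In verbose mode a literal space has to be written as [ ].

```python
import re

pattern = re.compile(r"""
    \b\d{7}\b                 # a 7-digit ID ...
    (?=                       # ... that is followed by
        (?:(?!\b\d{7}\b).)*   #     characters that never start another 7-digit ID
        incomplete[ ]n/a      #     and then the marker text
    )
    .*?$                      # then consume the rest of the string
""", re.VERBOSE)

# hypothetical sample: the second record is the incomplete one
sample = "1111111 main st pending 2222222 01/01/2020 incomplete n/a 3333333 side st"

clean = pattern.sub("", sample)
print(repr(clean))
print(re.findall(r"\b\d{7}\b", clean))
```

The lookahead is what ties the match to the ID immediately before the marker: an earlier ID cannot match because the scan toward 'incomplete n/a' would have to cross another 7-digit number.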

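You can try the following code, which uses a loop and conditionals. It finds the ID that belongs to the 'incomplete' entry.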

import re

t1 = '1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending 1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending 1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending 1331626 31/12/2019 - 31/01/2020 incomplete n/a 1314832 '
result = None
for t in t1.split(" "):
    # remember the most recent token that starts with 7 digits
    if re.match(r"\d{7}", t):
        result = t
    # stop at the marker; result now holds the ID of the incomplete record
    if 'incomplete' in t:
        break
print(result)  # 1331626
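The loop above only reports the ID of the incomplete record. One possible extension, sketched here under the assumption that every record begins with a 7-digit token, groups the tokens into records and stops at the first record that contains the marker. The function name records_before_incomplete is hypothetical.

```python
import re

def records_before_incomplete(text, marker="incomplete"):
    """Hypothetical helper: split `text` into records (lists of tokens),
    each starting at a 7-digit ID, and stop at the first record that
    contains `marker`. That record and everything after it are dropped."""
    records = []
    current = []
    for token in text.split():
        if re.fullmatch(r"\d{7}", token):
            # a new ID closes the previous record and opens a new one
            if current:
                records.append(current)
            current = [token]
        elif token == marker:
            # the record being built is the incomplete one: discard and stop
            current = []
            break
        elif current:
            current.append(token)
    if current:
        records.append(current)
    return records

t1 = ('1281674 the crescent annandale 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1281640 city west link rd lilyfield 02/10/2019 16/10/2019 - 16/11/2019 pending '
      '1276160 victoria rd rozelle 25/09/2019 14/10/2019 - 15/10/2019 pending '
      '1331626 31/12/2019 - 31/01/2020 incomplete n/a 1314832 ')

records = records_before_incomplete(t1)
ids = [rec[0] for rec in records]                   # the list of kept IDs
kept = " ".join(" ".join(rec) for rec in records)   # the trimmed string
print(ids)
print(kept)
```

Keeping each record as a list of tokens may also make the later step of loading the data into a DataFrame simpler, since every record is already separated.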
