List comprehension with a regex condition

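I have a list of strings. If any of these strings contains a 4-digit year, I want to truncate the string at the end of the year. Otherwise, I want to leave the string as it is.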

I tried:

    for x in my_strings:
      m=re.search("\D\d\d\d\d\D",x)
      if m: x=x[:m.end()]

I also tried:

my_strings=[x[:re.search("\D\d\d\d\d\D",x).end()] if re.search("\D\d\d\d\d\D",x) for x in my_strings]

Neither of these works.

Can you tell me what I'm doing wrong?

Something like this seems to work on trivial data:

>>> regex = re.compile(r'^(.*(?<=\D)\d{4}(?=\D))(.*)')
>>> strings = ['foo', 'bar', 'baz', 'foo 1999', 'foo 1999 never see this', 'bar 2010 n 2015', 'bar 20156 see this']
>>> [regex.sub(r'\1', s) for s in strings]
['foo', 'bar', 'baz', 'foo 1999', 'foo 1999', 'bar 2010', 'bar 20156 see this']

It looks like the only bound you need on the resulting string is the end(), so you should use re.match() instead and change your regex to:

my_expr = r".*?\D\d{4}\D"

Then, in your code:

regex = re.compile(my_expr)
my_new_strings = []
for string in my_strings:
    match = regex.match(string)
    if match:
        my_new_strings.append(match.group())
    else:
        my_new_strings.append(string)

Or as a list comprehension:

regex = re.compile(my_expr)
matches = ((regex.match(string), string) for string in my_strings)
my_new_strings = [match.group() if match else string for match, string in matches]
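
On Python 3.8 or later, an assignment expression can bind the match inside a single comprehension, so the regex runs only once per string. This is a minimal sketch, assuming the pattern `.*?\D\d{4}\D` from this answer; the sample data is made up for illustration. Note that the match includes the non-digit character that follows the year.

```python
import re

# Lazily consume up to the first "non-digit, 4 digits, non-digit" sequence.
regex = re.compile(r".*?\D\d{4}\D")

# Hypothetical sample data, not taken from the question.
my_strings = ["foo", "foo 1999 never see this", "bar 20156 see this"]

# The condition is evaluated first, so m is bound before m.group() is called.
my_new_strings = [m.group() if (m := regex.match(s)) else s for s in my_strings]
print(my_new_strings)  # ['foo', 'foo 1999 ', 'bar 20156 see this']
```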

Alternatively, you could use re.sub:

regex = re.compile(r'(\D\d{4})\D.*')  # the trailing .* drops everything after the year
new_strings = [regex.sub(r'\1', string) for string in my_strings]

I'm not entirely sure about your use case, but the following code may give you some hints:

import re
my_strings = ['abcd', 'ab12cd34', 'ab1234', 'ab1234cd', '1234cd', '123cd1234cd']
for index, string in enumerate(my_strings):
    match = re.search(r'\d{4}', string)
    if match:
        my_strings[index] = string[0:match.end()]
print(my_strings)
# ['abcd', 'ab12cd34', 'ab1234', 'ab1234', '1234', '123cd1234']
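
Writing back through the index matters here: assigning to the loop variable only rebinds that name and leaves the list untouched, which is why the loop in the question has no visible effect. A small sketch of the difference, using made-up data:

```python
import re

year = re.compile(r"\d{4}")
strings = ["ab1234cd", "abcd"]

# Rebinding the loop variable: the list itself is never modified.
for s in strings:
    m = year.search(s)
    if m:
        s = s[:m.end()]
unchanged = list(strings)

# Assigning through the index updates the list in place.
for i, s in enumerate(strings):
    m = year.search(s)
    if m:
        strings[i] = s[:m.end()]

print(unchanged)  # ['ab1234cd', 'abcd']
print(strings)    # ['ab1234', 'abcd']
```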

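You were actually very close with the list comprehension, but your syntax is off: you need to make the first expression a "conditional expression", i.e. x if <boolean> else y: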

[x[:re.search("\D\d\d\d\d\D",x).end()] if re.search("\D\d\d\d\d\D",x) else x for x in my_strings]

Obviously this is ugly and hard to read. There are several nicer ways to split the string at the 4-digit year. For example:

[re.split(r'(?<=\D\d{4})\D', x)[0] for x in my_strings]
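
If a year at the very start or end of the string should also count, negative lookarounds can stand in for the `\D` on either side. This is a sketch under that assumption, which goes slightly beyond what the question asks; `truncate_after_year` is a hypothetical helper name and the sample data is made up.

```python
import re

# Exactly four digits, neither preceded nor followed by another digit.
YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")

def truncate_after_year(s):
    """Return s cut off right after its first standalone 4-digit run, or s unchanged."""
    m = YEAR.search(s)
    return s[:m.end()] if m else s

my_strings = ["foo", "foo 1999", "foo 1999 never see this",
              "bar 20156 see this", "2010 report"]
result = [truncate_after_year(s) for s in my_strings]
print(result)  # ['foo', 'foo 1999', 'foo 1999', 'bar 20156 see this', '2010']
```

Unlike the `\D`-based patterns, this does not require a character before or after the year, and the slice ends exactly at the last digit.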
