How do I return the full word, rather than just the matched substring, as a list in Python?

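I have strings of different lengths that must be checked for substrings matching "tion", "ex", "ph", "ost", "ast", or "ist", ignoring case and position (prefix, suffix, or middle of a word). The matching words must be returned in a new list, not just the matched substring elements. With the code below I can return a new list of the matched substring elements, but not the full matching words.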

import re

def latin_ish_words(text):
    pattern = re.compile(r"tion|ex|ph|ost|ast|ist")
    matches = pattern.findall(text)
    return matches

latin_ish_words("This functions as expected")

The result is: ['tion', 'ex']

How can I return the whole words, instead of the matched substring elements, in the new list?

You can use any of the following patterns:

pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
pattern = re.compile(r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*")
pattern = re.compile(r"[^\W\d_]*?(?:tion|ex|ph|ost|ast|ist)[^\W\d_]*")

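The first regex (see the regex demo) matches:

\w*? - zero or more word characters, as few as possible
(?:tion|ex|ph|ost|ast|ist) - one of the listed strings
\w* - zero or more word characters, as many as possible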

The [a-zA-Z] version matches only ASCII letters, while [^\W\d_] matches any Unicode letter.
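To make the difference between the three variants concrete, here is a small sketch. The sample text is a made-up illustration (not from the question), chosen to contain an accented letter and a digit.

```python
import re

# Hypothetical sample: "r\u00e9action" is "réaction" (accented letter);
# "ex4mple" contains a digit.
text = "r\u00e9action ex4mple"
alternatives = r"(?:tion|ex|ph|ost|ast|ist)"

# \w covers letters, digits, and underscore (Unicode-aware for str patterns).
word_chars = re.findall(r"\w*?" + alternatives + r"\w*", text)
# [a-zA-Z] covers ASCII letters only.
ascii_letters = re.findall(r"[a-zA-Z]*?" + alternatives + r"[a-zA-Z]*", text)
# [^\W\d_] covers word characters minus digits and underscore, i.e. letters.
unicode_letters = re.findall(r"[^\W\d_]*?" + alternatives + r"[^\W\d_]*", text)

print(word_chars)       # ['réaction', 'ex4mple']
print(ascii_letters)    # ['action', 'ex']
print(unicode_letters)  # ['réaction', 'ex']
```

Note how the ASCII-only class silently truncates the accented word, which may or may not be what you want.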

Note the non-capturing group: with a capturing group, re.findall would return the captured substrings instead of the whole matches.
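A short sketch of that re.findall behavior, using the sentence from the question. The variable names are only illustrative.

```python
import re

text = "This functions as expected"

# With a capturing group, findall returns the group contents only.
capturing = re.compile(r"\w*?(tion|ex|ph|ost|ast|ist)\w*")
# With a non-capturing group, findall returns the whole match.
non_capturing = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")

captured = capturing.findall(text)
whole_words = non_capturing.findall(text)

print(captured)     # ['tion', 'ex']
print(whole_words)  # ['functions', 'expected']
```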

If you only want to match words made up of letters, and you need to match them as whole words, add word boundaries: r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b".
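As a rough illustration of what the word boundaries change, the sketch below runs the ASCII-letter pattern with and without \b on a made-up sample in which one token contains a digit.

```python
import re

# Hypothetical sample; "ex4mple" is not a letters-only word.
text = "functions ex4mple expected"

without_boundaries = re.findall(
    r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*", text
)
with_boundaries = re.findall(
    r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b", text
)

print(without_boundaries)  # ['functions', 'ex', 'expected']
print(with_boundaries)     # ['functions', 'expected']
```

Without \b the pattern returns the fragment 'ex' from "ex4mple"; with \b that token is skipped entirely, because there is no word boundary between "x" and "4".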

See the Python demo:

import re

def latin_ish_words(text):
    pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
    return pattern.findall(text)

print(latin_ish_words("This functions as expected"))
# => ['functions', 'expected']

Regarding ignoring case:

pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
matches=pattern.findall(text)

This does not ignore case. Consider the following example:

import re
pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
text = "SCREAMING TEXT"
print(pattern.findall(text))

Output:

[]

The result is empty even though EX should have been found. Add the re.IGNORECASE flag, like this:

import re
pattern=re.compile(r"tion|ex|ph|ost|ast|ist", re.IGNORECASE)
text = "SCREAMING TEXT"
print(pattern.findall(text))

Output:

['EX']
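The flag can be combined with a whole-word pattern such as \w*?(?:...)\w*. The following is a minimal sketch under that assumption; the sample sentence is made up for illustration.

```python
import re

def latin_ish_words(text):
    # re.IGNORECASE makes the alternatives match regardless of case;
    # \w*? and \w* extend the match to the surrounding word characters.
    pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*", re.IGNORECASE)
    return pattern.findall(text)

result = latin_ish_words("SCREAMING TEXT and Photos")
print(result)  # ['TEXT', 'Photos']
```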

For a case-insensitive match with whitespace boundaries, you can use:

(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)

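The pattern matches:

(?i) Inline modifier for case-insensitive matching (or use re.I)
(?<!\S) Assert a whitespace boundary to the left
\w* Match optional word characters
(?: Non-capturing group
- tion|ex|ph|[oia]st Match tion, ex, ph, or one of ost, ist, ast using a character class
) Close the non-capturing group
\w* Match optional word characters
(?!\S) Assert a whitespace boundary to the right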


import re

def latin_ish_words(text):
    pattern = r"(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)"
    return re.findall(pattern, text)

print(latin_ish_words("This functions as expected"))

Output:

['functions', 'expected']
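One consequence of whitespace boundaries worth keeping in mind: a word directly followed by punctuation is not matched at all, because the character after it is not whitespace. The sketch below compares the whitespace-bounded pattern with an unbounded one on a made-up sentence containing a comma.

```python
import re

# Hypothetical sample with punctuation attached to a word.
text = "This functions, as expected"

whitespace_bounded = re.findall(
    r"(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)", text
)
unbounded = re.findall(r"(?i)\w*?(?:tion|ex|ph|[oia]st)\w*", text)

print(whitespace_bounded)  # ['expected']
print(unbounded)           # ['functions', 'expected']
```

If the input may contain punctuation, the unbounded or \b-based patterns are probably the better fit.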
