Removing words made only of consecutive vowels from a string with Python



I have a string like this:

"i'm just returning from work. *oeee* all and we can go into some detail *oo*. what is it that happened as far as you're aware *aouu*"

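It contains some junk words (highlighted above with '*'). The only pattern I can see is that every junk word is made up entirely of vowels. I need to remove words that are preceded and followed by a space, consist only of vowels (like oeee, aouu, etc.), and are at least 2 characters long. How can I do this in Python?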

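Currently I build a tuple of replacement pairs such as ((" oeee "," "),(" aouu "," ")) and loop over it, calling replace for each pair. But if the text contains 'oeeee', I have to add a new item to the tuple. There must be a better way.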

P.S.: The actual text will not contain "*". I only added it here for highlighting.

Use re.sub to do a regular-expression replacement in Python, with this regex:

\b[aeiou]{2,}\b

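It matches a whole word consisting only of two or more vowels. The \b anchors match word boundaries, so the pattern also matches a word at the start or end of the string (aouu in your string) and a word next to punctuation (oo in your string), where there is no surrounding space. If your text may also contain uppercase vowels, use the re.I flag to ignore case: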

import re
text = "i'm just returning from work. oeee all and we can go into some detail oo. what is it that happened as far as you're aware aouu"
# \b...\b restricts the match to whole words made only of 2+ vowels
print(re.sub(r'\b[aeiou]{2,}\b', '', text, flags=re.I))

Output:
i'm just returning from work.  all and we can go into some detail . what is it that happened as far as you're aware 
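The substitution above removes only the word itself, so it leaves a double space where oeee was and a space before the period where oo was. Since the question asks about words surrounded by spaces, a minimal sketch that also consumes the spaces in front of each junk word could look like the following. The names JUNK_VOWEL_WORD and remove_vowel_junk are hypothetical, and the final strip() (which also trims any leading or trailing whitespace already in the input) is an assumption about the desired output.

```python
import re

# Hypothetical helper: a whole word made only of 2+ vowels (case-insensitive),
# together with any spaces directly in front of it, so no double spaces remain.
JUNK_VOWEL_WORD = re.compile(r' *\b[aeiou]{2,}\b', re.IGNORECASE)


def remove_vowel_junk(text: str) -> str:
    """Drop vowel-only words of length >= 2 plus their leading spaces."""
    # strip() handles a junk word at the very start of the string,
    # which has no leading space to consume.
    return JUNK_VOWEL_WORD.sub('', text).strip()


text = "i'm just returning from work. oeee all and we can go into some detail oo. what is it that happened as far as you're aware aouu"
print(remove_vowel_junk(text))
# prints: i'm just returning from work. all and we can go into some detail. what is it that happened as far as you're aware
```

Matching only spaces (` *`) rather than `\s*` is a deliberate choice here: it avoids swallowing a newline in front of a junk word at the start of a line, which would join two lines together.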
