Python: Remove all words starting with a capital letter and not occurring after punctuation marks



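I want to use a regular expression to remove from a text every word that starts with a capital letter and meets both of these conditions:

1) It is next to a lowercase letter, an "'s" (possessive), or a punctuation mark (. , ? !).

2) It does not follow a ".", "!" or "?".

I tried: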

import re
myString = "The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel's car is white"
regex = r"([A-Z][a-z']*)(\s[A-Z][a-z']*)*"
txt = re.sub(regex, " ", myString)

I get:

name of her company is    123    !   was going to meet  .  ?   is her boy friend.  ?   daughter of  !  ,   car is white

I want:

name of her company is  WC 123 WaTerCompany! She was going to meet . Why? Because is her boy friend. Patricia? The daughter of ! Look, car is white

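To remove whole words, use \b word-boundary anchors so that you do not match partial words. To remove only words that are not preceded by punctuation, you can use a negative look-behind, provided there is always a fixed amount of whitespace between the punctuation mark and the first letter.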
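The fixed-width requirement comes from Python's standard `re` module, which rejects look-behind patterns of variable length. The following is a minimal sketch of that behaviour; the sample sentence and the variable names are made up for illustration.

```python
import re

# Fixed-width look-behind: exactly one punctuation mark followed by
# exactly one whitespace character, so the re module accepts it.
fixed = re.compile(r"(?<![!?.]\s)\b[A-Z][a-z]*\b")
print(fixed.findall("Hello there Bob. Then Alice left"))

# Variable-width look-behind (\s+ can match any number of characters)
# is rejected by the standard re module at compile time.
try:
    re.compile(r"(?<![!?.]\s+)\b[A-Z][a-z]*\b")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("re.error:", exc)
```

Here "Then" is skipped because it directly follows ". ", while the other capitalised words are matched.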

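I am going to assume that there is always exactly one space between a punctuation mark and the next letter. You can always normalise your input first by replacing runs of multiple spaces with a single space.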
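As a rough sketch of that normalisation step, the snippet below collapses whitespace before applying the pattern. The helper name `normalise_spaces` and the sample text are hypothetical, and collapsing every whitespace run (including newlines) with `\s+` is an assumption that may be too aggressive for some inputs.

```python
import re

def normalise_spaces(text):
    """Hypothetical helper: collapse each run of whitespace into one space."""
    return re.sub(r"\s+", " ", text)

# The pattern relies on exactly one whitespace character after . ! or ?
pattern = re.compile(r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b")

raw = "He met Daniel.   Why?  Because Daniel left"

# Without normalisation the extra spaces defeat the look-behind,
# so "Why" and "Because" are matched even though they follow punctuation.
matches_raw = pattern.findall(raw)

# After normalisation only the words that do not follow punctuation match.
matches_clean = pattern.findall(normalise_spaces(raw))

print(matches_raw)
print(matches_clean)
```

Normalising first keeps the look-behind simple; the alternative would be a regex engine that supports variable-width look-behind, which the standard `re` module does not.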

That makes the regex to remove these words:

\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b

And a demo:

>>> import re
>>> myString = "The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel's car is white"
>>> regex = r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b"
>>> re.sub(regex, " ", myString)
'  name of her company is     WC 123 WaTerCompany! She was going to meet  . Why? Because   is her boy friend. Patricia? The daughter of  ! Look,   car is white'

Or try the pattern online at Regex101.
