Match a pattern in a string, except when the string starts with it

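I want to strip the whitespace, the parenthesis, and every character that follows them when they come after another word. For example: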

hello(hi) -> hello
hello (hi) -> hello
hello (hi) bonjour -> hello
(hi) hello bonjour -> (hi) hello bonjour
(hi)_hello -> (hi)_hello

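I have managed to strip the whitespace and the parenthesized text, but I cannot stop the substitution from happening when the parenthesis is at the start of the string.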

import re

re.sub(r"\s*\(.+", "", "hello(hi)")      # 'hello'
re.sub(r"\s*\(.+", "", "(hi)_hello")     # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "hello(hi)")   # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "(hi)_hello")  # '(hi)_hello'

I have also read some documentation on negative lookahead, but so far I have not been able to make it work.

Any help would be greatly appreciated.

You can use a regex with a negative lookbehind.

cases = [
    'hello (hi)', 
    'hello(hi)', 
    'hello (hi) bonjour', 
    '(hi) hello bonjour', 
    '(hi)_hello'
]

>>> [re.sub(r'(?<!^)\s*\(.*', '', i) for i in cases]
['hello', 'hello', 'hello', '(hi) hello bonjour', '(hi)_hello']

Details:

(?<!   # negative lookbehind
^      # (do not) match at the start of the line
)
\s*    # 0 or more whitespace characters
\(     # literal opening parenthesis
.*     # match 0 or more characters (greedy)
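The breakdown above can also be kept inside the pattern itself. The following is a minimal sketch using `re.VERBOSE`; the names `PAREN_TAIL` and `strip_paren_tail` are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, written in verbose mode so that the
# explanation lives next to each piece of the regex.
PAREN_TAIL = re.compile(
    r"""
    (?<!^)   # negative lookbehind: the match may not begin at the start of the string
    \s*      # zero or more whitespace characters
    \(       # a literal opening parenthesis
    .*       # everything after it, up to the end of the line (greedy)
    """,
    re.VERBOSE,
)


def strip_paren_tail(text):
    """Drop optional whitespace, '(' and the rest, unless '(' is the first character."""
    return PAREN_TAIL.sub("", text)


cases = ["hello (hi)", "hello(hi)", "hello (hi) bonjour", "(hi) hello bonjour", "(hi)_hello"]
print([strip_paren_tail(c) for c in cases])
```

Compiling the pattern once is only a convenience here; calling `re.sub` with the raw string each time should behave the same.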

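You need a negative lookbehind: (?<!^). The construct (?<!...) is a negative lookbehind. It means: do not match if ... appears immediately before the rest of the match.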

In this case you do not want to match at the start of the string, so your ... is ^. That is:

re.sub(r"(?<!^)\s*\(.+", "", "(hi)_hello")  # '(hi)_hello'

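Note that if there is only whitespace between the start of the line and the first parenthesis, it will still replace the text: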

re.sub(r"(?<!^)\s*\(.+", "", "  (hi)_hello")  # ' '
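If leading whitespace should count as "the start" too, one possible variation is to require a non-whitespace character before the match, using a positive lookbehind for `\S`. This is only a sketch under that assumption; the name `strip_after_text` is hypothetical and this pattern does not come from the answers here.

```python
import re


def strip_after_text(text):
    """Strip whitespace + '(' + the rest only when some non-whitespace text precedes it.

    (?<=\\S) requires the character just before the match to be non-whitespace,
    so a parenthesis preceded only by spaces (or by nothing) is left alone.
    """
    return re.sub(r"(?<=\S)\s*\(.*", "", text)


for sample in ["  (hi)_hello", "(hi)_hello", "hello (hi)", "hello(hi) bonjour"]:
    print(repr(sample), "->", repr(strip_after_text(sample)))
```

Because the lookbehind needs an actual character before the match, it also rules out position 0 of the string, so `(?<!^)` is no longer needed in this variant.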

I don't know whether you have to use a regex, but since you are using Python it can also be done like this:

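lines = ["(hi) hello", "hello (hi)", "hello (hi) hello"]
for line in lines:
    # Split on the literal "(hi)"; result[0] is the text before it.
    result = line.split("(hi)")
    if result[0] == "":
        # The line starts with "(hi)": keep it unchanged.
        print(line)
    else:
        # Keep only the text before "(hi)", without the trailing space.
        print(result[0].rstrip())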
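The snippet above only works for the literal text "(hi)". A rough generalization to any parenthesized text, still without a regex, could use `str.find`. The function name `strip_paren_tail_no_regex` is made up for illustration, and treating whitespace-only prefixes as "the start" is an assumption, not something the question states.

```python
def strip_paren_tail_no_regex(line):
    """Cut the line at the first '(' unless nothing but whitespace precedes it."""
    idx = line.find("(")
    # No parenthesis at all, or only whitespace (or nothing) before it: keep the line.
    if idx == -1 or line[:idx].strip() == "":
        return line
    # Keep the text before the parenthesis, minus the trailing whitespace.
    return line[:idx].rstrip()


for line in ["(hi) hello", "hello (hi)", "hello (hi) hello", "hello"]:
    print(strip_paren_tail_no_regex(line))
```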
