Replace everything from a backslash up to the next whitespace

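As part of preprocessing my data, I want to replace anything that starts with a backslash, up to the next whitespace, with an empty string. For example, \fs24 should be replaced with nothing, and so should \qc23424. The backslash tags I want to remove can occur multiple times. I have created a list of "tags to eradicate", and my goal is to use it in a regular expression to clean the extracted text.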

Input string: This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string.

Expected output: This is a string and it contains some texts and tags. which I want to remove from my string.

I am using the regex-based substitution function in Python:

udpated = re.sub(r'/\fs\d+', '')

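However, this does not give the desired result. Alternatively, I built the eradication list and looped over it, replacing each tag from top to bottom, but that is a performance killer.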
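One way to use such a list without looping over it is to fold all tags into a single alternation, so the text is scanned once instead of once per tag. A minimal sketch, assuming a hypothetical list named tags_to_remove (the real list is not shown in the question); re.escape takes care of the backslash in each tag:

```python
import re

# Hypothetical "tags to eradicate" list; the question does not show the real one.
tags_to_remove = [r"\fs24", r"\qc23424"]

# Escape each tag (a backslash is a regex metacharacter) and join them into
# one alternation, so every tag is looked for in a single pass over the text.
alternation = "|".join(map(re.escape, tags_to_remove))

# \s? also consumes the space before a tag, so no double space is left behind;
# (?!\w) makes a tag match only as a whole token ("\fs24" not inside "\fs245").
tag_pattern = re.compile(r"\s?(?:" + alternation + r")(?!\w)")

text = r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."
cleaned = tag_pattern.sub("", text)
print(cleaned)
```

This only removes tags that are literally in the list; the answers below instead match any backslash tag by shape.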

Assuming a "tag" may also appear at the very start of the string, and to avoid false positives, maybe you could use:

\s?(?<!\S)\\[a-z\d]+

and replace the matches with an empty string.

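\s? - optionally match a whitespace character (if the tag is mid-string and therefore preceded by a space);
(?<!\S) - assert that the position is not preceded by a non-whitespace character (which also allows a match at the very start of the input);
\\ - a literal backslash;
[a-z\d]+ - one or more (greedy) characters from the given class, i.e. lowercase letters and digits.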
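The pattern can be tried directly with re.sub. A small sketch; the two extra sample strings are made up to show the start-of-string case and a backslash that follows a non-whitespace character (and is therefore kept):

```python
import re

# Pattern from the answer above: optional leading whitespace, a position not
# preceded by a non-whitespace character, a literal backslash, then the tag body.
TAG = re.compile(r"\s?(?<!\S)\\[a-z\d]+")

question_text = r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."
at_start = r"\fs24 tag at the very start"  # made-up sample
path_like = r"C:\dir \fs24 end"            # made-up sample: "\dir" follows ":" so it is kept

print(TAG.sub("", question_text))
print(TAG.sub("", at_start))
print(TAG.sub("", path_like))
```

Note that a tag at the very start has no preceding space to consume, so the space after it stays; calling .lstrip() on the result would tidy that up if needed.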

First, the / does not belong in the regular expression at all.

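Second, even though you are using a raw string literal, \ itself has a special meaning to the regular expression engine, so you still need to escape it. (Without a raw string literal, you would need '\\\\fs\\d+'.) The \ before f is meant to be taken literally; the \ before d is part of the character class that matches digits.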

Finally, sub takes three arguments: the pattern, the replacement text, and the string on which to perform the replacement.

>>> re.sub(r'\\fs\d+', '', r"This is a string \fs24 and it contains...")
'This is a string  and it contains...'
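The escaping points above can be checked directly. In this sketch the raw and non-raw spellings turn out to be the same pattern, the pattern from the question finds nothing (in a regex, \f means a form feed), and adding \s* in front also removes the space before the tag, which avoids the double space in the output above:

```python
import re

text = r"This is a string \fs24 and it contains..."

# The raw-string pattern and the ordinary string with doubled backslashes are identical.
assert r"\\fs\d+" == "\\\\fs\\d+"

# The question's pattern: there is no "/" in the text, and \f is a form feed
# in a regex, so this finds nothing.
print(re.search(r"/\fs\d+", text))

# Escaped backslash: matches the literal tag.
print(re.search(r"\\fs\d+", text).group())

# Consuming the whitespace before the tag as well leaves a single space.
print(re.sub(r"\s*\\fs\d+", "", text))
```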

Does this work for you?

re.sub(
    r"\\\w+\s*",  # a backslash followed by alphanumerics and optional spacing;
    '',           # replace it with an empty string;
    input_string  # in your input string
)
>>> re.sub(r"\\\w+\s*", "", r"\fs24 hello there")
'hello there'
>>> re.sub(r"\\\w+\s*", "", "hello there")
'hello there'
>>> re.sub(r"\\\w+\s*", "", r"\fs24hello there")
'there'
>>> re.sub(r"\\\w+\s*", "", r"\fs24hello \qc23424 there")
'there'
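Applied to the full input from the question, this pattern removes \fs24 together with the space after it, but \qc23424 is followed by a period rather than whitespace, so the space before that tag survives. A quick check:

```python
import re

text = r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."

# A backslash, one or more word characters, then any trailing whitespace.
result = re.sub(r"\\\w+\s*", "", text)
print(result)
```

Also note that for str patterns \w matches underscores and Unicode letters and digits, which is broader than the [a-z\d] class used in the first answer.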

\\ matches a literal \, and \w+ matches the word that follows it, up to the first non-word character such as a space or a period:

import re
s = r"""This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string."""
re.sub(r'\\\w+', '', s)

Output:

'This is a string  and it contains some texts and tags . which I want to remove from my string.'
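Since the question mentions performance, one option is to compile the pattern once and reuse it for every extracted document, so each document is scanned in a single pass rather than once per tag. Python's re module also caches recently used patterns, so the main saving over the original loop comes from the single pass rather than from compile itself. A sketch with a made-up batch of documents:

```python
import re

# Compiled once, reused for every document.
TAG = re.compile(r"\\\w+")

# Hypothetical batch of extracted texts.
documents = [
    r"This is a string \fs24 and it contains some texts and tags \qc23424. which I want to remove from my string.",
    r"\qc23424 heading \fs24 body",
]

cleaned = [TAG.sub("", doc) for doc in documents]
for line in cleaned:
    print(line)
```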

I tried this, and it works well for me:

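def remover(text, state):
    # Take the text after the first backslash, up to the next space
    removable = text.split("\\")[1]
    removable = removable.split(" ")[0]
    # Rebuild the tag with its backslash and trailing space, then drop it
    removable = "\\" + removable + " "
    text = text.replace(removable, "")
    # Keep looping while any backslash remains in the text
    state = True if "\\" in text else False
    return text, state

text = "hello \\I'm new here \\good luck"
state = True
while state:
    text, state = remover(text, state)
print(text)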
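The loop above can be replaced by a single substitution. A sketch, assuming the same rule as remover (a backslash, everything up to the next space or backslash, then the space): the regex version needs one pass, and because the trailing space is optional it also handles a tag at the very end of the text, a case where remover keeps finding a backslash it can never remove and loops forever.

```python
import re

# A backslash, then every character up to the next space or backslash,
# then an optional space (so a tag at the end of the text is removed too).
TAG = re.compile(r"\\[^ \\]* ?")

text = "hello \\I'm new here \\good luck"
print(TAG.sub("", text))

trailing = "hello \\I'm new here \\good luck \\end"
print(TAG.sub("", trailing))
```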
