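I want to remove the parenthesized parts of strings when they contain table or figure references. A single pair of parentheses can contain multiple references. The pattern is fairly regular. Here are some examples: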
text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]
I tried the following regular expression:
text = re.sub(r"(\([\w]\s\d(\\[a-z]+[0-9])+\))", " ", text)
You can remove any parenthesized group that starts with table or figure:
re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', text)
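Pattern details:
\s*\(\s* - a ( character with zero or more whitespace characters on either side
(?:table|figure) - the string table or figure
[^()]* - zero or more characters other than ( and )
\) - a ) character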
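As a reading aid, the pattern described above can also be written in verbose mode so that each piece carries its own comment. This is only a sketch: the names REF_PARENS and strip_refs and the sample sentence are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be annotated.
REF_PARENS = re.compile(
    r"""
    \s*                 # optional whitespace before the opening parenthesis
    \(                  # a literal opening parenthesis
    \s*                 # optional whitespace right after it
    (?:table|figure)    # the content must start with "table" or "figure"
    [^()]*              # any run of non-parenthesis characters (covers a plural "s" too)
    \)                  # a literal closing parenthesis
    """,
    re.VERBOSE,
)


def strip_refs(s):
    """Remove every parenthesized table/figure reference from one string."""
    return REF_PARENS.sub("", s)


print(strip_refs("see the layout (table 6-1) for details"))
# => see the layout for details
```

Because [^()]* cannot cross a parenthesis, the match always stops at the first closing parenthesis, so parentheses that do not begin with table or figure are left alone.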
A Python demo:
import re
text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]
# re.sub works on one string at a time, so apply it to each item of the list
text = [re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', t) for t in text]
print(text)
# => ['this is a figure ref', 'this is multiple refs', 'this is a table ref']
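If the real data also contains capitalized or abbreviated references, the same idea might be extended as sketched below. The variants handled here (Table, Tables, Fig., Figs.) are an assumption about the input, not something shown in the question, and the names REF_PATTERN and remove_refs are hypothetical.

```python
import re

# Assumption: references may be capitalized, plural, or abbreviated ("Fig. 3").
# The \b keeps words such as "figurative" from being treated as a reference.
REF_PATTERN = re.compile(
    r"\s*\(\s*(?:tables?|figures?|figs?)\b[^()]*\)",
    re.IGNORECASE,
)


def remove_refs(strings):
    """Return a new list with table/figure parentheticals removed from each string."""
    return [REF_PATTERN.sub("", s) for s in strings]


samples = [
    "Results improve (Fig. 3).",
    "both (Tables 1 and 2) removed",
    "a note (not a reference) stays",
    "a phrase (figurative speech) stays",
]
print(remove_refs(samples))
```

Only parentheticals whose first word is one of the listed keywords are removed; any other parenthesized text is kept as it is.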