Regex to remove parentheses that contain figure or table references



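I want to remove parenthesized groups from strings when they contain table or figure references. A single pair of parentheses can contain more than one reference. The pattern is fairly regular. Here are some examples: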

text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]

I am using the following regex:

text = re.sub(r"(\([\w]\s\d(\\[a-z]+[0-9])+\))", " ", text)

You can remove any parenthesized group that starts with table or figure:

re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', text)

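See the regex demo. Details:

  • \s*\(\s* - a ( char with zero or more whitespace chars on both sides
  • (?:table|figure) - the string table or figure
  • [^()]* - zero or more chars other than ( and )
  • \) - a ) char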

See the Python demo:

import re
text = [
"this is a figure ref (figure 7\xe2\x80\x9377)",
"this is multiple refs (figures 6\xe2\x80\x9328 and 6\xe2\x80\x9329)",
"this is a table ref (table 6\xe2\x80\x931)"
]
text = [re.sub(r'\s*\(\s*(?:table|figure)[^()]*\)', '', t) for t in text]
print(text)
# => ['this is a figure ref', 'this is multiple refs', 'this is a table ref']
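
The pattern breakdown above can also be kept next to the code. As a minimal sketch, the same expression is compiled once with re.VERBOSE so each piece carries its own comment. The names REF_PARENS and strip_refs are hypothetical and not part of the answer; the second sample string is made up to show that parentheses not starting with table or figure are left alone.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
# REF_PARENS and strip_refs are hypothetical names used for illustration.
REF_PARENS = re.compile(
    r"""
    \s*                 # zero or more whitespace chars before the bracket
    \(                  # a literal opening parenthesis
    \s*                 # zero or more whitespace chars after it
    (?:table|figure)    # the string "table" or "figure"
    [^()]*              # zero or more chars other than parentheses
    \)                  # a literal closing parenthesis
    """,
    re.VERBOSE,
)

def strip_refs(s):
    """Remove parenthesized groups that start with 'table' or 'figure'."""
    return REF_PARENS.sub("", s)

print(strip_refs("this is a table ref (table 6\xe2\x80\x931)"))
print(strip_refs("an aside (not a reference) stays (figure 7\xe2\x80\x9377)"))
```

Compiling the pattern once avoids re-parsing it on every call, which may matter when the list of strings is long. Note that the match is case-sensitive, so a group such as (Table 6) would not be removed unless re.IGNORECASE is added to the flags.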
