How to use regular expressions to find and join whole chemical compound strings in LaTeX

Given the following string:

Which one of following pairs of gases is the major cause of greenhouse effect?
A. ( C O_{2} ) and ( O_{3} )
в. ( C O_{2} ) and ( C O )
c. ( C F C ) and ( S O_{2} )
D. ( C O_{2} ) and ( N_{2} O )

I would like to get something like this:

Which one of following pairs of gases is the major cause of greenhouse effect?
A. ( CO2 ) and ( O3 )
в. ( CO2 ) and ( CO )
c. ( CFC ) and ( SO2 )
D. ( CO2 ) and ( N2O )

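As an experiment I tried re.sub('[A-Z]_{[0-9]}', '<CHEM>', text), which lets me combine an element with its subscript. How can I join the whole formula together? The elements are separated by a space, and each element can start with an uppercase letter and/or consist of one or more letters. It could look like this: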

( Na Cl_{2} ) and ( Fe k_{3} cl )->( NaCl2 ) and ( Fek3cl )

You can use capture groups with re.sub:

re.sub(r'([A-Z][a-z]?)(_{([0-9]+)})? *', r'\1\3', text)


If you want to keep the whitespace after the last element, you can use

re.sub(r'([A-Z][a-z]?)(_{([0-9]+)})?( *(?=[A-Z]))?', r'\1\3', text)


Explanation:

([A-Z][a-z]?)(_{([0-9]+)})? *
([A-Z][a-z]?)    # Matches an element symbol and captures it in group 1.
(_{([0-9]+)})?   # Matches an optional subscript and captures its digits in group 3.
 *               # Matches trailing spaces, so they are removed.
( *(?=[A-Z]))?   # Alternatively, match the spaces only if a capital letter follows, so they are removed only before another element.
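
To make the difference between the two patterns concrete, here is a minimal sketch that runs both on one of the answer options from the question. The names STRIP_ALL and KEEP_LAST are only illustrative, and the braces are escaped for readability, which should not change what the patterns match in Python.

```python
import re

# Both patterns from the answer above; the names are illustrative only.
STRIP_ALL = r'([A-Z][a-z]?)(_\{([0-9]+)\})? *'
KEEP_LAST = r'([A-Z][a-z]?)(_\{([0-9]+)\})?( *(?=[A-Z]))?'

line = "D. ( C O_{2} ) and ( N_{2} O )"

# \1 is the element symbol, \3 the subscript digits (empty when absent).
stripped = re.sub(STRIP_ALL, r'\1\3', line)
kept = re.sub(KEEP_LAST, r'\1\3', line)

print(stripped)  # D. ( CO2) and ( N2O)
print(kept)      # D. ( CO2 ) and ( N2O )
```

Note that both patterns treat any capital letter, optionally followed by one lowercase letter, as an element, so ordinary capitalized words followed by a space would also lose that space with the first pattern.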

You can write

rgx = r'(?<!\()[ _{}](?=[ A-Z\d _{}]* \))'

re.sub(rgx, '', text)


The regular expression can be broken down as follows.

(?<!              # begin a negative lookbehind
  \(              # match '('
)                 # end negative lookbehind
[ _{}]            # match a character in the char class
(?=               # begin a positive lookahead
  [ A-Z\d _{}]*   # match zero or more characters in the char class
  [ ]\)           # match ' )'
)                 # end positive lookahead

I put the space character in a character class ([ ]) in the breakdown only to make it visible.
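
As a quick check of the lookaround approach, the sketch below applies the pattern to one option line. Because the character class in the lookahead only lists uppercase letters and digits, it does not cover the mixed-case example from the question; rgx_lower is a hypothetical variant (an assumption, not part of the answer) that also allows lowercase letters.

```python
import re

# Pattern from the answer, with digits written as \d.
rgx = r'(?<!\()[ _{}](?=[ A-Z\d_{}]* \))'

# Hypothetical variant (assumption): also allow lowercase letters
# between the current position and the closing ' )'.
rgx_lower = r'(?<!\()[ _{}](?=[ A-Za-z\d_{}]* \))'

upper_only = re.sub(rgx, '', "A. ( C O_{2} ) and ( O_{3} )")
mixed_case = re.sub(rgx_lower, '', "( Na Cl_{2} ) and ( Fe k_{3} cl )")

print(upper_only)  # A. ( CO2 ) and ( O3 )
print(mixed_case)  # ( NaCl2 ) and ( Fek3cl )
```

The lowercase variant would also collapse ordinary parenthesized words written in the same "( ... )" style, so it is only suitable when the parentheses are known to contain formulas.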

You can use

import re
text = r"( Na Cl_{2} ) and ( Fe k_{3} cl  )"
# Rebuild each ( ... ) group, keeping only its letters and digits
print(re.sub(r'\(\s*([^()]*?)\s*\)', lambda x: f'( {"".join(c for c in x.group(1) if c.isalnum())} )', text))

Details of the regex:

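\( - a ( character
\s* - zero or more whitespace characters
([^()]*?) - Group 1: zero or more characters other than ( and ), as few as possible
\s*\) - zero or more whitespace characters, then a ) character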

The lambda passed as the replacement rebuilds each match as (, followed by Group 1 with all non-alphanumeric characters removed, followed by ).
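
The same callback idea can be wrapped in a small helper and applied to several option lines at once. This is only a sketch under the assumption that every "( ... )" group in the text holds a formula; the names PAREN_GROUP, merge_compounds and rebuild are illustrative and not part of the answer.

```python
import re

# One parenthesized group; group 1 is its content without the outer whitespace.
PAREN_GROUP = re.compile(r'\(\s*([^()]*?)\s*\)')

def merge_compounds(text):
    """Collapse the content of each ( ... ) group to its letters and digits."""
    def rebuild(match):
        formula = "".join(c for c in match.group(1) if c.isalnum())
        return f"( {formula} )"
    return PAREN_GROUP.sub(rebuild, text)

question = "\n".join([
    "A. ( C O_{2} ) and ( O_{3} )",
    "c. ( C F C ) and ( S O_{2} )",
    "D. ( C O_{2} ) and ( N_{2} O )",
])
merged = merge_compounds(question)
print(merged)
```

Because the callback drops every non-alphanumeric character, this version handles lowercase symbols and irregular spacing without changing the pattern, at the cost of also stripping punctuation inside any parenthesized text.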
