How to use a regular expression to find and join whole chemical compound strings in LaTeX



For the following string:

Which one of following pairs of gases is the major cause of greenhouse effect?
A. ( C O_{2} ) and ( O_{3} )
в. ( C O_{2} ) and ( C O )
c. ( C F C ) and ( S O_{2} )
D. ( C O_{2} ) and ( N_{2} O ) 

I would like to get something like this:

Which one of following pairs of gases is the major cause of greenhouse effect?
A. ( CO2 ) and ( O3 )
в. ( CO2 ) and ( CO )
c. ( CFC ) and ( SO2 )
D. ( CO2 ) and ( N2O ) 

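As an experiment I tried re.sub('[A-Z]_{[0-9]}', '<CHEM>', text), which lets me match an element together with its subscript. How can I join the whole formula? The elements are separated by a single space, and each element may or may not be capitalized and consists of one or more letters. It could look like: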

( Na Cl_{2} ) and ( Fe k_{3} cl )->( NaCl2 ) and ( Fek3cl )

You can use capture groups with re.sub:

re.sub(r'([A-Z][a-z]?)(_\{([0-9]+)\})? *', r'\1\3', text)


If you want to keep the whitespace after the last element, you can use

re.sub(r'([A-Z][a-z]?)(_\{([0-9]+)\})?( *(?=[A-Z]))?', r'\1\3', text)


Explanation:

([A-Z][a-z]?)(_\{([0-9]+)\})? *
([A-Z][a-z]?)     # Matches an element symbol: one uppercase letter plus an optional lowercase letter. Captured in group 1.
(_\{([0-9]+)\})?  # Matches an optional subscript such as _{2}. The digits are captured in group 3.
 *                # Matches trailing spaces, which causes them to be removed.
( *(?=[A-Z]))?    # Alternatively, match the spaces only if an uppercase letter follows, so they are removed only before another element.
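The two substitutions above can be tried side by side with a minimal sketch like the following. The function names and the sample string are only illustrative, and the `r'\1\3'` replacement assumes Python 3.5 or later, where a group that did not take part in the match is replaced by an empty string.

```python
import re

# Symbol (one uppercase letter, optional lowercase letter), optional _{digits}
# subscript, then trailing spaces. Group 1 is the symbol, group 3 the digits.
STRICT = re.compile(r'([A-Z][a-z]?)(_\{([0-9]+)\})? *')
# Same, but spaces are consumed only when another uppercase letter follows.
KEEP_LAST_SPACE = re.compile(r'([A-Z][a-z]?)(_\{([0-9]+)\})?( *(?=[A-Z]))?')

def join_strict(text):
    """Join symbols and subscripts, dropping every space after a symbol."""
    return STRICT.sub(r'\1\3', text)

def join_keep_last_space(text):
    """Join symbols and subscripts, keeping the space before the closing ')'."""
    return KEEP_LAST_SPACE.sub(r'\1\3', text)

sample = 'D. ( C O_{2} ) and ( N_{2} O )'
print(join_strict(sample))           # D. ( CO2) and ( N2O)
print(join_keep_last_space(sample))  # D. ( CO2 ) and ( N2O )
```

Both patterns run over the whole text, not only over the parenthesized parts, so the first one also swallows the space after a standalone capital letter in ordinary prose (for example 'I am' becomes 'Iam'). Neither pattern joins symbols that start with a lowercase letter, such as the k_{3} and cl in the question's second example.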

You can write

rgx = r'(?<!\()[ _{}](?=[ A-Z\d _{}]* \))'
re.sub(rgx, '', text)


The regular expression can be broken down as follows.

(?<!              # begin a negative lookbehind
  \(              # match '('
)                 # end negative lookbehind
[ _{}]            # match one character in the character class
(?=               # begin a positive lookahead
  [ A-Z\d _{}]*   # match zero or more characters in the character class
  [ ]\)           # match ' )'
)                 # end positive lookahead

I put the space character in a character class ([ ]) in the breakdown only to make it visible.
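As a rough check of the lookaround approach, the sketch below applies the pattern to strings from the question. The second pattern, with `a-z` added to the lookahead class, is not part of the answer: it is an assumed variation for symbols written with lowercase letters, and the function names are hypothetical.

```python
import re

# Pattern from the answer: delete a space, '_', '{' or '}' that is not right
# after '(' and is followed only by class characters up to the closing ' )'.
RGX = re.compile(r'(?<!\()[ _{}](?=[ A-Z\d_{}]* \))')

# Assumed variation (not from the answer): also allow lowercase letters
# in the lookahead so that symbols such as 'Na' or 'cl' are handled.
RGX_ANY_CASE = re.compile(r'(?<!\()[ _{}](?=[ A-Za-z\d_{}]* \))')

def strip_inside_parens(text):
    """Apply the answer's pattern (uppercase symbols only)."""
    return RGX.sub('', text)

def strip_inside_parens_any_case(text):
    """Apply the widened pattern (uppercase and lowercase symbols)."""
    return RGX_ANY_CASE.sub('', text)

print(strip_inside_parens('c. ( C F C ) and ( S O_{2} )'))
# c. ( CFC ) and ( SO2 )
print(strip_inside_parens('( Na Cl_{2} )'))
# ( Na Cl2 )
print(strip_inside_parens_any_case('( Na Cl_{2} ) and ( Fe k_{3} cl )'))
# ( NaCl2 ) and ( Fek3cl )
```

With the original class, the lowercase 'l' in 'Cl' stops the lookahead, so the space between 'Na' and 'Cl' survives. The widened class removes it, at the cost of also collapsing spaces inside any parenthesized run of plain words that ends in ' )'.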

You can use

import re
text = r"( Na Cl_{2} ) and ( Fe k_{3} cl  )"
# Keep only the alphanumeric characters found between each pair of parentheses
print(re.sub(r'\(\s*([^()]*?)\s*\)', lambda x: f'( {"".join(c for c in x.group(1) if c.isalnum())} )', text))

Details:

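  • \( - a literal ( character
  • \s* - zero or more whitespace characters
  • ([^()]*?) - Group 1: zero or more characters other than ( and ), as few as possible
  • \s*\) - zero or more whitespace characters, then a ) character

The replacement, lambda x: f'( {"".join(c for c in x.group(1) if c.isalnum())} )', replaces each match with (, a space, the contents of group 1 with all non-alphanumeric characters removed, a space, and ).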
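If the compounds in the real source are wrapped in LaTeX inline-math delimiters `\(` and `\)` instead of bare parentheses, the same idea could be adapted roughly as below. This is a sketch under that assumption only: the delimiters, the sample string, and the function name are not taken from the answer.

```python
import re

# Assumption: each compound sits between LaTeX inline-math delimiters \( and \).
# \\\( matches a literal backslash followed by '('; group 1 is the content.
LATEX_INLINE = re.compile(r'\\\(\s*(.*?)\s*\\\)')

def join_latex_compounds(text):
    """Drop every non-alphanumeric character between \\( and \\)."""
    def repl(match):
        cleaned = ''.join(c for c in match.group(1) if c.isalnum())
        # A function replacement is inserted as-is, so the backslashes
        # here are not reinterpreted as group references.
        return '\\( ' + cleaned + ' \\)'
    return LATEX_INLINE.sub(repl, text)

text = r'\( Na Cl_{2} \) and \( Fe k_{3} cl \)'
print(join_latex_compounds(text))  # \( NaCl2 \) and \( Fek3cl \)
```

Because everything that is not a letter or digit is dropped, this variant is only suitable when the delimited spans contain nothing but element symbols and subscripts.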
