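Regex to capture inline math formulas, e.g. $inline$, in Python or JavaScript

I have a string input (the contents of a markdown file) in which every math formula is wrapped in $$.

For example: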

$$ stand alone $$
$$ stand 
alone $$

And there $$ inline $$. 
$$ inline 2 $$ some text also.

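I want to change the inline formulas so that they are wrapped in a single $ instead.

I tried using regex, but so far I have not been able to find a correct solution.

This example captures every $$ formula, but does not distinguish inline from standalone:

re.findall(r'\$\$([^$]+?)\$\$', txt)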
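To make the problem concrete, here is a minimal sketch of that attempt run against the sample above. It assumes the sample is stored in txt, as in the call; the name all_formulas is only for illustration. All four formulas come back, with nothing to tell inline from standalone.

```python
import re

# Sample input from the question, assumed to be held in `txt`.
txt = (
    "$$ stand alone $$\n"
    "$$ stand \n"
    "alone $$\n"
    "\n"
    "And there $$ inline $$. \n"
    "$$ inline 2 $$ some text also.\n"
)

# The $ delimiters are escaped because a bare $ is an end-of-line anchor.
all_formulas = re.findall(r'\$\$([^$]+?)\$\$', txt)
print(all_formulas)
```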

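First, you can use .+ instead of [^$]+. By default . does not match \n, so the pattern no longer captures formulas that span several lines.

Then combine the two conditions "not at the start of a line", (?!^), and "not at the end of a line", (?<!$), and join them with |.

print(re.findall(r'(?m)(?!^)\$\$(.+?)\$\$|\$\$(.+?)\$\$(?<!$)', txt))

This prints: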

[(' inline ', ''), ('', ' inline 2 ')]

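(?m) is the "multiline" flag, which makes ^ and $ match at the start and end of every line.

If you want to replace them with REPLACED!!:

print(re.sub(r'(?m)(?!^)\$\$(.+?)\$\$|\$\$(.+?)\$\$(?<!$)', '$$ REPLACED!! $$', txt))

Output: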

$$ stand alone $$
$$ stand 
alone $$

And there $$ REPLACED!! $$. 
$$ REPLACED!! $$ some text also.
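To get what the question actually asks for, a single $ around each inline formula, the fixed replacement string can be swapped for a function. This is a rough sketch, assuming the sample is in txt; to_single_dollar and repl are names made up for this illustration.

```python
import re

# Same pattern as above: group 1 is a formula that does not start a line,
# group 2 is a formula that does not end a line.
INLINE = re.compile(r'(?m)(?!^)\$\$(.+?)\$\$|\$\$(.+?)\$\$(?<!$)')

def to_single_dollar(txt):
    def repl(m):
        # Exactly one of the two groups takes part in any given match.
        body = m.group(1) if m.group(1) is not None else m.group(2)
        return '$' + body + '$'
    return INLINE.sub(repl, txt)

txt = (
    "$$ stand alone $$\n"
    "$$ stand \n"
    "alone $$\n"
    "\n"
    "And there $$ inline $$. \n"
    "$$ inline 2 $$ some text also.\n"
)
print(to_single_dollar(txt))
```

The standalone formulas are left alone, and the last two lines come out as "And there $ inline $. " and "$ inline 2 $ some text also.".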

If you don't like the inconsistent group numbers, you can use a conditional pattern:

print(re.findall(r'(?m)(.+)?\$\$(.+?)\$\$(?(1)|.+)', txt))

Output:

[('And there ', ' inline '), ('', ' inline 2 ')]

Now the target group number is always 2.
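Since the formula body always lands in group 2, pulling out just the bodies is a one-liner with finditer. A small sketch, with inline_bodies as a made-up helper name:

```python
import re

# Conditional pattern: if group 1 (text before the formula) matched,
# nothing more is required; otherwise some text must follow the formula.
COND = re.compile(r'(?m)(.+)?\$\$(.+?)\$\$(?(1)|.+)')

def inline_bodies(txt):
    # Group 2 holds the formula body in either case.
    return [m.group(2) for m in COND.finditer(txt)]

txt = (
    "$$ stand alone $$\n"
    "$$ stand \n"
    "alone $$\n"
    "\n"
    "And there $$ inline $$. \n"
    "$$ inline 2 $$ some text also.\n"
)
print(inline_bodies(txt))
```

One thing to watch: group 1 is greedy, so on a line holding several inline formulas, such as "a $$x$$ b $$y$$ c", only the last one ends up in group 2.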

You can use lookahead and lookbehind to check whether there is text before or after the formula, like this:

re.findall(r'(?:(?<=(?: |\w))\$\$([^\n$]+?)\$\$)|(?:\$\$([^\n$]+?)\$\$(?=(?: |\w)))', txt)

This yields:

[(' inline ', ''), ('', ' inline 2 ')]

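The regex documentation covers lookaheads and lookbehinds in more detail, and an online regex tester is handy for trying out your pattern.

Edit: following Bosoeng Choi's comment, removed the unnecessary escape inside [^\n$].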
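The same lookaround pattern can drive the actual replacement. This is only a sketch: LOOK and convert are names invented here, and the sample is assumed to be in txt.

```python
import re

# Group 1: formula preceded by a space or word character.
# Group 2: formula followed by a space or word character.
LOOK = re.compile(
    r'(?:(?<=(?: |\w))\$\$([^\n$]+?)\$\$)|(?:\$\$([^\n$]+?)\$\$(?=(?: |\w)))'
)

def convert(txt):
    # Whichever group matched is non-empty, so `or` picks the right one.
    return LOOK.sub(lambda m: '$' + (m.group(1) or m.group(2)) + '$', txt)

txt = (
    "$$ stand alone $$\n"
    "$$ stand \n"
    "alone $$\n"
    "\n"
    "And there $$ inline $$. \n"
    "$$ inline 2 $$ some text also.\n"
)
print(convert(txt))
```

Because the lookarounds only accept a space or a word character, a formula bordered purely by punctuation, for example "($$x$$)", is left untouched by this version.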

How about defining the start in the regex by adding ^?

Something like ^\$\$([^$]+?)\$\$
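A quick way to see what that suggestion picks up on the sample from the question. The sketch assumes re.MULTILINE is intended, since without it ^ only matches at the very start of the string; AT_LINE_START and found are illustrative names.

```python
import re

# ^ with MULTILINE anchors the opening $$ to the start of a line.
AT_LINE_START = re.compile(r'^\$\$([^$]+?)\$\$', re.MULTILINE)

txt = (
    "$$ stand alone $$\n"
    "$$ stand \n"
    "alone $$\n"
    "\n"
    "And there $$ inline $$. \n"
    "$$ inline 2 $$ some text also.\n"
)
found = AT_LINE_START.findall(txt)
print(found)
```

Both standalone formulas are found, but so is " inline 2 ", because that inline formula happens to open its line. The start anchor alone is therefore not enough to separate the two kinds.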

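You could capture the standalone formulas in a capturing group surrounded by anchors, and use an alternation | with another group to capture what is between the delimiters of the inline formulas.

In the replacement, put back group 1 or group 2, where group 2 is the formula surrounded by a single $.

^(\$\$[\s\S]*?\$\$)$|(?<!\$)\$(\$[\s\S]*?\$)\$(?!\$)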


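Example code

import re

# Group 1: a standalone formula that fills whole lines.
# Group 2: an inline formula, captured with a single $ on each side.
pattern = r"^(\$\$[\s\S]*?\$\$)$|(?<!\$)\$(\$[\s\S]*?\$)\$(?!\$)"
test_str = ("$$ stand alone $$\n\n"
            "$$ stand \n"
            "alone $$\n\n"
            "And there $$ inline $$. ")
regex = re.compile(pattern, re.MULTILINE)
# Keep standalone formulas as they are and swap in group 2 for inline ones.
result = re.sub(
    regex,
    lambda x: x.group(2) if x.group(2) else x.group(1), test_str
)
if result:
    print(result)

Output

$$ stand alone $$

$$ stand 
alone $$

And there $ inline $. 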
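Wrapped in a function, the same pattern can be tried on the full sample from the question, including the line that starts with an inline formula. The name convert is just for this sketch.

```python
import re

PATTERN = re.compile(
    r"^(\$\$[\s\S]*?\$\$)$|(?<!\$)\$(\$[\s\S]*?\$)\$(?!\$)",
    re.MULTILINE,
)

def convert(txt):
    # Group 2 (inline, already single-$) wins when present;
    # otherwise the standalone formula in group 1 is put back unchanged.
    return PATTERN.sub(lambda m: m.group(2) if m.group(2) else m.group(1), txt)

txt = (
    "$$ stand alone $$\n"
    "$$ stand \n"
    "alone $$\n"
    "\n"
    "And there $$ inline $$. \n"
    "$$ inline 2 $$ some text also.\n"
)
print(convert(txt))
```

One case worth checking against real input: [\s\S]*? can cross line breaks, so an inline formula that opens a line and is followed later by a standalone formula, as in "$$ a $$ text\n$$ b $$\n", gets swallowed into one group 1 match and comes back unchanged.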
