Getting a large string out of a text without catastrophic regex backtracking

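I want to use a regex to pull one specific file (for example package-lock.json) out of a git diff. The reason for this approach is that I get the complete git diff through the GitHub API (using Octokit.js), so I can't just run git diff on that one file (as far as I know). The diff for a file like package-lock.json is very large, so there is a lot of content. What I've noticed is that when I try to capture this content with a regex, it fails because of catastrophic backtracking.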

Basically, the diff is structured like this:

diff --git a/package-lock.json b/package-lock.json
lots of content
diff --git a/next-file b/next-file

So my idea was to grab everything between the two diff --git strings.

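I thought I could just use /(?<=diff --git )(.+?)(?=diff)/gs. This works fine when the lookahead does not have to search very far, but once the match has to span a long stretch of the file it stops working because of catastrophic backtracking.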
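For reference, here is a minimal sketch of what that pattern captures on the small sample above. The variable names are only illustrative, and it assumes a JavaScript engine with lookbehind support (for example a current Node.js). Note that the lookahead stops at any occurrence of the bare string "diff", not only at the next diff --git header.

```javascript
// Sample diff text, following the structure shown above.
const diffText = [
  "diff --git a/package-lock.json b/package-lock.json",
  "lots of content",
  "diff --git a/next-file b/next-file",
].join("\n");

// The pattern from the question: lazily consume characters after
// "diff --git " until the lookahead sees the next "diff".
const pattern = /(?<=diff --git )(.+?)(?=diff)/gs;

// Collect the first capture group of every match.
const matches = [...diffText.matchAll(pattern)].map((m) => m[1]);

console.log(matches.length); // 1: the last file has no following "diff"
console.log(JSON.stringify(matches[0]));
// "a/package-lock.json b/package-lock.json\nlots of content\n"
```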

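I understand why this happens, but not how to get around it. Maybe I should approach this some other way and only use a regex for the more specific details?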

Any help is appreciated.

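You are working with line-oriented data, and as you have found, a single regex over the whole text does not handle that well. A tool like awk can select a range of lines instead.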

Given this file foo.txt:

Here is stuff I don't care about
diff --git a/package-lock.json b/package-lock.json
lots of content
diff --git a/next-file b/next-file
Don't care about this either.

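Use awk to specify the range of lines to print. The / characters inside each pattern are escaped as \/ because / also delimits the regex:

$ awk '/^diff --git a\/package-lock/,/^diff --git a\/next-file/' foo.txt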
diff --git a/package-lock.json b/package-lock.json
lots of content
diff --git a/next-file b/next-file
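
If the diff is already in memory as a JavaScript string (as with the GitHub API response in the question), the same line-range idea can be sketched without awk and without a multi-line regex. This is only an illustration: extractFileDiff is a hypothetical helper, and it assumes unquoted paths and that every file section starts with a line beginning with "diff --git ". Unlike the awk range above, it stops before the next file's header instead of printing it.

```javascript
// Hypothetical helper: return the diff section for one file from a full diff.
// Scans line by line, so the work grows linearly with the size of the diff.
function extractFileDiff(diffText, filePath) {
  // Header prefix for the wanted file (assumes an unquoted path).
  const header = `diff --git a/${filePath} `;
  const section = [];
  let inside = false;

  for (const line of diffText.split("\n")) {
    if (line.startsWith("diff --git ")) {
      if (inside) break; // the next file's header ends the section
      inside = line.startsWith(header);
    }
    if (inside) section.push(line);
  }
  return section.join("\n");
}

// Same content as foo.txt above.
const sample = [
  "Here is stuff I don't care about",
  "diff --git a/package-lock.json b/package-lock.json",
  "lots of content",
  "diff --git a/next-file b/next-file",
  "Don't care about this either.",
].join("\n");

console.log(extractFileDiff(sample, "package-lock.json"));
// diff --git a/package-lock.json b/package-lock.json
// lots of content
```

If the file is not present in the diff, this sketch returns an empty string.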
