Regex for the first instance of a specific character that does not come immediately after another specific character

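I have a function, translate(), that takes multiple parameters. The first parameter is the only required one. It is a string, which I always wrap in single quotes, like this: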

translate('hello world');

The other parameters are optional, but could be included like this:

translate('hello world', true, 1, 'foobar', 'etc');

The string itself may contain escaped single quotes, like this:

translate('hello\'s world');

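I now want to search all of my code files for every call to this function and extract just the string. To do this I came up with the following grep, which returns everything between translate(' and either ') or ', and it almost works perfectly:

grep -RoPh "(?<=translate\(').*?(?='\)|',)" .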

The problem, however, is that if the call looks like this:

translate('hello \'world\', you\'re great!');

my grep returns only this:

hello \'world\

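So I'm looking to modify it so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that has not been escaped, i.e. one that does not immediately follow a \

I hope that makes sense. Any suggestions?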

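You can use this grep with a PCRE regex (inside the shell's double quotes, \\\\ reaches grep as \\, which is the regex for one literal backslash):

grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .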


Regex breakdown:

\b            # word boundary
translate     # match literal translate
\(            # match a (
\s*           # match 0 or more whitespace
\K            # reset the start of the reported match (everything matched so far is dropped from the output)
'             # match starting single quote
(?:           # start non-capturing group
   [^'\\]*    # match 0 or more chars that are not a backslash or single quote
)             # end non-capturing group
(?:           # start non-capturing group
   \\.        # match a backslash followed by the char that is "escaped"
   [^'\\]*    # match 0 or more chars that are not a backslash or single quote
)*            # end non-capturing group, repeated 0 or more times
'             # match ending single quote
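
For anyone who wants to try the pattern outside grep, here is a minimal Python sketch of the same regex. It is only an illustration, not part of the grep command: Python's re module has no \K, so a capture group stands in for it, and the names CALL_STRING, first_arguments and sample are made up for this example.

```python
import re

# Same structure as the breakdown above: opening quote, a run of ordinary
# characters, then any number of (backslash + one char, run of ordinary
# characters) pairs, then the closing quote.
# Assumption: a capture group replaces \K, which Python's `re` does not support.
CALL_STRING = re.compile(r"\btranslate\(\s*('[^'\\]*(?:\\.[^'\\]*)*')")


def first_arguments(source):
    """Return the quoted first argument of every translate() call in source."""
    return CALL_STRING.findall(source)


# Hypothetical sample input; the raw string keeps each backslash literally.
sample = r"""
translate('hello world');
translate('hello\'s world', true, 1, 'foobar', 'etc');
translate( 'hello \'world\', you\'re great!');
"""

for argument in first_arguments(sample):
    print(argument)
```

As with the \K version, the surrounding quotes are part of each result. The three strings should come out whole, including the one with escaped quotes in the middle.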

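Here is a version that uses lookarounds instead of \K (it also leaves the surrounding quotes out of the output):

grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .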


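I think the problem is the .*? part: the ? makes it non-greedy, which means it takes the shortest string that matches the pattern. In effect, you are saying, "give me the shortest string that is followed by quote + close paren or quote + comma". In your example, world\ is followed by a single quote and a comma, so it matches your pattern. In cases like this, I like to use the following reasoning:

A string is a quote, zero or more characters, and a quote: '.*'

A character is anything that is not a quote (because a quote terminates the string): '[^']*'

Except that you can put a quote inside the string by escaping it with a backslash, so a character is either "a backslash followed by a quote" or "not a quote": '(\\'|[^'])*'

Put it all together (with the backslash doubled once more for the shell's double quotes) and you get

grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|',)" .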
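
To see the difference between the two patterns side by side, here is a small Python sketch that runs the original lazy pattern and the "escaped quote or non-quote" pattern against the problem line. The variable names are invented for illustration, and Python's re is used in place of grep -P on the assumption that the two engines behave the same way for these particular constructs.

```python
import re

# The problematic call from the question; the raw string keeps the backslashes.
line = r"translate('hello \'world\', you\'re great!');"

# Original lazy pattern: stops at the first quote that is followed by ')' or ','.
lazy = re.compile(r"(?<=translate\(').*?(?='\)|',)")

# Each "character" is either a backslash followed by a quote, or any non-quote.
escaped_aware = re.compile(r"(?<=translate\(')(?:\\'|[^'])*(?='\)|',)")

print(lazy.search(line).group(0))
print(escaped_aware.search(line).group(0))
```

The first line printed is the truncated hello \'world\ from the question. The second is the whole string up to the closing quote, because an escaped quote is consumed as a single unit and can no longer end the match.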
