Regex to match a specific string that is not surrounded by another, different specific string



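I need a regular expression that matches a string that is not surrounded by another, different specific string. In the example below, it should split the content into two groups: 1) the content before the second {Switch} and 2) the content after the second {Switch}. It should not match the first {Switch}, because that one is surrounded by {my_string}. The string will always have this shape (i.e. {my_string}any content here{/my_string}):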

Some more  
  {my_string}
  Random content
  {Switch} //This {Switch} may or may not be here, but should be ignored if it is present
  More random content
  {/my_string}
Content here too
{Switch}
More content

So far I have the following, which I know is nowhere near right:

(.*?){Switch}(.*?)

I'm just not sure how to use [^] (the "not" operator) with a whole specific string rather than with individual characters.
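For comparison, one regex-only way to say "a {Switch} that is not inside {my_string}...{/my_string}" in PHP's PCRE is the backtracking control verbs (*SKIP)(*FAIL). This technique is not taken from the thread; it is a sketch that assumes blocks are never nested, and the function name split_outside_blocks is hypothetical. The first alternative consumes a whole block and then deliberately fails, and (*SKIP) makes the engine resume scanning after that block, so only a {Switch} outside any block can act as the split delimiter.

```php
<?php
// Hypothetical helper: split on {Switch}, ignoring any {Switch} inside a
// {my_string} ... {/my_string} block. Assumes blocks are not nested.
function split_outside_blocks(string $doc): array
{
    // Alternative 1 matches a whole block, then (*SKIP)(*FAIL) throws the match
    // away and restarts the search right after the block.
    // Alternative 2 is the real delimiter. The s flag lets . cross newlines.
    $pattern = '~\{my_string\}.*?\{/my_string\}(*SKIP)(*FAIL)|\{Switch\}~s';
    return preg_split($pattern, $doc);
}

$doc = "Some more\n{my_string}\nRandom content\n{Switch}\n"
     . "More random content\n{/my_string}\nContent here too\n{Switch}\nMore content\n";

print_r(split_outside_blocks($doc));
```

Because preg_split drops the matched delimiter, the result has two elements: everything up to "Content here too" (with the inner {Switch} intact), and the text after the outer {Switch}.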

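It looks like you are really trying to parse a grammar with regular expressions, which is something regular expressions are very bad at. You would be better off writing a parser that breaks the string down into the tokens it is made of, and then processing the resulting tree.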

Perhaps something like http://drupal.org/project/grammar_parser could help.
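A minimal sketch of that idea in plain PHP, assuming the only tokens that matter are {my_string}, {/my_string} and {Switch} (the function name split_top_level is hypothetical, and this is a flat token scan rather than a full tree): the input is cut into tokens, a depth counter tracks whether we are inside a {my_string} block, and only a {Switch} at depth 0 starts a new part. Unlike a single regex, this also copes with nested blocks.

```php
<?php
// Hypothetical helper: split $doc on top-level {Switch} tokens only,
// treating {my_string} ... {/my_string} as an opaque, possibly nested, block.
function split_top_level(string $doc): array
{
    // Tokenize: keep the tag tokens as separate array elements.
    $tokens = preg_split('~(\{/?my_string\}|\{Switch\})~', $doc, -1, PREG_SPLIT_DELIM_CAPTURE);

    $parts = [''];
    $depth = 0; // how many {my_string} blocks we are currently inside

    foreach ($tokens as $token) {
        if ($token === '{my_string}') {
            $depth++;
        } elseif ($token === '{/my_string}') {
            $depth = max(0, $depth - 1);
        } elseif ($token === '{Switch}' && $depth === 0) {
            $parts[] = ''; // top-level delimiter: start a new part, drop the token
            continue;
        }
        // Text, block tags and inner {Switch} tokens are kept verbatim.
        $parts[count($parts) - 1] .= $token;
    }

    return $parts;
}

$doc = "Some more\n{my_string}\nRandom content\n{Switch}\n"
     . "More random content\n{/my_string}\nContent here too\n{Switch}\nMore content\n";

print_r(split_top_level($doc));
```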

Try this simple function:

Function find_content()

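// Split $doc on {Switch}, ignoring any {Switch} inside {my_string}...{/my_string}.
function find_content($doc) {
  $temp = $doc;
  // 1. Collect every {my_string}...{/my_string} block (s: dot matches newlines).
  preg_match_all('~{my_string}.*?{/my_string}~is', $temp, $x);
  // 2. Hide each block behind a placeholder so its inner {Switch} is not seen.
  foreach ($x[0] as $i => $block) {
    $temp = str_replace($block, "{REPL:$i}", $temp);
  }
  // 3. Split on the remaining (top-level) {Switch} tokens.
  $res = explode('{Switch}', $temp);
  // 4. Put the original blocks back into every part.
  foreach ($res as &$part) {
    foreach ($x[0] as $id => $content) {
      $part = str_replace("{REPL:$id}", $content, $part);
    }
  }
  unset($part); // break the reference left behind by foreach-by-reference
  return $res;
}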

Use it like this:

$content_parts = find_content($doc); // $doc is your input document
print_r($content_parts);

Output (for your example):

Array
(
    [0] => Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
    [1] => 
More content
)

You could try positive lookahead and lookbehind assertions (http://www.regular-expressions.info/lookaround.html).

It might look something like this:

$content = 'string of text before some random content switch text some more random content string of text after';
$before  = preg_quote('string of text before', '/');
$switch  = preg_quote('switch text', '/');
$after   = preg_quote('string of text after', '/');
// The lookbehind/lookahead pin the match between $before and $after without consuming them.
// Group 1 is lazy so it stops at the first $switch; group 2 takes the rest.
if (preg_match('/(?<=' . $before . ')(.*?)(?:' . $switch . ')(.*)(?=' . $after . ')/', $content, $matches)) {
    // $matches[1] == ' some random content '
    // $matches[2] == ' some more random content '
}

$regex = '/(?:(?!{my_string})(.*?))({Switch})(?:(.*?)(?!{my_string}))/';
/* if "my_string" and "Switch" aren't wrapped by "{" and "}" just remove "{" and "}" */
$yourNewString = preg_replace($regex, '$1', $yourOriginalString);

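This might work. I can't test it right now, but I'll update this later! I don't know whether this is what you're looking for, but to negate a sequence of several characters (rather than a single character class with [^...]), the regex syntax is: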

(?!yourString) 

It is called a "negative lookahead assertion".
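To apply that negative lookahead to a whole run of text rather than to a single position, a common pattern (often called a tempered greedy token) repeats it one character at a time. A minimal sketch, using {my_string} as the string to exclude; the variable name $noMyString is only illustrative:

```php
<?php
// Tempered greedy token: consume one character at a time, but only at positions
// where the literal {my_string} does not start. Anchored with ^ and $, the whole
// subject must therefore be free of {my_string}. The s flag lets . match newlines.
$noMyString = '~^(?:(?!\{my_string\}).)*$~s';

var_dump(preg_match($noMyString, 'Random content {Switch}'));  // int(1)
var_dump(preg_match($noMyString, 'before {my_string} after')); // int(0)
```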

Edit:

This should work and return 1 (a match):

$stringMatchesYourRulesBoolean = preg_match('~(.*?)('.$my_string.')(.*?)(?<!'.$my_string.') ?('.$switch.') ?(?!'.$my_string.')(.*?)('.$my_string.')(.*?)~',$yourString);

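Take a look at PHP PEG. It is a small parser written in PHP: you write your own grammar and then parse input with it. In your case the grammar would be very simple.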

Grammar

Both the grammar syntax and how to run the parser are explained in README.md.

Excerpt from the README:

  token*  - Token is optionally repeated
  token+ - Token is repeated at least once
  token? - Token is optionally present

Tokens can be:

 - bare-words, which are recursive matchers - references to token rules defined elsewhere in the grammar,
 - literals, surrounded by `"` or `'` quote pairs. No escaping support is provided in literals.
 - regexs, surrounded by `/` pairs.
 - expressions - single words (match \w+)

Example grammar (file EqualRepeat.peg.inc):

class EqualRepeat extends Packrat {
/* Any number of a followed by the same number of b and the same number of c characters
 * aabbcc - good
 * aaabbbccc - good
 * aabbc - bad
 * aabbacc - bad
 */
/*Parser:Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
X: &(A !"b") "a"+ B !("a" | "b" | "c")
*/
}
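The example grammar accepts strings of the form a^n b^n c^n (n >= 1), which is not a regular language, so no plain regular expression can check the "same number" part by itself; that is the kind of job a parser is for. As a plain-PHP illustration only (this is not php-peg's API, and is_equal_repeat is a hypothetical name), a recognizer that gives the same verdicts as the comment in the grammar could use a regex for the shape and compare run lengths separately:

```php
<?php
// Hypothetical recognizer for the language EqualRepeat describes:
// one or more "a", then the same number of "b", then the same number of "c".
function is_equal_repeat(string $s): bool
{
    // The shape a+ b+ c+ is regular, so a regex can check it
    // (D: $ matches only at the very end, not before a trailing newline).
    if (!preg_match('/^(a+)(b+)(c+)$/D', $s, $m)) {
        return false;
    }
    // The equal-count condition is not regular, so compare the captured runs.
    return strlen($m[1]) === strlen($m[2]) && strlen($m[2]) === strlen($m[3]);
}

foreach (['aabbcc', 'aaabbbccc', 'aabbc', 'aabbacc'] as $s) {
    echo $s, ' - ', is_equal_repeat($s) ? 'good' : 'bad', PHP_EOL;
}
```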
