Regular expression to convert a space-comma-space delimited input string into an array (must support quoting)



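This is my best attempt (so far) at solving this problem. I'm new to regular expressions and this is a hard problem, but I'll give it a shot. Regular expressions clearly take some time to master.

This seems to satisfy the delimiter/comma requirement. It looks redundant to me because of the repeated \s*. There is probably a better way.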

/\s*[,|\s*]\s*/
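A small sketch of how that delimiter pattern behaves when passed to preg_split(). Inside a character class, `|` and `*` are literal characters, so `[,|\s*]` matches exactly one comma, pipe, whitespace character, or asterisk. The second pattern, `/\s*,\s*|\s+/`, is only an assumption about what was intended (a comma with optional surrounding whitespace, or a run of whitespace); it is not taken from the question.

```php
<?php
// Delimiter as written in the question. Inside [...] the characters | and *
// are literals, so the class matches one of: comma, pipe, whitespace, asterisk.
$asWritten = '/\s*[,|\s*]\s*/';

// Hypothetical alternative: a comma with optional surrounding whitespace,
// or a run of one or more whitespace characters.
$alternative = '/\s*,\s*|\s+/';

$input = 'one two , three,four';

// PREG_SPLIT_NO_EMPTY drops empty pieces from the result.
$a = preg_split($asWritten, $input, -1, PREG_SPLIT_NO_EMPTY);
$b = preg_split($alternative, $input, -1, PREG_SPLIT_NO_EMPTY);

// A literal pipe is treated as a delimiter only by the pattern as written.
$c = preg_split($asWritten, 'a|b', -1, PREG_SPLIT_NO_EMPTY);
$d = preg_split($alternative, 'a|b', -1, PREG_SPLIT_NO_EMPTY);

print_r($a);
print_r($b);
print_r($c);
print_r($d);
```

Both patterns split the sample input into the same four words; they differ only on inputs that contain a literal `|` or `*`. Neither handles quoted phrases.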

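I found this on SOF and tried to take it apart and apply it to my problem (not easy). It seems to satisfy most of the "quoting" requirement, but I'm still working out how to handle the delimiters described in the requirements below.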

/"(?:\\.|[^\\"])*"|\S+/
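To see why the delimiters are still a problem, here is a minimal sketch that runs this pattern through preg_match_all(). The sample input is made up for illustration, and the pattern is placed in a nowdoc so its backslashes can be written exactly as shown above.

```php
<?php
// The pattern from the question, written in a nowdoc so that no extra
// PHP-level escaping of the backslashes is needed.
$pattern = <<<'RE'
/"(?:\\.|[^\\"])*"|\S+/
RE;

// Hypothetical sample input with a quoted phrase and a free-standing comma.
$input = 'one "two   three" four , five';

preg_match_all($pattern, $input, $matches);

// $matches[0] holds the full matches. The quoted phrase stays in one piece,
// but \S+ also picks up the lone comma as if it were a word.
print_r($matches[0]);
```

The quoted phrase is kept together, but the comma shows up as its own element, so the comma delimiter still has to be excluded some other way.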

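The requirements I'm trying to meet:

  • It will be used by the PHP preg_match_all() (or a similar) function to break a string into an array of strings. The source language is PHP.
  • Words in the input string are separated by (0 or more whitespace)(optional comma)(0 or more whitespace), or by just (1 or more whitespace).
  • The input string may also contain quoted substrings, each of which becomes a single element in the output array.
  • Quoted substrings in the input string must retain their double quotes when placed in the output array (because we must later be able to identify them as having been quoted in the input string).
  • Leading and trailing whitespace inside a quoted substring (that is, whitespace between the double-quote characters and the string itself) must be removed when it is placed in the output array. Example: "  hello world  " becomes "hello world"
  • Whitespace within a quoted phrase in the input string must be reduced to a single space when placed in its output array element. Example: "hello     world" becomes "hello world"
  • Quoted substrings in the input string that are zero-length or contain only whitespace are not placed in the output array (the output array must not contain any zero-length elements).
  • Every element of the output array must be trimmed (left and right) of whitespace.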

This example demonstrates all of the requirements above:

Input string:

one "two   three" four , five "  six seven"

Returns this array (the double quotes are actually present in the strings shown below):

{one,"two three",four,five,"six seven"}

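Edit 9/13/2013

I've been studying regular expressions hard for several days and have settled on this proposed solution. It may not be the best, but it's what I have for now.

I will use this regular expression with PHP's preg_match_all() function to split the search string into an array: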

/(?:"([^"]*)"|([^\s",]+))/

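The leading and trailing "/" are the pattern delimiters that the PHP function preg_match_all() requires.

Now that the array has been created, we retrieve it from the function call like this: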

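// preg_match_all() returns the match count; the matches are written to its third argument.
preg_match_all(REGEX, $SearchString, $Matches);
$Array = $Matches[0];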

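We have to do this because the function produces a compound array in which element 0 holds the complete matches of the regular expression. The remaining elements hold the values captured by the regex groups, which we don't need.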
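The shape of that compound array can be seen in a short sketch. The sample input is made up for illustration; the variable names follow the function shown later in this post.

```php
<?php
// Hypothetical sample input.
$SearchString = 'one "two   three" four , five';

// The return value is the number of full matches; the matches themselves
// are written into $Matches (default flag: PREG_PATTERN_ORDER).
$Count = preg_match_all('/(?:"([^"]*)"|([^\s",]+))/', $SearchString, $Matches);

// $Matches[0]: full matches (quoted phrases keep their double quotes).
// $Matches[1]: capture group 1, the text inside the quotes ('' when unused).
// $Matches[2]: capture group 2, the unquoted words ('' when unused).
print_r($Matches);
```

Only `$Matches[0]` keeps the double quotes around quoted phrases, which is why it is the element used here.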

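Now I will iterate over the resulting array and process each element to satisfy the requirements (above), which is much easier than trying to satisfy all of them in a single step with a single regular expression.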
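One way that per-element pass could look is sketched below. The helper name `normalizeTerm` and the sample terms are hypothetical, and this is just one possible reading of the requirements, not the final code.

```php
<?php
// Hypothetical helper: normalizes a single matched term.
// Returns '' for terms that should be dropped.
function normalizeTerm($term) {
    $term = trim($term);
    if ($term !== '' && $term[0] === '"') {
        // Quoted phrase: take the text between the quotes, collapse runs of
        // whitespace to one space, trim it, then put the quotes back.
        $inner = trim(preg_replace('/\s+/', ' ', substr($term, 1, -1)));
        return $inner === '' ? '' : '"' . $inner . '"';
    }
    return $term;
}

// Sample terms as they might come out of the matching step.
$terms = array('one', '"  two   three "', '""', '" "', 'four');

$result = array();
foreach ($terms as $term) {
    $normalized = normalizeTerm($term);
    if ($normalized !== '') {
        // Keep only non-empty terms.
        $result[] = $normalized;
    }
}

print_r($result);
```

Keeping the normalization in a separate pass means the matching regex only has to find the terms, not clean them up.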

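I finally developed a solution to this problem that involves a few PHP statements using regular expressions. Below is the final function.

This function is part of a class, which is why it starts with "public".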

public function SearchString_ToArr($SearchString) {
    /*
    Purpose
        Used to parse the specified search string into an array of search terms.
        Search terms are delimited by <0 or more whitespace><optional comma><0 or more whitespace>
    Parameters
        SearchString (string) = The search string we're working with.
    Return (array)
        Returns an array using the following rules to parse the specified search string:
            - Each search term from the search string is converted to a single element in the returned array.
            - Search terms are delimited by whitespace and/or commas, or they may be double quoted.
            - Double-quoted search terms may contain multiple words.
        Unquoted Search Terms:
            - These are delimited by any number of whitespace characters or commas in the search string.
            - These have all leading and trailing whitespace trimmed.
        Quoted Search Terms:
            - These are surrounded by double-quotes in the search string.
            - These retain leading and trailing double-quotes in the returned array.
            - These have all leading and trailing whitespace trimmed.
            - These may contain whitespace.
            - These have all containing whitespace converted into a single space.
            - If these are zero-length or contain only whitespace, they are not included in the returned array.
        Example 1:
            SearchString =  ' "" one " two   three " four "five six" " " '
            Returns {"one", ""two three"", "four", ""five six""}
            Notes   The leading whitespace before the first "" is not returned.
                    The first quoted phrase ("") is empty so it is not returned.
                    The term "one" is returned with leading and trailing whitespace removed.
                    The phrase "two three" is returned with leading and trailing whitespace removed.
                    The phrase "two three" has containing whitespace converted to a single space.
                    The phrase "two three" has leading and trailing double-quotes retained.
                    ...
    Version History
        1.0 2013.09.18 Tested by Russ Tanner on PHP 5.3.10.
    */
    $r = array();
    $Matches = array();
    // Split the search string into an array based on whitespace, commas, and double-quoted phrases.
    preg_match_all('/(?:"([^"]*)"|([^\s",]+))/', $SearchString, $Matches);
    // At this point:
    //  1. all quoted strings have their own element and begin/end with the quote character.
    //  2. all non-quoted strings have their own element and are trimmed.
    //  3. empty unquoted strings are omitted (an empty quoted phrase still appears as "").
    // Normalize quoted elements...
    // Convert each run of internal whitespace to a single space.
    $r = preg_replace('/\s+/', ' ', $Matches[0]);
    // Remove all whitespace between the double-quotes and the string.
    $r = preg_replace('/^"\s+/', '"', $r);
    $r = preg_replace('/\s+"$/', '"', $r);
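    // Drop quoted phrases that are empty after normalization (they appear as "").
    $r = array_values(array_filter($r, function ($Term) { return $Term !== '""'; }));
    return $r;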
}
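For trying this outside the class, here is a standalone sketch of the same steps as a plain function. The name `searchStringToArray` is hypothetical, the patterns are written with `\s` for whitespace, and the final filter that drops empty quoted phrases is an assumption based on the documented Example 1.

```php
<?php
// Hypothetical standalone variant of the method above (not part of a class).
function searchStringToArray($searchString) {
    $matches = array();
    // Match quoted phrases, or runs of characters that are not whitespace,
    // double quotes, or commas.
    preg_match_all('/(?:"([^"]*)"|([^\s",]+))/', $searchString, $matches);
    // Collapse runs of whitespace, then strip whitespace just inside the quotes.
    $terms = preg_replace('/\s+/', ' ', $matches[0]);
    $terms = preg_replace('/^"\s+/', '"', $terms);
    $terms = preg_replace('/\s+"$/', '"', $terms);
    // Assumption: quoted phrases that ended up empty ("") are dropped.
    return array_values(array_filter($terms, function ($term) {
        return $term !== '""';
    }));
}

// Example 1 from the function's documentation block.
$example1 = searchStringToArray(' "" one " two   three " four "five six" " " ');
// A comma-delimited input with a quoted phrase that has leading whitespace.
$example2 = searchStringToArray('one "two   three" four , five "  six seven"');

print_r($example1);
print_r($example2);
```

The first call reproduces the four elements listed in Example 1 of the documentation block; the second shows that commas are treated purely as delimiters.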
