C#: split a string using another string as the delimiter, and keep the delimiter as part of the split strings



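>I need to split an input string using a C# regular expression, and I need to know how to include the delimiter text in the output, as shown below.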

Input:

string content="heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
string[] delimters = new string[] {"heading1:","heading2:","heading3:"};

Expected output:

outputArray[0] = heading1: contents with respect to heading1
outputArray[1] = heading2: heading2 contents
outputArray[2] = heading3: heading 3 related contents sample strings

What I tried:

var result = content.Split(delimters,StringSplitOptions.RemoveEmptyEntries);

The output I get:

result [0]: " contents with respect to heading1 "
result [1]: " heading2 contents "
result [2]: " heading 3 related contents sample strings"

I could not find an API in string.Split or Regex that splits the string into the expected result.

You can use a solution based on a positive lookahead:

var result = Regex.Split(content, $@"(?={string.Join("|", delimiters.Select(m => Regex.Escape(m)))})")
                  .Where(x => !string.IsNullOrEmpty(x));

See the C# demo:

var content="heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
var delimiters = new string[] {"heading1:","heading2:","heading3:"};
Console.WriteLine(
    string.Join("\n", 
        Regex.Split(content, $@"(?={string.Join("|", delimiters.Select(m => Regex.Escape(m)))})")
             .Where(x => !string.IsNullOrEmpty(x))
    )
);

Output:

heading1: contents with respect to heading1 
heading2: heading2 contents 
heading3: heading 3 related contents sample strings

(?={string.Join("|", delimiters.Select(m => Regex.Escape(m)))}) dynamically builds a regex that looks like

(?=heading1:|heading2:|heading3:)

The pattern matches any position in the string that is followed by heading1:, heading2:, or heading3:, without consuming those substrings, so they end up in the output.

Note that delimiters.Select(m => Regex.Escape(m)) ensures that any special regex metacharacters in the delimiters are escaped, so the regex engine treats them as literal characters.
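To illustrate why the escaping matters, here is a minimal sketch. The `EscapeDemo` class, the `SplitKeeping` helper, and the sample input are hypothetical and not part of the original answer; the delimiter deliberately contains a `.` metacharacter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Splits before each delimiter using a lookahead.
    // 'escape' toggles Regex.Escape so both behaviours can be compared.
    public static string[] SplitKeeping(string content, string[] delimiters, bool escape)
    {
        var alternatives = delimiters.Select(d => escape ? Regex.Escape(d) : d);
        var pattern = "(?=" + string.Join("|", alternatives) + ")";
        return Regex.Split(content, pattern)
                    .Where(x => !string.IsNullOrEmpty(x))
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input: "axb:" is NOT meant to be a delimiter.
        var content = "a.b: first axb: second a.b: third";
        var delimiters = new[] { "a.b:" };

        Console.WriteLine("Escaped:");
        foreach (var part in SplitKeeping(content, delimiters, true))
            Console.WriteLine("  [" + part + "]");

        Console.WriteLine("Unescaped:");
        foreach (var part in SplitKeeping(content, delimiters, false))
            Console.WriteLine("  [" + part + "]");
    }
}
```

With escaping, the text should split only before the two literal `a.b:` occurrences. Without it, the `.` matches any character, so `axb:` is treated as a delimiter as well and an extra piece appears.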

Instead of splitting, I suggest matching the delimiters and then ordering the matches by position:

private static IEnumerable<string> Solution(string source, string[] delimiters) {
  int from = 0;
  int length = 0;
  // Points at which we can split; each delimiter is used as a regex pattern
  var points = delimiters
      .SelectMany(delimiter => Regex
        .Matches(source, delimiter)
        .OfType<Match>()
        .Select(match => new {
          index = match.Index,
          length = match.Length, // length of the matched text, not of the pattern
          delimiter = delimiter,
        }))
      .OrderBy(item => item.index)
      .ThenBy(item => Array.IndexOf(delimiters, item.delimiter)); // tie break
  foreach (var point in points) {
    // Skip matches that overlap the previously accepted delimiter
    if (point.index >= from + length) {
      // Condition: we don't want the very first empty part
      if (from != 0 || point.index - from != 0)
        yield return source.Substring(from, point.index - from);
      from = point.index;
      length = point.length;
    }
  }
  yield return source.Substring(from);
}

Test:

string content = 
  "heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
string[] delimiters = new string[] { 
  "heading1:", "heading2:", "heading3:" };
Console.WriteLine(string.Join(Environment.NewLine, Solution(content, delimiters)));

Result:

heading1: contents with respect to heading1 
heading2: heading2 contents 
heading3: heading 3 related contents sample strings

If we split on digits instead (a second test):

Console.WriteLine(string.Join(Environment.NewLine, Solution(content, new string[] {"[0-9]+"})));

we get

heading
1: contents with respect to heading
1 heading
2: heading
2 contents heading
3: heading 
3 related contents sample strings

string content = "heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
string[] delimters = new string[] { "heading1:", "heading2:", "heading3:" };
var dels = string.Join("|", delimters);
var pattern = "(" + dels + ").*?(?=" + dels + @"|\Z)";
var outputArray = Regex.Matches(content, pattern);
foreach (Match match in outputArray)
    Console.WriteLine(match);

The resulting pattern is:

(heading1:|heading2:|heading3:).*?(?=heading1:|heading2:|heading3:|\Z)

This is similar to Wiktor Stribiżew's answer.
Of course, we should use Regex.Escape on the delimiters, as he showed.
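As a rough sketch of that suggestion, the matching approach could be combined with Regex.Escape as follows. The `EscapedMatchDemo` class and the `MatchSections` helper are hypothetical names introduced here for illustration; the pattern is the same one as above, except that each delimiter is escaped first and the leading group is non-capturing.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedMatchDemo
{
    // Returns each delimiter together with the text that follows it,
    // up to the next delimiter or the end of the input.
    public static string[] MatchSections(string content, string[] delimiters)
    {
        // Escape every delimiter so metacharacters are matched literally.
        var dels = string.Join("|", delimiters.Select(d => Regex.Escape(d)));
        var pattern = "(?:" + dels + ").*?(?=" + dels + @"|\Z)";
        return Regex.Matches(content, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string content = "heading1: contents with respect to heading1 heading2: heading2 contents heading3: heading 3 related contents sample strings";
        string[] delimiters = { "heading1:", "heading2:", "heading3:" };
        foreach (var section in MatchSections(content, delimiters))
            Console.WriteLine(section);
    }
}
```

Because `.` does not match a newline by default, this sketch assumes single-line input, as in the question; multi-line input would likely need RegexOptions.Singleline.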
