Efficiently split a string of the form "{ {}, {}, ...}"

I have a string in the following format.

string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";

private void parsestring(string input)
{
    string[] tokens = input.Split(','); // I thought this would split on the , separating the {}
    foreach (string item in tokens)     // but that doesn't seem to be what it is doing
    {
        Console.WriteLine(item);
    }
}

The output I want looks like this:

112,This is the first day 23/12/2009
132,This is the second day 24/12/2009

But at the moment I get the following:

{112
This is the first day 23/12/2009
{132
This is the second day 24/12/2009

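I'm new to C#, so any help would be appreciated.

Don't get hung up on Split() being the solution! This is simple enough to parse without it. The regex answers are probably fine too, but I imagine that in terms of raw efficiency, writing "a parser" does the trick.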

// Requires: using System.Collections.Generic;
IEnumerable<string> Parse(string input)
{
    var results = new List<string>();
    int startIndex = 0;
    int currentIndex = 0;
    while (currentIndex < input.Length)
    {
        var currentChar = input[currentIndex];
        if (currentChar == '{')
        {
            // The next character is the start of a result.
            startIndex = currentIndex + 1;
        }
        else if (currentChar == '}')
        {
            // Everything from the noted start up to the character before '}' is a result.
            int endIndex = currentIndex - 1;
            int length = endIndex - startIndex + 1;
            results.Add(input.Substring(startIndex, length));
        }
        currentIndex++;
    }
    return results;
}

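So it's not short on lines. But it iterates once, and it performs only one allocation per "result". With a little tweaking I could probably make a C# 8 version with Index types that cuts down on allocations further. This is probably good enough, though.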
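One way such a C# 8 variant might look is sketched below. It is only a guess at what was meant: instead of copying each result into a new string, it yields a System.Range describing where the result sits in the input, and the caller slices a span when it needs the characters. The names RangeParser and ParseRanges are made up for illustration, and the sketch assumes .NET Core 3.0 or later. The iterator object itself is still one allocation.

```csharp
using System;
using System.Collections.Generic;

public static class RangeParser
{
    // Hypothetical helper: yields the position of each "{...}" body
    // instead of copying it into a new string.
    public static IEnumerable<Range> ParseRanges(string input)
    {
        int startIndex = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '{')
            {
                startIndex = i + 1;
            }
            else if (input[i] == '}')
            {
                // Half-open range: startIndex is included, i (the '}') is not.
                yield return startIndex..i;
            }
        }
    }

    public static void Main()
    {
        string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (Range range in ParseRanges(instance))
        {
            // Slicing a span does not copy the characters.
            ReadOnlySpan<char> token = instance.AsSpan()[range];
            Console.Out.WriteLine(token);
        }
    }
}
```

Like the original, this does no validation, so malformed input produces meaningless ranges.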

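You could spend a whole day figuring out how to understand the regex, but this is as simple as it gets:

Scan every character.
If you find {, note that the next character is the start of a result.
If you find }, treat everything from the last noted "start" up to the index before this character as a "result".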

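This won't catch mismatched braces, and for strings like "}}{" it quietly returns nonsense entries instead of reporting a problem. You didn't ask for those cases to be handled, but it isn't hard to improve this logic to catch them and either scream or recover.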

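For example, you could reset startIndex to something like -1 when } is found. From there, if you find { while startIndex != -1, you've found "{{". If you find } while startIndex == -1, you've found "}}". And if you exit the loop with startIndex > -1, that's an opening { with no closing }. That leaves the string "}whoops" as an uncovered case, but it can be handled by initializing startIndex to, say, -2 and checking for that specifically. Do this with a regex and you'll get a headache.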
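A minimal sketch of that stricter version follows. It is one possible reading of the idea, not a definitive implementation: the names StrictParser, ParseStrict and NoOpenBrace are invented here, and throwing FormatException is just one choice for "screaming". It starts startIndex at the -1 sentinel rather than 0, which also covers "}whoops" without a separate -2 value; a distinct initial value would only be needed to tell "never saw a brace" apart. Characters outside the braces, such as the , between groups, are assumed to be ignorable.

```csharp
using System;
using System.Collections.Generic;

public static class StrictParser
{
    // Hypothetical sentinel: we are not currently inside a "{...}" group.
    private const int NoOpenBrace = -1;

    public static List<string> ParseStrict(string input)
    {
        var results = new List<string>();
        int startIndex = NoOpenBrace;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                // A '{' while a group is already open means "{{".
                if (startIndex != NoOpenBrace)
                    throw new FormatException($"Unexpected '{{' at index {i}.");
                startIndex = i + 1;
            }
            else if (c == '}')
            {
                // A '}' while no group is open means "}}" or a leading '}'.
                if (startIndex == NoOpenBrace)
                    throw new FormatException($"'}}' without a matching '{{' at index {i}.");
                results.Add(input.Substring(startIndex, i - startIndex));
                startIndex = NoOpenBrace;
            }
        }
        // Still inside a group at the end: a '{' was never closed.
        if (startIndex != NoOpenBrace)
            throw new FormatException("Missing closing '}' at end of input.");
        return results;
    }

    public static void Main()
    {
        string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (string item in ParseStrict(instance))
        {
            Console.WriteLine(item);
        }
    }
}
```

It is still a single pass with one allocation per result; the extra checks are plain integer comparisons.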

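The main reason I'm suggesting this is that you said "efficiently". Icepickle's solution is nice, but Split() makes one allocation per token, and then you perform an allocation for each TrimX() call. That's not "efficient". That's "n + 2 allocations".

Use Regex for this: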

// Requires: using System.Linq; using System.Text.RegularExpressions;
string[] tokens = Regex.Split(input, @"}\s*,\s*{")
    .Select(i => i.Replace("{", "").Replace("}", ""))
    .ToArray();

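Pattern explanation:

\s* - matches zero or more whitespace characters

Well, if you have a method called ParseString, it's a good thing for it to return something (and it might not be so bad to call it ParseTokens instead). So if you do that, you can arrive at the following code: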

private static IEnumerable<string> ParseTokens(string input)
{
    return input
        // removes the leading {
        .TrimStart('{')
        // removes the trailing }
        .TrimEnd('}')
        // splits on the different token in the middle
        .Split( new string[] { "},{" }, StringSplitOptions.None );
}

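The reason it didn't work for you before is that your understanding of how the Split method works was wrong: it splits on every , in your example, including the ones inside the braces.

Now, if you put all of this together, you get something like this dotnetfiddle: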

using System;
using System.Collections.Generic;

public class Program
{
    private static IEnumerable<string> ParseTokens(string input)
    {
        return input
            // removes the leading {
            .TrimStart('{')
            // removes the trailing }
            .TrimEnd('}')
            // splits on the different token in the middle
            .Split( new string[] { "},{" }, StringSplitOptions.None );
    }

    public static void Main()
    {
        var instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (var item in ParseTokens( instance ) ) {
            Console.WriteLine( item );
        }
    }
}

Add using System.Text.RegularExpressions; to the top of your file

and use the Regex.Split method:

string[] tokens = Regex.Split(input, "(?<=}),");

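Here we use a positive lookbehind to split on the , that comes immediately after a }.

(Note: (?<=string) matches only at a position that is immediately preceded by string; the string itself is not part of the match. You can read more about lookbehind assertions in the .NET regular expression documentation.)

If you don't want to use a regex, the following code produces the desired output.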
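One thing to be aware of: because the } is not consumed, the `(?<=}),` split above leaves the braces on each token ("{112,...}" rather than "112,..."). The sketch below shows one way to strip them afterwards so the result matches the output asked for in the question. The names LookbehindDemo and SplitGroups are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: split on commas that directly follow '}',
    // then remove the surrounding braces from each piece.
    public static string[] SplitGroups(string input)
    {
        // (?<=}) is zero-width: the '}' must be there, but it is not part
        // of the match, so Regex.Split leaves it on the preceding token.
        return Regex.Split(input, "(?<=}),")
            .Select(token => token.Trim('{', '}'))
            .ToArray();
    }

    public static void Main()
    {
        string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
        foreach (string item in SplitGroups(instance))
        {
            Console.WriteLine(item);
        }
        // Prints:
        // 112,This is the first day 23/12/2009
        // 132,This is the second day 24/12/2009
    }
}
```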

string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";
string[] tokens = instance.Replace("},{", "}{").Split('}', '{');
foreach (string item in tokens)
{
    if (string.IsNullOrWhiteSpace(item)) continue;
    Console.WriteLine(item);
}
Console.ReadLine();
