Optimizing a regular expression and iterating over regex match groups



Thanks for your answers! This is what I went with, and it works perfectly:

var m4ny = new Regex(@"(?<Name>[^(]*collections?)(?<Year>([^)]*))", RegexOptions.IgnoreCase);
var m4y = new Regex(@"\b(19|20)[0-9]{2}\b");
if (m4ny.IsMatch(fn) && m4y.IsMatch(fn)) return "[M4] " + UcWords(removeDomains(m4ny.Match(fn).Groups[@"Name"].Value)) + " (" + string.Join(@", ", m4y.Matches(fn).Cast<Match>().Select(m => m.Value)) + ")";

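The C# code below works, but it is pretty ugly, and I know there must be a better way to do this. In PHP I could condense it easily, but this is the limit of my C# knowledge. Any pointers are welcome.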

Regex r4 = new Regex("(?<Name>.*)\\((?<Year>[-, ]*(20|19)[0-9]{2}[-, ]*)*\\)", RegexOptions.IgnoreCase);
Match m4 = r4.Match(fn);
if (fn.ToLower().Contains("collection") && m4.Success)
{
    string years = "";
    Group g = m4.Groups["Year"];

    // Each repetition of the Year group is kept as a separate capture.
    CaptureCollection cc = g.Captures;
    for (int j = 0; j < cc.Count; j++)
    {
        Capture c = cc[j];
        // Strip the separators that were captured along with the year.
        years = years + c.Value.Replace(",", "").Replace("-", "").Replace(" ", "") + ", ";
    }
    return "[M4] " + UcWords(removeDomains(m4.Groups["Name"].Value).Replace('.', ' ').Replace(",", "").Replace(":", "").Replace("  ", " ").ToLower()).Trim() + " (" + years.Substring(0, years.Length - 2) + ")";
}

This particular code basically matches:

Anything written    hERE collEction (1928, 1957- 1977,1989    2001)

and returns:

Anything Written Here Collection (1928, 1957, 1977, 1989, 2001)

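I think that with a better regex there would be no need to pull the string apart and glue it back together. Either that, or there must be a neater way to iterate over the match groups (preferably in a single statement).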

Thanks! Dean


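Sometimes I find it easier to break things into two expressions. The following gives you what you want, and perhaps some of the simplicity you were hoping for: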

using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

private string FindAndCleanDateStrings()
{
    var fn = @"Anything written    hERE collEction (1928, 1957- 1977,1989    2001)";
    var rxNameYear = new Regex(@"(?<Name>[^(]*collections?)(?<Year>([^)]*))", RegexOptions.IgnoreCase);
    var rxYears = new Regex(@"\b(19|20)[0-9]{2}\b");

    // Validate: the name must end in "collection(s)" and at least one year must be present.
    if (!rxNameYear.IsMatch(fn) || !rxYears.IsMatch(fn)) return null;

    // Clean up <Name>: title-case it, then collapse runs of whitespace.
    var textInfo = new CultureInfo("en-US", false).TextInfo;
    var name = textInfo.ToTitleCase(rxNameYear.Match(fn).Groups[@"Name"].Value.Trim().ToLower());
    name = Regex.Replace(name, @"\s+", " ");

    // Build, print, and return the completed string.
    var newStr = $@"{name} ({string.Join(@", ", rxYears.Matches(fn).Cast<Match>().Select(m => m.Value))})";
    Console.WriteLine(newStr);
    return newStr;
}

The above returns: "Anything Written Here Collection (1928, 1957, 1977, 1989, 2001)"
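The question also asked about iterating the captures of a single group more neatly. As a rough sketch only, not part of the answer above: a repeated named group keeps every repetition in `Group.Captures`, and LINQ can join them in one statement. The class name `CaptureSketch`, the method `Format`, and the pattern itself are assumptions made for illustration. Name clean-up (title casing, whitespace collapsing) is left out to keep the example short.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CaptureSketch
{
    // Assumed pattern: a name ending in "collection(s)", then a parenthesised
    // list. Each repetition of the inner group skips non-digit separators and
    // captures one four-digit year (19xx or 20xx) into the Year group.
    private static readonly Regex NameAndYears = new Regex(
        @"^(?<Name>[^(]*collections?)\s*\((?:[^)\d]*(?<Year>(?:19|20)[0-9]{2}))+[^)]*\)",
        RegexOptions.IgnoreCase);

    public static string Format(string fn)
    {
        var m = NameAndYears.Match(fn);
        if (!m.Success) return null;

        // Captures holds every repetition of Year, not only the last one.
        var years = m.Groups["Year"].Captures.Cast<Capture>().Select(c => c.Value);
        return m.Groups["Name"].Value.Trim() + " (" + string.Join(", ", years) + ")";
    }
}
```

For the sample input from the question, `CaptureSketch.Format` should give `Anything written    hERE collEction (1928, 1957, 1977, 1989, 2001)`. Because the separators sit outside the Year group, no `Replace` calls are needed on the captured values.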

If you are only interested in the collection, just using Split and Join may be more efficient (assuming you don't need any further validation).

Given

// Matches everything between "collEction (" and the closing ")".
private static readonly Regex _regex = new Regex(@"(?<=collEction\s\().*(?=\))", RegexOptions.Compiled | RegexOptions.IgnoreCase);

private static string FormatString(string input)
{
    var match = _regex.Match(input);
    if (!match.Success) return input;

    // Split on any separator, drop the empty entries, and rejoin with ", ".
    var values = match.Value.Split(new[] {',', '-', ' '}, StringSplitOptions.RemoveEmptyEntries);
    return _regex.Replace(input, string.Join(", ", values));
}

Usage

var list = new List<string>()
{
    "Anything written    hERE collEction (1928, 1957- 1977,1989    2001)",
    "Anything written hERE collEction (1928, 1957- 1977,1989    2001)",
    "Anything written  hERE ColLeCtIon (1928,,, 1957- ---1977,     1989   2001)",
};

foreach (var item in list.Select(FormatString))
    Console.WriteLine(item);

Output

Anything written    hERE collEction (1928, 1957, 1977, 1989, 2001)
Anything written hERE collEction (1928, 1957, 1977, 1989, 2001)
Anything written  hERE ColLeCtIon (1928, 1957, 1977, 1989, 2001)

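Note: you could also do this without regex, using Span<char>, a few if statements, and a slice to parse the string in a single O(n) pass. That would be the most efficient option, but it is less maintainable and less fault-tolerant. Then again, any solution depends on your specific requirements (this one included).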

An example:

private static string FormatString2(string input)
{
    var span = input.AsSpan();
    var match = "collection (".AsSpan();
    // Worst case every separator character expands to ", ", so allow twice the input length.
    Span<char> result = stackalloc char[input.Length * 2];
    var hasSpace = false;
    for (int i = 0, j = 0, index = 0; i < span.Length; i++)
    {
        // Phase 1: copy characters verbatim until "collection (" has been seen.
        if (index < match.Length)
        {
            if (char.ToLower(span[i]) == match[index]) index++;
            else if (index >= 0) index = 0;
            result[j++] = span[i];
            if (index <= match.Length) continue;
        }
        // Phase 2: copy digits as they are.
        if (span[i] >= '0' && span[i] <= '9')
        {
            result[j++] = span[i];
            hasSpace = false;
            continue;
        }
        if (span[i] == ')')
        {
            result[j++] = span[i];
            // Slice to the written length so unused buffer characters are not returned.
            return new string(result.Slice(0, j));
        }
        // Collapse each run of separators into a single ", ".
        if (hasSpace) continue;
        hasSpace = true;
        result[j++] = ',';
        result[j++] = ' ';
    }
    return input;
}

Output

Anything written    hERE collEction (1928, 1957, 1977, 1989, 2001)
Anything written hERE collEction (1928, 1957, 1977, 1989, 2001)
Anything written  hERE ColLeCtIon (1928, 1957, 1977, 1989, 2001)

I don't really advocate this approach, given all its limitations, unless you absolutely need the performance.
