I'm trying to parse strings with custom tags like this:
[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave].
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]
I've already figured out which regular expressions to use:
/\[color value=(.*?)\](.*?)\[\/color\]/gs
/\[wave\](.*?)\[\/wave\]/gs
/\[shake\](.*?)\[\/shake\]/gs
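However, I need the correct ranges (startIndex, endIndex) of these groups in the resulting string so that I can apply them properly. This is where I'm completely lost, because every time I replace a tag the indices can shift. It's especially hard with nested tags.
So the input is the string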
[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave].
I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]
and as output I'd like to get something like
Apply color 0x000000 from 0 to 75
Apply wave from 14 to 20
Apply color 0xFF0000 from 14 to 20
Apply shake from 46 to 51
Note that the indices refer to the resulting string, with the tags removed.
How can I parse this?
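Unfortunately, I'm not familiar with ActionScript, but this C# code shows a solution using regular expressions. Rather than matching specific tags, I used a regular expression that can match any tag. And instead of trying to write a regular expression that matches an entire start and end tag pair including the text in between (which I don't think is possible with nested tags), I made the regular expression match just a single start tag or end tag. Some extra processing then pairs up the start and end tags and removes them from the string while keeping the essential information.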
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string data = "[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. " +
            "I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]";
        ParsedData result = ParseData(data);
        foreach (TagInfo t in result.tags)
        {
            if (string.IsNullOrEmpty(t.attributeName))
            {
                Console.WriteLine("Apply {0} from {1} to {2}", t.name, t.start, t.start + t.length - 1);
            }
            else
            {
                Console.WriteLine("Apply {0} {1}={2} from {3} to {4}", t.name, t.attributeName, t.attributeValue, t.start, t.start + t.length - 1);
            }
            // Print the stripped text with the affected range underlined
            Console.WriteLine(result.data);
            Console.WriteLine("{0}{1}\n", new string(' ', t.start), new string('-', t.length));
        }
    }

    static ParsedData ParseData(string data)
    {
        List<TagInfo> tagList = new List<TagInfo>();
        // Groups: 1 = start tag name, 3 = attribute name, 4 = attribute value, 5 = end tag name (with leading '/')
        Regex reTag = new Regex(@"\[(\w+)(\s+(\w+)\s*=\s*([^\]]+))?\]|\[(/\w+)\]");
        Match m = reTag.Match(data);

        // Phase 1 - Collect all the start and end tags, noting their position in the original data string
        while (m.Success)
        {
            if (m.Groups[1].Success) // Matched a start tag
            {
                tagList.Add(new TagInfo()
                {
                    name = m.Groups[1].Value,
                    attributeName = m.Groups[3].Value,
                    attributeValue = m.Groups[4].Value,
                    tagLength = m.Groups[0].Length,
                    start = m.Groups[0].Index
                });
            }
            else if (m.Groups[5].Success) // Matched an end tag
            {
                tagList.Add(new TagInfo()
                {
                    name = m.Groups[5].Value,
                    tagLength = m.Groups[0].Length,
                    start = m.Groups[0].Index
                });
            }
            m = m.NextMatch();
        }

        // Phase 2 - match end tags to start tags
        List<TagInfo> unmatched = new List<TagInfo>();
        foreach (TagInfo t in tagList)
        {
            if (t.name.StartsWith("/"))
            {
                // Search backwards so an end tag pairs with the most recent open tag of the same name
                for (int i = unmatched.Count - 1; i >= 0; i--)
                {
                    if (unmatched[i].name == t.name.Substring(1))
                    {
                        t.otherEnd = unmatched[i];
                        unmatched[i].otherEnd = t;
                        unmatched.RemoveAt(i);
                        break;
                    }
                }
            }
            else
            {
                unmatched.Add(t);
            }
        }

        int subtractLength = 0;
        // Phase 3 - Remove tags from the string, updating start positions and calculating length in the process
        foreach (TagInfo t in tagList.ToArray())
        {
            t.start -= subtractLength;
            // If this is an end tag, calculate the length for the corresponding start tag,
            // and remove the end tag from the tag list.
            if (t.name.StartsWith("/"))
            {
                // otherEnd is null for an end tag that has no matching start tag
                if (t.otherEnd != null)
                {
                    t.otherEnd.length = t.start - t.otherEnd.start;
                }
                tagList.Remove(t);
            }
            // Keep track of how many characters in tags have been removed from the string so far
            subtractLength += t.tagLength;
        }

        return new ParsedData()
        {
            data = reTag.Replace(data, string.Empty),
            tags = tagList.ToArray()
        };
    }

    class TagInfo
    {
        public int start;        // position in the stripped string (after phase 3)
        public int length;       // number of characters the tag applies to
        public int tagLength;    // length of the tag markup itself
        public string name;
        public string attributeName;
        public string attributeValue;
        public TagInfo otherEnd; // matching start or end tag, if any
    }

    class ParsedData
    {
        public string data;
        public TagInfo[] tags;
    }
}
The output is:
Apply color value=0x000000 from 0 to 76
This house is haunted. I've heard about ghosts screaming here after midnight.
-----------------------------------------------------------------------------
Apply wave from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
              -------
Apply color value=0xFF0000 from 14 to 20
This house is haunted. I've heard about ghosts screaming here after midnight.
              -------
Apply shake from 47 to 55
This house is haunted. I've heard about ghosts screaming here after midnight.
                                               ---------
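The three phases above can also be folded into a single pass if the stripped text is built up while scanning, so every index is recorded directly in the output string and no later correction is needed. The following is only a sketch of that variation, not part of the original answer: the names TagRange and TagParser.Parse are made up for illustration, it assumes tags are properly nested, and it assumes that a stray end tag can be ignored and that an unclosed tag runs to the end of the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical type for this sketch: one range per opening tag.
class TagRange
{
    public string Name;
    public string Value; // attribute value, or null when the tag has none
    public int Start;    // inclusive index in the stripped text
    public int End;      // inclusive index in the stripped text
}

static class TagParser
{
    // Matches a single start tag or end tag, never a whole pair.
    // Group 1 = "/" for end tags, group 2 = tag name, group 3 = attribute value.
    static readonly Regex TagRegex =
        new Regex(@"\[(/?)(\w+)(?:\s+\w+\s*=\s*([^\]]+))?\]");

    // Returns the text with all tags removed and fills 'ranges' in opening order.
    public static string Parse(string input, List<TagRange> ranges)
    {
        var text = new StringBuilder();
        var open = new Stack<TagRange>();
        int pos = 0; // position in 'input' just after the previous tag

        foreach (Match m in TagRegex.Matches(input))
        {
            // Copy the plain text between the previous tag and this one.
            text.Append(input, pos, m.Index - pos);
            pos = m.Index + m.Length;

            string name = m.Groups[2].Value;
            if (m.Groups[1].Length == 0)
            {
                // Start tag: its range begins at the current output length.
                var range = new TagRange
                {
                    Name = name,
                    Value = m.Groups[3].Success ? m.Groups[3].Value : null,
                    Start = text.Length
                };
                open.Push(range);
                ranges.Add(range);
            }
            else if (open.Count > 0 && open.Peek().Name == name)
            {
                // End tag closing the innermost open tag.
                open.Pop().End = text.Length - 1;
            }
            // Assumption: any other end tag is stray and is dropped silently.
        }

        text.Append(input, pos, input.Length - pos);

        // Assumption: tags left open extend to the end of the text.
        foreach (TagRange range in open)
        {
            range.End = text.Length - 1;
        }
        return text.ToString();
    }
}

class Demo
{
    static void Main()
    {
        string data = "[color value=0x000000]This house is [wave][color value=0xFF0000]haunted[/color][/wave]. " +
            "I've heard about ghosts [shake]screaming[/shake] here after midnight.[/color]";
        var ranges = new List<TagRange>();
        string text = TagParser.Parse(data, ranges);
        Console.WriteLine(text);
        foreach (TagRange r in ranges)
        {
            Console.WriteLine("Apply {0} {1} from {2} to {3}", r.Name, r.Value, r.Start, r.End);
        }
    }
}
```

For the sample string this should report the same ranges as the output above. The stack is what makes nesting work: an end tag is only ever compared against the most recently opened tag, so the inner color closes before the outer one.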
Let me show you a parsing approach that works not only for the case above but for any tag. It is not limited to the names color, wave, and shake.
// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
// Returns one (tag name, value) pair per opening tag; closing tags are skipped.
private List<Tuple<string, string>> getVals(string input)
{
    List<Tuple<string, string>> finals = new List<Tuple<string, string>>();
    // first parser: find every bracketed tag (\u005D is ']')
    var mts = Regex.Matches(input, @"\[[^\u005D]+\]");
    foreach (Match mt in mts)
    {
        // has no value=
        if (!Regex.IsMatch(mt.ToString(), @"(?i)value[\n\r\t\s]*="))
        {
            // not closing tag
            if (!Regex.IsMatch(mt.ToString(), @"^\[[\n\r\t\s]*/"))
            {
                try
                {
                    // Strip the surrounding brackets to get the tag name
                    finals.Add(new Tuple<string, string>(Regex.Replace(mt.ToString(), @"^\[|\]$", "").Trim(), ""));
                }
                catch (Exception es)
                {
                    Console.WriteLine(es.ToString());
                }
            }
        }
        // has value=
        else
        {
            try
            {
                // spls[0] is "[name ", spls[1] is "value]"
                var spls = Regex.Split(mt.ToString(), @"(?i)value[\n\r\t\s]*=");
                finals.Add(new Tuple<string, string>(Regex.Replace(spls[0], @"^\[", "").Trim(), Regex.Replace(spls[1], @"\]$", "").Trim()));
            }
            catch (Exception es)
            {
                Console.WriteLine(es.ToString());
            }
        }
    }
    return finals;
}
I also have experience parsing JSON with a single regular expression. If you'd like to see it, visit my blog at www.mysplitter.com.