.NET regex: string that does not contain a period in the last list item



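I am trying to use a .NET regular expression to identify strings in XML data that do not contain a period before the last closing tag. I don't have much experience with regular expressions, and I'm not sure what I need to change, or why, to get the result I want.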

Every line in the data ends with a carriage return and a line feed.

The schema is for XML.

Example of good XML data:

<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc.</item>
</randlist>

Example of bad XML data, where the regex should produce a match because there is no period before the last </item>:

<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc</item>
</randlist>

The regex pattern I tried does not work on the bad XML data (I have not tested it on the good XML data):

^<randlist \w*=[\S\s]*\.*[^.]</item>[\n]*</randlist>$

Results using http://regexstorm.net/tester:

0 matches

Results using https://regex101.com/:

0 matches
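
One plausible reason for the zero matches, given that every line ends in \r\n: [\n]* cannot consume the \r that sits between </item> and </randlist>, and in multiline mode $ matches only before \n, so a trailing \r blocks it as well. The sketch below uses Python's re module as a stand-in for the .NET engine (to my knowledge these constructs behave the same way in both, but that is worth verifying in your tester). The pattern is written with its backslashes restored, which is an assumption about the original; CRLF_AWARE, make_list and count_matches are hypothetical names introduced only for illustration.

```python
import re

# Pattern from the question, with backslashes written out (assumed reconstruction).
ORIGINAL = r'^<randlist \w*=[\S\s]*\.*[^.]</item>[\n]*</randlist>$'
# Hypothetical variant: tolerate \r between the tags and an optional \r before $.
CRLF_AWARE = r'^<randlist \w*=[\S\s]*\.*[^.]</item>[\r\n]*</randlist>\r?$'


def make_list(last_item, newline):
    """Build the sample list from the question with the given line ending."""
    lines = [
        '<randlist prefix="unorder">',
        '<item>abc</item>',
        '<item>abc</item>',
        '<item>' + last_item + '</item>',
        '</randlist>',
    ]
    return newline.join(lines) + newline


def count_matches(pattern, text):
    """Count matches with the multiline option set, as in the testers."""
    return len(re.findall(pattern, text, flags=re.MULTILINE))


if __name__ == '__main__':
    bad_crlf = make_list('abc', '\r\n')
    print(count_matches(ORIGINAL, bad_crlf))
    print(count_matches(CRLF_AWARE, bad_crlf))
```

With line-feed-only data the original pattern does find the bad list; it is only the carriage returns that defeat it. Note that the greedy [\S\s]* can span several lists if the data contains more than one, so this variant is a diagnosis aid rather than a robust check.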

Because of the period and the start-of-string condition, this question is, IMO, different from the following:

Regex for a string not ending with a given suffix

Explanation from regex101:

/
^<randlist \w*=[\S\s]*\.*[^.]</item>[\n]*</randlist>$
/
gm
^ asserts position at start of a line
<randlist  matches the characters <randlist  literally (case sensitive)
\w* matches any word character (equal to [a-zA-Z0-9_])
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
= matches the character = literally (case sensitive)
Match a single character present in the list below [\S\s]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\.* matches the character . literally (case sensitive)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^.]
. matches the character . literally (case sensitive)
< matches the character < literally (case sensitive)
/ matches the character / literally (case sensitive)
item> matches the characters item> literally (case sensitive)
Match a single character present in the list below [\n]*
< matches the character < literally (case sensitive)
/ matches the character / literally (case sensitive)
randlist> matches the characters randlist> literally (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

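@Silvanas is absolutely right. You should not use regex to solve this problem; you should use some form of XML parser to read the data and find the lines with a ".". But if, for some horrible reason, you must use regex, and if your data is structured exactly like your example, then a regex solution would look like this: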

^\s+<item>[^<]*?(?<=\.)</item>$

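If that regex produces any matches, then your XML is badly formatted. But again, this regex will fail if the whitespace is off, if there is anything else on the line, if the tags are not <item>..</item>, and so on. Again, unless you can absolutely guarantee that everything except the "." is well-formed XML, you are better off not using regex to solve this.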
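
For comparison, here is a minimal sketch of the parser route recommended above. It is written in Python with xml.etree.ElementTree purely for illustration (in .NET the same idea would go through an XML API of your choice). The element names randlist and item come from the question's sample; the function name lists_missing_final_period is hypothetical, and the sketch assumes the input is well-formed XML with a single root element.

```python
import xml.etree.ElementTree as ET


def lists_missing_final_period(xml_text):
    """Return the <randlist> elements whose last <item> does not end with a period."""
    root = ET.fromstring(xml_text)
    bad = []
    # iter() also yields the root itself, so a bare <randlist> document works too.
    for randlist in root.iter('randlist'):
        items = randlist.findall('item')
        if not items:
            continue  # nothing to check in an empty list
        # Join all text inside the last item and ignore trailing whitespace.
        text = ''.join(items[-1].itertext()).rstrip()
        if not text.endswith('.'):
            bad.append(randlist)
    return bad


if __name__ == '__main__':
    sample = (
        '<randlist prefix="unorder">\r\n'
        '<item>abc</item>\r\n'
        '<item>abc</item>\r\n'
        '<item>abc</item>\r\n'
        '</randlist>'
    )
    for element in lists_missing_final_period(sample):
        print('missing final period:', element.attrib)
```

Because the parser works on the element tree rather than on lines, indentation, attributes and \r\n line endings stop mattering, which is the main argument for this approach.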

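Edit: If the opening and closing tags are on the same line, but the tag is not necessarily named "item" and may have attributes, go ahead and try the following: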

^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<=\.)</\1>$
Breakdown:
^           anchor to beginning of line
\s+         skip over any whitespace
<           found what looks like an opening tag
([^<>\s]+)  match the first word found after the "<", store in capture group 1
[^<>]*>     match whatever remains until the closing ">"
[^<>]*?     match all of the contents up until the next "<"
(?<=\.)     ensure the last character was a "."
</\1>       match a closing tag where the text after the / is the same as the first word of the opening tag (stored in capture group 1)
$           anchor to end of line

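Make sure the multiline regex option is set; otherwise ^ and $ will match only at the start/end of the entire string. As before, any match for this regex means the XML on that line is badly formatted.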
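
To see what the two patterns actually select, here is a small sketch using Python's re module as a stand-in for .NET (lookbehind, backreferences and the multiline option work the same way for these patterns, as far as I know). The patterns are written with backslashes restored, which is an assumption about the original. SAMPLE, matching_lines and the ANY_TAG_NO_PERIOD variant with a negative lookbehind are hypothetical additions for illustration. Note that the sample is indented, because ^\s+ requires at least one whitespace character before the opening tag.

```python
import re

# The two patterns from this answer, backslashes written out (assumed reconstruction).
ITEM_ONLY = r'^\s+<item>[^<]*?(?<=\.)</item>$'
ANY_TAG = r'^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<=\.)</\1>$'
# Hypothetical variant: negative lookbehind, selecting lines that do NOT end in a period.
ANY_TAG_NO_PERIOD = r'^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<!\.)</\1>$'

SAMPLE = (
    '<randlist prefix="unorder">\n'
    '  <item>abc</item>\n'
    '  <item>abc.</item>\n'
    '  <entry kind="x">abc.</entry>\n'
    '</randlist>\n'
)


def matching_lines(pattern, text, multiline=True):
    """Return the matched lines, stripped of surrounding whitespace."""
    flags = re.MULTILINE if multiline else 0
    return [m.group(0).strip() for m in re.finditer(pattern, text, flags)]


if __name__ == '__main__':
    print(matching_lines(ITEM_ONLY, SAMPLE))
    print(matching_lines(ANY_TAG, SAMPLE))
    print(matching_lines(ANY_TAG_NO_PERIOD, SAMPLE))
    print(matching_lines(ANY_TAG, SAMPLE, multiline=False))
```

As written, the positive lookbehind selects the lines whose content ends with a period; if the goal is to flag lines that lack one, the negative-lookbehind variant is the one to try. Two caveats: without the multiline option nothing matches at all, and with \r\n line endings the \r after the closing tag stops $ from matching, so a \r? before $ may be needed.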
