.NET regex for strings that do not contain a period in the last list item

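I am trying to use a .NET regular expression to identify strings in XML data that do not contain a period before the last closing tag. I do not have much experience with regular expressions. I am not sure what I need to change, or why, to get the result I want.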

Every line in the data ends with a line feed and a carriage return.

The schema is for XML.

Example of good XML data:

<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc.</item>
</randlist>

Example of bad XML data. The regex should produce a match here, because there is no period before the last </item>:

<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc</item>
</randlist>

The regex pattern I tried, which does not work on the bad XML data (not tested on the good XML data):

^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$

Results using http://regexstorm.net/tester:

0 matches

Results using https://regex101.com/:

0 matches

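IMO this question differs from the following one because of the period and the start-of-string condition:

Regular expression for strings that do not end with a given suffix

Explanation from 3: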

/
^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$
/
gm
^ asserts position at start of a line
<randlist  matches the characters <randlist  literally (case sensitive)
\w* matches any word character (equal to [a-zA-Z0-9_])
* Quantifier: Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
= matches the character = literally (case sensitive)
Match a single character present in the list below [\S\s]*
* Quantifier: Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\.* matches the character . literally (case sensitive)
* Quantifier: Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^.]
. matches the character . literally (case sensitive)
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
item> matches the characters item> literally (case sensitive)
Match a single character present in the list below [\n]*
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
randlist> matches the characters randlist> literally (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

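@Silvanas is absolutely right. You should not use regex to solve this problem; you should use some form of XML parser to read the data and look for lines with a "." in them. However, if for some terrible reason you must use regex, and if your data is structured exactly like your example, then a regex solution would look like this: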

^\s+<item>[^<]*?(?<=\.)</item>$

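If that regex produces any matches, your XML is poorly formed. But again, if the whitespace is off, if there is anything else on the line, if the tag is not <item>..</item>, and so on, this regex will fail. Unless you can absolutely guarantee that everything other than the "." is well-formed XML, you are better off not using regex for this problem.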
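For comparison, here is a minimal sketch of the parser-based approach. It is written in Python with xml.etree.ElementTree purely for illustration; in .NET you would use an XML API such as XDocument instead. The function name lists_missing_final_period is hypothetical, and the sketch assumes the goal stated in the question: flag every <randlist> whose last <item> does not end with a period.

```python
import xml.etree.ElementTree as ET


def lists_missing_final_period(xml_text):
    """Return the <randlist> elements whose last <item> does not end with '.'.

    Hypothetical helper for illustration; element names come from the question.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    # iter() also yields the root itself, so a bare <randlist> document works.
    for randlist in root.iter("randlist"):
        items = randlist.findall("item")
        if not items:
            continue  # nothing to check in an empty list
        # Join all text inside the last item and ignore surrounding whitespace.
        last_text = "".join(items[-1].itertext()).strip()
        if not last_text.endswith("."):
            flagged.append(randlist)
    return flagged


# Sample data from the question, with CRLF line endings as described there.
good = (
    '<randlist prefix="unorder">\r\n'
    "<item>abc</item>\r\n"
    "<item>abc</item>\r\n"
    "<item>abc.</item>\r\n"
    "</randlist>"
)
bad = good.replace("abc.</item>", "abc</item>")

print(len(lists_missing_final_period(good)))  # 0
print(len(lists_missing_final_period(bad)))   # 1
```

Because the parser works on the element tree rather than on lines, indentation, attributes, and line endings do not affect the result.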

Edit: If the opening and closing tags are on the same line, but the tag is not necessarily named "item" and may have attributes, try the following:

^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<=\.)</\1>$
Breakdown:
^           anchor to beginning of line
\s+         skip over any whitespace
<           found what looks like an opening tag
([^<>\s]+)  match the first word found after the "<", store in capture group 1
[^<>]*>     match whatever remains until the closing ">"
[^<>]*?     match all of the contents up until the next "<"
(?<=\.)     ensure the last character was a "."
</\1>       match a closing tag where the text after the / is the same as the first word of the opening tag (stored in capture group 1)
$           anchor to end of line

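Make sure the Multiline regex option is set; otherwise ^ and $ will only match the beginning and end of the entire string. As before, any match of this regex means the XML is poorly formed on that line.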
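Here is a small sketch of how the generalized pattern behaves, using Python's re module for illustration only (the constructs used here behave the same way in .NET, where you would pass RegexOptions.Multiline). Three things in it are my assumptions, not part of the pattern above. The sample items are indented so that \s+ has something to match. A \r? is added before $, because in multiline mode $ matches just before \n and the question says the lines end in carriage return plus line feed. And a second variant swaps the lookbehind for a negative one, (?<!\.), in case you want to flag lines that lack the period rather than lines that have it.

```python
import re

# Pattern from the answer, unchanged: "$" will not match before "\r".
STRICT = re.compile(
    r'^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<=\.)</\1>$', re.MULTILINE)

# Same pattern with "\r?" before "$" (assumption: tolerate CRLF line endings).
ENDS_WITH_PERIOD = re.compile(
    r'^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<=\.)</\1>\r?$', re.MULTILINE)

# Variant with a negative lookbehind: content does NOT end with ".".
LACKS_PERIOD = re.compile(
    r'^\s+<([^<>\s]+)[^<>]*>[^<>]*?(?<!\.)</\1>\r?$', re.MULTILINE)

# Indented sample with CRLF line endings (indentation is an assumption).
sample = (
    '<randlist prefix="unorder">\r\n'
    '  <item>abc</item>\r\n'
    '  <item>abc.</item>\r\n'
    '</randlist>\r\n'
)

with_period = [m.group(0).strip() for m in ENDS_WITH_PERIOD.finditer(sample)]
without_period = [m.group(0).strip() for m in LACKS_PERIOD.finditer(sample)]

print(STRICT.search(sample))  # None: the "\r" before "\n" blocks "$"
print(with_period)            # ['<item>abc.</item>']
print(without_period)         # ['<item>abc</item>']
```

Note that both variants look at each line in isolation, so neither can tell which item is the last one in its list. That is one more reason to prefer a parser if only the final item matters.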
