.NET Regex: trying to capture repeating groups with a lookahead



Note that I am using the .NET regex engine here.

This is the string to parse:

    <div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 1);" onmouseout="ResidentialListings.degradeListing(this, 1);">
    <div id="Contact1" class="listingDetail">
        <span id="ContactName1" class="c411ListedName"><a href="/res/5068300124/P-DESCHESNES/184421926.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES  on 85 Red Pine Dr">P DESCHESNES</a></span>
        <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
        <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>

        <a class="c411GetDirections c411NoPrint" id="ContactDirections1" href="/map/mapSearch.html?layers=dir&amp;from=85+Red+Pine+Dr+NB&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a>

    </div>
    <div class="c411HoverMarker c411NoPrint" style="display:none;">
        <a href="/res/5068300124/P-DESCHESNES/184421926.html" title="P DESCHESNES"><span>&nbsp;</span></a>
    </div>
</div>


<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 2, 0);" onmouseout="ResidentialListings.degradeListing(this, 2, 0);">
    <div id="Contact2" class="listingDetail">
        <span id="ContactName2" class="c411ListedName"><a href="/res/4189883202/P-Deschesnes/179906536.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P Deschesnes  on 6585 Rue des Orchid&eacute;es">P Deschesnes</a></span>
        <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
        <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>

        <a class="c411GetDirections c411NoPrint" id="ContactDirections2" href="/map/mapSearch.html?layers=dir&amp;from=1000+Rue+des+Orchid%C3%A9esFictive+QC+G1X+3Z5&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a>

    </div>
    <div class="c411HoverMarker c411NoPrint" style="display:none;">
        <a href="/res/4189883202/P-Deschesnes/179906536.html" title="P Deschesnes"><span>&nbsp;</span></a>
    </div>
</div>


<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 3, 0);" onmouseout="ResidentialListings.degradeListing(this, 3, 0);">
    <div id="Contact3" class="listingDetail">
        <span id="ContactName3" class="c411ListedName"><a href="/res/4506702257/P-DESCHESNES/181606171.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES  on 1736 Rue Saint-Alexandre">P DESCHESNES</a></span>
        <span class="c411Phone" id="ContactPhone3">(450) 671-1111</span>
        <span class="c411ListingGeo"><span class="adr" id="ContactAddress3">1736 Rue Fictive Longueuil QC J1J 1T2</span></span>

        <a class="c411GetDirections c411NoPrint" id="ContactDirections3" href="/map/mapSearch.html?layers=dir&amp;from=1000+Rue+Saint-Fictive+Longueuil+QC+J1J+1T1&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a>

    </div>
    <div class="c411HoverMarker c411NoPrint" style="display:none;">
        <a href="/res/4506702257/P-DESCHESNES/181606171.html" title="P DESCHESNES"><span>&nbsp;</span></a>
    </div>
</div>

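You can see the repeating pattern here. I want one match per contact (1, 2, 3), each with three groups: contact name, phone, and address.

For this example I should get 3 matches, each containing a name, a phone, and an address, but for some reason I only get the last phone and address.

My .NET regex so far is: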

    (?si)(?(?=.*<div id="Contact[d{1,2}]").*<span id="ContactName[d{1,2}]".*title=.*>(.*)</a>.*id="ContactPhone[d{1,2}]">(.*)</span>.*id="ContactAddress[d{1,2}]">(.*)</span>)

Can you tell me what I am doing wrong?

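For very simple HTML fragments, regular expressions can be useful. For anything more extensive, like your example, an HTML parser such as the HTML Agility Pack is probably the most robust solution.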
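To make the parser-based approach concrete, here is a minimal sketch. It does not use the HTML Agility Pack (which is a .NET library); it uses Python's standard-library `html.parser` purely to illustrate the idea. The names `ContactParser`, `extract_contacts`, `FIELDS`, and the trimmed `SAMPLE` markup are hypothetical and chosen for this illustration only.

```python
from html.parser import HTMLParser

# Trimmed-down version of the listing markup from the question (assumption:
# only the spans carrying the ContactName/ContactPhone/ContactAddress ids matter).
SAMPLE = """
<div id="Contact1" class="listingDetail">
  <span id="ContactName1" class="c411ListedName"><a href="#" title="P DESCHESNES">P DESCHESNES</a></span>
  <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
  <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
</div>
<div id="Contact2" class="listingDetail">
  <span id="ContactName2" class="c411ListedName"><a href="#" title="P Deschesnes">P Deschesnes</a></span>
  <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
  <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
</div>
"""

# id prefix -> field name in the result
FIELDS = {"ContactName": "name", "ContactPhone": "phone", "ContactAddress": "address"}


class ContactParser(HTMLParser):
    """Hypothetical helper: collects the text of spans whose id is <prefix><number>."""

    def __init__(self):
        super().__init__()
        self.contacts = {}    # contact number -> {"name": ..., "phone": ..., "address": ...}
        self._target = None   # (contact number, field) currently being read
        self._depth = 0       # span nesting depth inside the target span

    def handle_starttag(self, tag, attrs):
        if self._target is not None:
            # Already inside a target span: only track nested spans.
            if tag == "span":
                self._depth += 1
            return
        if tag != "span":
            return
        element_id = dict(attrs).get("id") or ""
        for prefix, field in FIELDS.items():
            suffix = element_id[len(prefix):]
            if element_id.startswith(prefix) and suffix.isdigit():
                self._target = (int(suffix), field)
                self._depth = 1
                self.contacts.setdefault(int(suffix), {})[field] = ""
                break

    def handle_endtag(self, tag):
        if self._target is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:
                self._target = None

    def handle_data(self, data):
        if self._target is not None:
            number, field = self._target
            self.contacts[number][field] += data


def extract_contacts(html):
    """Return one dict per contact, ordered by contact number."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    return [
        {key: value.strip() for key, value in parser.contacts[number].items()}
        for number in sorted(parser.contacts)
    ]


for contact in extract_contacts(SAMPLE):
    print(contact)
```

Because the parser tracks element boundaries itself, the extraction depends only on the `id` attributes and not on attribute order or on what sits between the listings.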

For the reasons not to parse HTML with regular expressions, see: Using regular expressions to parse HTML: why not?
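One illustration of how fragile this gets, and a likely explanation of the symptom in the question: under single-line mode (`s`), a greedy `.*` runs to the end of the input and then backtracks, so a pattern built from greedy `.*` tends to produce a single match anchored on the last listing. The sketch below compares a greedy and a lazy (`.*?`) variant. It uses Python's `re` module only for illustration; inline options such as `(?si)` and lazy quantifiers are also supported by the .NET engine. The patterns are simplified, and `\d{1,2}` is my assumption for what `[d{1,2}]` in the question was meant to express.

```python
import re

# Trimmed-down version of the markup from the question (two listings).
SAMPLE = """
<div id="Contact1" class="listingDetail">
  <span id="ContactName1" class="c411ListedName"><a href="#" title="P DESCHESNES">P DESCHESNES</a></span>
  <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
  <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
</div>
<div id="Contact2" class="listingDetail">
  <span id="ContactName2" class="c411ListedName"><a href="#" title="P Deschesnes">P Deschesnes</a></span>
  <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
  <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
</div>
"""

# Greedy variant: every .* grabs as much as possible and gives back only
# what is needed for the rest of the pattern to match.
GREEDY = (
    r'(?si).*id="ContactName\d{1,2}".*title=.*>(.*)</a>'
    r'.*id="ContactPhone\d{1,2}">(.*)</span>'
    r'.*id="ContactAddress\d{1,2}">(.*)</span>'
)

# Lazy variant: every .*? stops at the first place where the rest can match.
LAZY = (
    r'(?si)id="ContactName\d{1,2}".*?title=.*?>(.*?)</a>'
    r'.*?id="ContactPhone\d{1,2}">(.*?)</span>'
    r'.*?id="ContactAddress\d{1,2}">(.*?)</span>'
)

greedy_matches = re.findall(GREEDY, SAMPLE)
lazy_matches = re.findall(LAZY, SAMPLE)

print(len(greedy_matches), "greedy match(es)")
print(len(lazy_matches), "lazy match(es)")
for name, phone, address in lazy_matches:
    print(name, "|", phone, "|", address)
```

Even the lazy variant still relies on attribute order and on the exact tag layout, which is why a parser remains the more robust choice for markup like this.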
