Please note that I am using the .NET regular expression engine here.
This is the string to parse:
<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 1);" onmouseout="ResidentialListings.degradeListing(this, 1);">
<div id="Contact1" class="listingDetail">
<span id="ContactName1" class="c411ListedName"><a href="/res/5068300124/P-DESCHESNES/184421926.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES on 85 Red Pine Dr">P DESCHESNES</a></span>
<span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
<a class="c411GetDirections c411NoPrint" id="ContactDirections1" href="/map/mapSearch.html?layers=dir&from=85+Red+Pine+Dr+NB&what=P+Deschesnes&where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions <span>→</span></a>
</div>
<div class="c411HoverMarker c411NoPrint" style="display:none;">
<a href="/res/5068300124/P-DESCHESNES/184421926.html" title="P DESCHESNES"><span> </span></a>
</div>
</div>
<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 2, 0);" onmouseout="ResidentialListings.degradeListing(this, 2, 0);">
<div id="Contact2" class="listingDetail">
<span id="ContactName2" class="c411ListedName"><a href="/res/4189883202/P-Deschesnes/179906536.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P Deschesnes on 6585 Rue des Orchidées">P Deschesnes</a></span>
<span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
<a class="c411GetDirections c411NoPrint" id="ContactDirections2" href="/map/mapSearch.html?layers=dir&from=1000+Rue+des+Orchid%C3%A9esFictive+QC+G1X+3Z5&what=P+Deschesnes&where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions <span>→</span></a>
</div>
<div class="c411HoverMarker c411NoPrint" style="display:none;">
<a href="/res/4189883202/P-Deschesnes/179906536.html" title="P Deschesnes"><span> </span></a>
</div>
</div>
<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 3, 0);" onmouseout="ResidentialListings.degradeListing(this, 3, 0);">
<div id="Contact3" class="listingDetail">
<span id="ContactName3" class="c411ListedName"><a href="/res/4506702257/P-DESCHESNES/181606171.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES on 1736 Rue Saint-Alexandre">P DESCHESNES</a></span>
<span class="c411Phone" id="ContactPhone3">(450) 671-1111</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress3">1736 Rue Fictive Longueuil QC J1J 1T2</span></span>
<a class="c411GetDirections c411NoPrint" id="ContactDirections3" href="/map/mapSearch.html?layers=dir&from=1000+Rue+Saint-Fictive+Longueuil+QC+J1J+1T1&what=P+Deschesnes&where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions <span>→</span></a>
</div>
<div class="c411HoverMarker c411NoPrint" style="display:none;">
<a href="/res/4506702257/P-DESCHESNES/181606171.html" title="P DESCHESNES"><span> </span></a>
</div>
</div>
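You can see the repeating pattern here. I want one match per contact (1, 2, 3), each with three groups: contact name, phone, and address.
For this example I should get 3 matches, each containing a name, a phone, and an address, but for some reason I only get the last phone and address.
My .NET regex so far is: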
(?si)(?(?=.*<div id="Contact[d{1,2}]").*<span id="ContactName[d{1,2}]".*title=.*>(.*)</a>.*id="ContactPhone[d{1,2}]">(.*)</span>.*id="ContactAddress[d{1,2}]">(.*)</span>)
Can you tell me what I am doing wrong?
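For very simple HTML fragments, a regular expression can be useful. For anything more extensive, such as your example, an HTML parser like the HTML Agility Pack is probably the most robust solution.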
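To make the parser route concrete, here is a minimal sketch of the idea. It is written in Python with the standard-library html.parser module as a stand-in, not with the HTML Agility Pack itself, so treat it as an illustration of the approach rather than .NET code. The class name ContactParser, the helper extract_contacts, and the trimmed SAMPLE string are hypothetical names made up for this example; the ids (ContactName1, ContactPhone1, ContactAddress1, and so on) come from the markup in the question.

```python
import re
from html.parser import HTMLParser

# Trimmed copy of two listings from the question (event handlers removed).
SAMPLE = '''
<div id="Contact1" class="listingDetail">
<span id="ContactName1" class="c411ListedName"><a href="/res/5068300124/P-DESCHESNES/184421926.html" title="P DESCHESNES on 85 Red Pine Dr">P DESCHESNES</a></span>
<span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
</div>
<div id="Contact2" class="listingDetail">
<span id="ContactName2" class="c411ListedName"><a href="/res/4189883202/P-Deschesnes/179906536.html" title="P Deschesnes">P Deschesnes</a></span>
<span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
</div>
'''


class ContactParser(HTMLParser):
    """Collects the text inside elements whose id is ContactName<N>,
    ContactPhone<N> or ContactAddress<N> (hypothetical helper class)."""

    ID_RE = re.compile(r"^Contact(Name|Phone|Address)(\d+)$")

    def __init__(self):
        super().__init__()
        self.contacts = {}   # contact number -> {"Name": ..., "Phone": ..., "Address": ...}
        self._field = None   # (number, field) currently being captured, or None
        self._depth = 0      # open-tag depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Nested tag such as the <a> inside the name span.
            # Assumption: no unclosed void tags (<br>) occur inside a field.
            self._depth += 1
            return
        match = self.ID_RE.match(dict(attrs).get("id") or "")
        if match:
            self._field = (int(match.group(2)), match.group(1))
            self._depth = 1
            self.contacts.setdefault(self._field[0], {})[self._field[1]] = ""

    def handle_endtag(self, tag):
        if self._field is not None:
            self._depth -= 1
            if self._depth == 0:
                self._field = None

    def handle_data(self, data):
        if self._field is not None:
            number, field = self._field
            self.contacts[number][field] += data


def extract_contacts(html):
    """Returns {number: {"Name": ..., "Phone": ..., "Address": ...}}."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    return parser.contacts


if __name__ == "__main__":
    for number, fields in sorted(extract_contacts(SAMPLE).items()):
        print(number, fields["Name"], fields["Phone"], fields["Address"])
```

The point of the design is that each field is located by its id attribute, so the result does not depend on attribute order, line breaks, or how much markup sits between one listing and the next.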
The reasons not to try parsing HTML with regular expressions are covered here: Using regular expressions to parse HTML: why not?
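If a regex is still preferred for a fragment this small, one plausible contributor to the "only the last match" symptom is greedy matching: with single-line mode enabled, each .* can run across listing boundaries, so a single match swallows everything from the first contact to the last. Separately, inside square brackets {1,2} is not a quantifier, because the brackets define a set of single characters; the usual spelling for one or two digits is \d{1,2} without brackets. The sketch below shows the greedy versus lazy difference using Python's re module as a stand-in for the .NET engine. It is a simplified pattern written for this illustration, not a repaired version of the original expression, and SAMPLE is a trimmed copy of the three listings. Lazy quantifiers (.*?) and \d{1,2} are supported by .NET as well.

```python
import re

# Trimmed copy of the three listings from the question.
SAMPLE = '''
<div id="Contact1" class="listingDetail">
<span id="ContactName1" class="c411ListedName"><a href="/res/5068300124/P-DESCHESNES/184421926.html">P DESCHESNES</a></span>
<span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
</div>
<div id="Contact2" class="listingDetail">
<span id="ContactName2" class="c411ListedName"><a href="/res/4189883202/P-Deschesnes/179906536.html">P Deschesnes</a></span>
<span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
</div>
<div id="Contact3" class="listingDetail">
<span id="ContactName3" class="c411ListedName"><a href="/res/4506702257/P-DESCHESNES/181606171.html">P DESCHESNES</a></span>
<span class="c411Phone" id="ContactPhone3">(450) 671-1111</span>
<span class="c411ListingGeo"><span class="adr" id="ContactAddress3">1736 Rue Fictive Longueuil QC J1J 1T2</span></span>
</div>
'''

# Lazy version: every .*? stops at the first place the rest of the pattern fits.
LAZY_SRC = (
    r'id="ContactName\d{1,2}".*?<a\b[^>]*>(.*?)</a>'
    r'.*?id="ContactPhone\d{1,2}">(.*?)</span>'
    r'.*?id="ContactAddress\d{1,2}">(.*?)</span>'
)
# Greedy version: identical except that every .*? becomes .*
GREEDY_SRC = LAZY_SRC.replace(".*?", ".*")

# re.S lets "." match newlines, like the (?s) option in the question.
lazy_matches = re.findall(LAZY_SRC, SAMPLE, re.S)
greedy_matches = re.findall(GREEDY_SRC, SAMPLE, re.S)

if __name__ == "__main__":
    print(len(greedy_matches), "greedy match(es)")
    print(len(lazy_matches), "lazy match(es)")
    for name, phone, address in lazy_matches:
        print(name, "|", phone, "|", address)
```

Note that the lazy pattern still assumes every listing contains all three fields in this order; if one were missing, a match could drift into the next listing, which is part of why a parser tends to be the safer choice.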