我想删除文章正文中带有大括号的任何段落。
例如,从这段内容中:
<p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology & What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five − = 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>
我想删除这部分:
<p>five − = 2 .hide-if-no-js { display: none !important; } </p>
使用以下正则表达式:<p>.*?{.*?}.*?</p>
出于某种奇怪的原因,它删除了整篇文章,而不是包含大括号的这一段......
我用正则表达式代码做错了什么?谢谢!
懒惰/贪婪的量词并不总是按预期工作,而不是与不包括<
的字符串匹配,这对我有用:<p>[^<]*{[^<]*</p>
试试这个:
var str = '<p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology & What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five − = 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>';
var result = str.replace(/(<p>[^<]*{.*</p>)/, '');
console.log(result);
正则表达式演示
我建议采用两步法(解析和分析文本节点)。
下面你会发现Python
和PHP
的例子(显然可以用于其他语言):
蟒:
# -*- coding: utf-8> -*-
import re
from bs4 import BeautifulSoup
html = """
<html>
<p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology & What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five − = 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>
</html>
"""
soup = BeautifulSoup(html, 'lxml')
regex = r'{[^}]+}'
for p in soup.find_all('p', string=re.compile(regex)):
p.replaceWith('')
print soup
.PHP:
<?php
$html = "<html>
<p>While orthotic inserts are able to provide great support and pain relief, they aren’t quite as good as a specialty shoe. Remember that an ill-fitting insert can cause permanent damage and talk to a podiatrist about your foot pain for the best recommendation. Click here if you want to learn more about pain in the foot arch unrelated to plantar fasciitis.</p> <h2>Related Posts</h2> <h2>So What Are These Socks Really Good For?</h2> <h2>Are the bottom of your feet causing you problems?</h2> <h2>A PF Relief Guide</h2> <h2>What is Foot Reflexology & What is it Good For?</h2> <h2>Leave a Reply Cancel reply</h2> <p>Your email address will not be published. Required fields are marked *</p> <p>Name</p> <p>Email</p> <p>Website</p> <p>five − = 2 .hide-if-no-js { display: none !important; } </p><h2>Food For Thought January 2016</h2> <h2>Show Us Some Social Love!!</h2> <h2>Recent Posts</h2> <li> The Climate Pledge of Resistance</li> <li> Green Activism in Boulder, Colorado</li> <li> The Truth About Money and Happiness</li> <li> Why Is There So Much Skepticism About Climate Change?</li> <li> Which Device Would Work Best For You?</li>
</html>";
$html = str_replace(' ', ' ', $html); // only because of the
$xml = simplexml_load_string($html);
# look for p tags
$lines = $xml->xpath("//p");
# the actual regex - match anything between curly brackets
$regex = '~{[^}]+}~';
for ($i=0;$i<count($lines);$i++) {
if (preg_match($regex, $lines[$i]->__toString())) {
# unset it if it matches
unset($lines[$i][0]);
}
}
// vanished without a sight...
print_r($xml);
// convert it back to a string
$html = echo $xml->asXML();
?>
我建议采用两步法(解析和分析文本节点)。下面你会发现Python和PHP的例子(显然可以用于其他语言):