How to get only the useful part of the description from an RSS feed item



I am using Python feedparser to get this item["description"] from the Mashable feed:

<img alt="9f4397d9c05e474fa54291507ad9c03a" src="http://rack.2.mshcdn.com/media/ZgkyMDE2LzA0LzI2LzM0LzlmNDM5N2Q5YzA1LjMzODI0LmpwZwpwCXRodW1iCTU3NXgzMjMjCmUJanBn/393b8db2/53c/9f4397d9c05e474fa54291507ad9c03a.jpg" />
<div style="float: right; width: 50px;"><a href="http://twitter.com/share?via=Mashable&amp;text=Nail+polish+stockings+are+exactly+what+you+need+for+a+lazy+summer+pedicure&amp;src=http%3A%2F%2Fmashable.com%2F2016%2F04%2F26%2Ftoe-nail-polish-stockings%2F" style="margin: 10px;"><img alt="Feed-tw" border="0" src="http://rack.1.mshcdn.com/assets/feed-tw-f7c0a094d16b7ee7c91a1e50839a8e00.jpg" /></a><a href="http://www.facebook.com/sharer.php?u=http%3A%2F%2Fmashable.com%2F2016%2F04%2F26%2Ftoe-nail-polish-stockings%2F&amp;src=sp" style="margin: 10px;"><img alt="Feed-fb" border="0" src="http://rack.1.mshcdn.com/assets/feed-fb-c0a21e8841794479b8086c32c6f24ba1.jpg" /></a></div>
<div>
    <p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>
    <p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails &#8212; thin stockings with pre-painted toenails.</p>
    <div>
        <p>SEE ALSO: <a href="http://mashable.com/2016/02/23/weiner-dog-ear-plugs/">Weiner dog ear plugs will help you sleep deeper than a newborn pup</a></p>
    </div>
    <figure>
        <p><img class="" src="http://rack.1.mshcdn.com/media/ZgkyMDE2LzA0LzI2L2M1L3RvZW5haWxhcnRwLjI4NjBiLmpwZwpwCXRodW1iCTU3NXg0MDk2Pg/4f07495a/b32/toe-nail-art-polish-stockings-japan-10.jpg" /></p>
        <div>
            <p>Image:  belle maison</p>
        </div>
    </figure>
    <p>If you're worried about looking a little out-of-date with the classic stockings and open-toed heels that your grandma used to wear, don't fret. The stockings are designed to fit individual toes, giving your pedicure a better fit as well. <a href="http://mashable.com/2016/04/26/toe-nail-polish-stockings/">Read more...</a></p>
</div>
More about <a href="http://mashable.com/conversations/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Conversations</a>, <a href="http://mashable.com/pics/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Pics</a>, <a href="http://mashable.com/category/products/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Products</a>, <a href="http://mashable.com/lifestyle/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Lifestyle</a>, and <a href="http://mashable.com/category/weird-products/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Weird Products</a>

That is a lot of information. The only part I actually need for the reader is:

<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>
<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails &#8212; thin stockings with pre-painted toenails.</p>

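How can I get only this part? Should I just go with Python regular expressions? I am not sure, because almost every description is different, so writing an expression for this would be hard. Is there another RSS item element that provides only the information I want? Thanks!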

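As you guessed, regular expressions are not up to this task (obligatory link to this question). Your best option is to feed the HTML to a parser such as BeautifulSoup and write your logic against the parsed DOM object.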

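from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(my_input_html_string, "html.parser")
# Keep only the first two <p> elements.
my_elements = soup.find_all('p')[:2]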

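Obviously, this code assumes you are always looking for the first two <p> elements in whatever DOM you give it. You will have to adjust the logic based on the regularities you find by looking at the different descriptions in your input.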
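One way such an adjustment might look, as a rough sketch only: it assumes the article body is the first <div> that carries no attributes (true for the sample above, not verified for other items) and that the wanted paragraphs are direct children of that <div>. The function name leading_paragraphs and the shortened sample string are made up for illustration.

```python
from bs4 import BeautifulSoup


def leading_paragraphs(description_html, limit=2):
    """Return up to `limit` text-bearing <p> elements from the article body."""
    soup = BeautifulSoup(description_html, "html.parser")
    # Assumption: the article body is the first <div> without any attributes;
    # the share-button <div> in the sample has a style attribute and is skipped.
    body = next((d for d in soup.find_all("div") if not d.attrs), None)
    if body is None:
        return []
    picked = []
    # recursive=False looks at direct children only, so paragraphs nested in
    # inner <div> or <figure> blocks (e.g. "SEE ALSO") are never considered.
    for p in body.find_all("p", recursive=False):
        if p.get_text(strip=True):  # skip paragraphs with no visible text
            picked.append(str(p))
        if len(picked) == limit:
            break
    return picked


# Shortened, hypothetical stand-in for item["description"].
sample = (
    '<img alt="x" src="http://example.com/a.jpg" />'
    '<div style="float: right;"><a href="#">share</a></div>'
    '<div>\n<p>First.</p>\n<p>Second <a href="http://example.com/">link</a>.</p>\n'
    '<div><p>SEE ALSO: other</p></div>\n<p>Third.</p>\n</div>'
)
print(leading_paragraphs(sample))
```

If the feed's layout changes, the attribute-less <div> rule is the first thing that would need revisiting.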

If you want to go the re route, you can do the following:

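import re
# re.DOTALL lets "." match newlines, so the match can span the multi-line <div> body.
pat = re.compile(r"<div>(.*?)</div>", re.DOTALL)
s = pat.search(html).group(1)
result = [line.strip() for line in s.strip().splitlines()[:2]]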
# result
['<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>',
 '<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails &#8212; thin stockings with pre-painted toenails.</p>']

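But as you can see, it is dirty and likely to break. One alternative would be to write a grammar and a tiny parser, but the robust and convenient approach is to use a parser such as BeautifulSoup or lxml.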
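For comparison, here is a minimal sketch of the "tiny parser" idea using only the standard library's html.parser. It is a simplification, not a replacement for BeautifulSoup or lxml: it returns plain text rather than HTML, takes the first <p> elements wherever they appear, and the names ParagraphText and first_paragraph_texts are invented for this example.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of the first `limit` <p> elements in a document."""

    def __init__(self, limit=2):
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.paragraphs = []  # finished paragraph texts
        self._buf = None      # text pieces of the <p> currently open, else None

    def handle_starttag(self, tag, attrs):
        # Start collecting only while fewer than `limit` paragraphs are stored.
        if tag == "p" and self._buf is None and len(self.paragraphs) < self.limit:
            self._buf = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> arrives here as well.
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            self.paragraphs.append("".join(self._buf).strip())
            self._buf = None


def first_paragraph_texts(html, limit=2):
    parser = ParagraphText(limit)
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Calling first_paragraph_texts(item["description"]) would give the two opening paragraphs as plain strings, with links flattened to their text.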
