Finding URLs that end in .rss with Python BeautifulSoup4

I am trying to find a way to get RSS feed URLs like the one for the iTunes movie trailers, i.e.

<a href="http://trailers.apple.com/trailers/home/rss/newtrailers.rss">

How can I use Beautiful Soup to match URLs that end in .rss?

You can use the re module and pass a compiled regular expression to match the attribute. For example, to match rss at the end of the string, use rss$:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("""<a href="http://trailers.apple.com/trailers/home/rss/newtrailers.rss"></a>
<a href="http://trailers.apple.com/trailers/home/rss/newtrailers"></a>""", "html.parser")
# "rss$" matches any href ending in "rss"; use r"\.rss$" to require a literal ".rss".
soup.find_all("a", {"href": re.compile("rss$")})
# [<a href="http://trailers.apple.com/trailers/home/rss/newtrailers.rss"></a>]
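
To make the difference between the two patterns concrete, here is a small sketch that runs both against the same markup and collects the href strings. The example.com link is made up for illustration and is not from the question.

```python
import re
from bs4 import BeautifulSoup

# The second link is a hypothetical URL that ends in "rss" without the dot.
html = """
<a href="http://trailers.apple.com/trailers/home/rss/newtrailers.rss"></a>
<a href="http://example.com/feeds/allrss"></a>
<a href="http://trailers.apple.com/trailers/home/rss/newtrailers"></a>
"""
soup = BeautifulSoup(html, "html.parser")

# Loose pattern: any href whose last three characters are "rss".
loose = [a["href"] for a in soup.find_all("a", href=re.compile(r"rss$"))]

# Strict pattern: the escaped dot requires a literal ".rss" suffix.
strict = [a["href"] for a in soup.find_all("a", href=re.compile(r"\.rss$"))]

print(len(loose), len(strict))
```

With this input the loose pattern also picks up the allrss link, so escaping the dot is probably what you want if only real .rss files should match.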

You can iterate over all the a tags found in the page and check whether each one's href ends with .rss:

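# page is assumed to be a BeautifulSoup object for the parsed page.
for link in page.find_all("a", href=True):  # href=True skips <a> tags with no href
    if link["href"].endswith(".rss"):
        print(link["href"])  # do something with the link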
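
As a self-contained sketch of the loop approach, the helper below wraps it in a function. The name find_rss_links and the sample markup are my own for illustration. Note that link['href'] raises KeyError on an anchor with no href attribute, which is why the search passes href=True.

```python
from bs4 import BeautifulSoup


def find_rss_links(html):
    """Return the href of every <a> tag whose href ends with '.rss'."""
    page = BeautifulSoup(html, "html.parser")
    return [
        link["href"]
        for link in page.find_all("a", href=True)  # only anchors that have an href
        if link["href"].endswith(".rss")
    ]


sample = """
<a href="http://trailers.apple.com/trailers/home/rss/newtrailers.rss">feed</a>
<a href="http://trailers.apple.com/trailers/home/rss/newtrailers">page</a>
<a name="top">no href here</a>
"""
print(find_rss_links(sample))
# ['http://trailers.apple.com/trailers/home/rss/newtrailers.rss']
```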
