Separately identify itunes:keywords and itunes:category using feedparser?

I'm using feedparser to parse RSS feeds such as https://www.relay.fm/analogue/feed, but I can't work out how to explicitly identify the itunes:category values.

Looking at the feedparser iTunes tests, it seems that both the itunes:keywords and the itunes:category values are put into the feed['tags'] list.

From the category test:

<!--
Description: iTunes channel category
Expect:      not bozo and feed['tags'][0]['term'] == 'Technology'
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
<channel>
<itunes:category text="Technology"></itunes:category>
</channel>
</rss>

Then the keywords test:

<!--
Description: iTunes channel keywords
Expect:      not bozo and feed['tags'][0]['term'] == 'Technology' and 'itunes_keywords' not in feed
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
<channel>
<itunes:keywords>Technology</itunes:keywords>
</channel>
</rss>

For the example feed above, the entries are:

<itunes:keywords>Hurley, Liss, feelings</itunes:keywords>

and

<itunes:category text="Society &amp; Culture"/>
<itunes:category text="Technology"/>

which results in feed['tags'] being populated as follows:

[{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'}]

Is there any way to uniquely identify the values that came from itunes:category tags?

I couldn't find a way to do this with feedparser alone, so I also used Beautiful Soup:

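import bs4

# raw_data is the feed's raw XML text (for example, the downloaded bytes)
soup = bs4.BeautifulSoup(raw_data, "lxml")

def is_itunes_category(tag):
    # lxml's HTML parser keeps the "itunes:" prefix as part of the tag name
    return tag.name == 'itunes:category'

categories = [tag.attrs['text'] for tag in soup.find_all(is_itunes_category)]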
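If the goal is only to pull the channel-level values out of the raw XML, the standard library's xml.etree.ElementTree can do the same job as Beautiful Soup without an extra dependency. The following is a minimal sketch, assuming raw_xml holds the feed text and that only direct children of <channel> should count (so <item> data and nested subcategories are skipped); itunes_channel_metadata is a hypothetical helper name, not part of feedparser or any other library.

```python
import xml.etree.ElementTree as ET

# Feeds differ in the case of the iTunes namespace URI (feedparser's own test
# files use ".../DTDs/Podcast-1.0.dtd"), so it is compared in lower case.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def _is_itunes(tag, local_name):
    """Return True if an ElementTree tag like '{uri}category' is itunes:<local_name>."""
    if not isinstance(tag, str) or not tag.startswith("{"):
        return False
    uri, _, local = tag[1:].partition("}")
    return uri.lower() == ITUNES_NS and local == local_name


def itunes_channel_metadata(raw_xml):
    """Return (categories, keywords) declared directly under <channel>."""
    channel = ET.fromstring(raw_xml).find("channel")
    categories, keywords = [], []
    if channel is None:
        return categories, keywords
    # Iterate over direct children only, so <item> elements and nested
    # itunes:category subcategories are not picked up.
    for child in channel:
        if _is_itunes(child.tag, "category"):
            categories.append(child.get("text"))
        elif _is_itunes(child.tag, "keywords") and child.text:
            # itunes:keywords is a single comma-separated string
            keywords.extend(k.strip() for k in child.text.split(",") if k.strip())
    return categories, keywords
```

For the elements quoted in the question, categories would come back as ['Society & Culture', 'Technology'] and keywords as ['Hurley', 'Liss', 'feelings'], so the two sources stay separate.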

Feedparser 6.0.2 implements specific itunes:x attributes.

itunes:category is available in feedparser as category (a shortcut for the term of the first entry in feed['tags']):

import feedparser
feedp = feedparser.parse(url)
category = feedp.feed.category
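To see what that shortcut gives for the example feed, here is feedparser's lookup (the term of the first feed['tags'] entry, per my reading of its FeedParserDict) applied by hand to the list quoted in the question. It is a plain-Python illustration; nothing here calls feedparser.

```python
# feed['tags'] as quoted in the question (plain dicts, no feedparser needed)
tags = [
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Hurley"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Liss"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "feelings"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Society & Culture"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Technology"},
]

# Same rule as the category shortcut: the term of the first tag
category = tags[0]["term"] if tags else None
print(category)  # Hurley (a keyword, not an itunes:category value)
```

Because itunes:keywords appears before the itunes:category elements in this feed, the shortcut returns a keyword.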

itunes:keywords is renamed to tags in feedparser, with each keyword stored in a term field.

But the channel keywords are mixed in with the other tags. To identify the iTunes keywords separately, use scheme as a filter:

import feedparser
feedp = feedparser.parse(url)
# all channel-level tags (keywords and categories alike)
keywords = [k["term"] for k in feedp["feed"]["tags"]]
# only the channel-level tags whose scheme is the iTunes namespace
keyword = [t["term"] for t in feedp["feed"]["tags"] if t["scheme"] == 'http://www.itunes.com/']

This may drop other tags (if any are present), but when itunes:keywords and regular tags coexist, they are duplicates anyway.
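One thing to keep in mind: in the feed['tags'] list from the question, the keyword tags and the category tags carry the same scheme, so a scheme filter like the one above keeps all of them. A quick check on that list (copied as plain dicts, with no feedparser call):

```python
ITUNES_SCHEME = "http://www.itunes.com/"

# feed['tags'] as quoted in the question, reduced to scheme and term
tags = [{"scheme": ITUNES_SCHEME, "term": t}
        for t in ["Hurley", "Liss", "feelings", "Society & Culture", "Technology"]]

filtered = [t["term"] for t in tags if t["scheme"] == ITUNES_SCHEME]
# Every entry passes, so the filter cannot tell keywords from categories
print(len(filtered) == len(tags))  # True
```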

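itunes:duration is available as itunes_duration; since it is an episode-level element, it is read from each entry rather than from the feed: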

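import feedparser
feedp = feedparser.parse(url)
# one duration string per episode (None where an item has no itunes:duration)
durations = [entry.get("itunes_duration") for entry in feedp.entries]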
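The itunes_duration value is the raw string from the feed, and Apple's podcast RSS guidance accepts either a plain number of seconds or a colon-separated form such as HH:MM:SS or MM:SS. A minimal sketch for normalizing it to seconds, assuming those are the only forms encountered (fractional seconds are treated as unparseable); duration_to_seconds is a hypothetical helper.

```python
def duration_to_seconds(value):
    """Convert an itunes:duration string ("3600", "1:00:00", "59:59") to seconds.

    Returns None for missing or unparseable values instead of raising.
    """
    if not value:
        return None
    parts = value.strip().split(":")
    if len(parts) > 3:
        return None
    try:
        numbers = [int(p) for p in parts]
    except ValueError:
        return None
    seconds = 0
    for n in numbers:  # fold H:M:S, M:S, or S into a single count
        seconds = seconds * 60 + n
    return seconds
```

Combined with feedparser, this could be applied per episode, e.g. [duration_to_seconds(e.get("itunes_duration")) for e in feedp.entries].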

Slightly off-topic, but to round out the answer:

If multiple categories are available, they are exposed in categories as (scheme, term) tuples, as described in the documentation:

>>> import feedparser
>>> feedp = feedparser.parse(url)
>>> categories = feedp.feed.categories
>>> print(categories)
[(u'Syndic8', u'1024'),
 (u'dmoz', 'Top/Society/People/Personal_Homepages/P/')]

But iTunes doesn't have multiple categories...

There's no longer any need to parse the feed a second time with BeautifulSoup4.
