How to clean up the summary field of an RSS feed



I am trying to load an RSS feed into a pandas DataFrame. The other fields work fine, but the summary field still contains HTML. My code is:

import feedparser
import pandas as pd
rss_feed = 'https://maavoimat.fi/ajankohtaista/ampuma-ja-melutiedotteet/-/announcements/rss'
feed = feedparser.parse(rss_feed)
posts = []
for post in feed.entries:
    posts.append((post.title, post.summary, post.published))
df = pd.DataFrame(posts, columns=['title', 'summary', 'published'])
df

How can I get it to display cleanly, without the HTML tags?

Try this!

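import feedparser
from bs4 import BeautifulSoup

# Parse the RSS feed (rss_feed is the URL string from the question)
feed = feedparser.parse(rss_feed)

# Iterate over the entries and replace each HTML summary with its plain text
for entry in feed.entries:
    soup = BeautifulSoup(entry.summary, 'html.parser')
    entry.summary = soup.get_text()

# Continue with your code, e.g. build the DataFrame from feed.entries as before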
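
If adding BeautifulSoup as a dependency is not an option, a similar result can probably be had with the standard library's html.parser module. The following is only a minimal sketch of that alternative, not the approach from the answer above: the names _TextExtractor, strip_html, and sample_entries are made up for illustration, and the sample tuples stand in for the (title, summary, published) values you would read from feed.entries.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; are decoded
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep non-empty text, dropping surrounding whitespace
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_html(fragment):
    """Return the text of an HTML fragment, with chunks joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.chunks)


# Assumed sample data in place of a live feed (no network access needed)
sample_entries = [
    ("Notice 1", "<p>Firing exercise <b>today</b></p>", "Mon, 01 Jan 2024"),
    ("Notice 2", "<div>Noise &amp; traffic<br/>expected</div>", "Tue, 02 Jan 2024"),
]

# Clean the summary while building the rows for the DataFrame
posts = [
    (title, strip_html(summary), published)
    for title, summary, published in sample_entries
]
```

The resulting posts list can be passed to pd.DataFrame(posts, columns=['title', 'summary', 'published']) exactly as in the question. Note that this sketch puts a space between every text chunk and does not skip the contents of script or style elements, so BeautifulSoup is likely the more robust choice for messy markup.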
