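I have a simple scraping function that returns specific things from a given URL. It returns a dictionary, and I would like to save the contents of that dictionary to a .md file in a particular format. The code is as follows: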
import requests
from bs4 import BeautifulSoup

def get_data(url):
    # Download the page and parse its HTML
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')

    iframe = []
    yt_secondary = []
    # Collect the text of every <div class="tags"> element
    div = soup.find_all('div', attrs={'class': 'tags'})
    for entry in div:
        tags = entry.text.strip().replace('#', '').split('\n')

    # Take the first <iframe> on the page (the embedded video)
    songs_links = soup.find_all('iframe')[0]
    iframe.append(songs_links)
    entry = {'tags': tags,
             'iframe': songs_links}
    return entry

if __name__ == "__main__":
    print(get_data('https://nikisaku.tumblr.com/post/643205680992485376/test'))
It returns this, as expected:
{'tags': ['Tagged: testing, test2, test3, .'], 'iframe': <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" frameborder="0" height="281" id="youtube_iframe" src="https://www.youtube.com/embed/bwKfVwiUpvo?feature=oembed&enablejsapi=1&origin=https://safe.txmblr.com&wmode=opaque" width="500"></iframe>}
Now I would like to save it as a .md file in this format:
---
tags: Tagged: testing, test2, test3, .
---
<iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="" frameborder="0" height="281" id="youtube_iframe" src="https://www.youtube.com/embed/bwKfVwiUpvo?feature=oembed&enablejsapi=1&origin=https://safe.txmblr.com&wmode=opaque" width="500"></iframe>
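Is it possible to save it like this? I need to keep it as this function, because I will use it to loop over X given pages to scrape the tags and links (this part works), and for every result I have to create a new .md file.
Thanks in advance!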
Since the function returns a dictionary, you can iterate over it and print each key and value separately:
if __name__ == "__main__":
    raw_data = get_data('https://nikisaku.tumblr.com/post/643205680992485376/test')
    for key, value in raw_data.items():
        # Lists (such as 'tags') are joined into a single comma-separated line
        if isinstance(value, list):
            print(f"---\n{key}: {', '.join(value)}")
        else:
            print(f"---\n{key}: {value}")
The result looks like this:
---
tags: Tagged: testing, test2, test3, .
---
iframe: <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" ...
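The loop above prints the data but does not write a file, and its layout differs slightly from the one requested in the question (it prefixes the iframe with `iframe:` and omits the closing `---`). Here is a minimal sketch of one way to produce the requested layout and save it. The helper names `to_markdown` and `save_entry` and the file name are hypothetical, not part of the original code, and the sample dictionary only stands in for the value returned by `get_data`.

```python
from pathlib import Path


def to_markdown(entry):
    """Render a scraped entry as a front-matter block followed by the iframe HTML."""
    tags = entry["tags"]
    # 'tags' is a list in the question's output; join it into a single line.
    tags_line = ", ".join(tags) if isinstance(tags, list) else str(tags)
    # str() turns a BeautifulSoup Tag into its HTML; plain strings pass through.
    iframe_html = str(entry["iframe"])
    return f"---\ntags: {tags_line}\n---\n{iframe_html}\n"


def save_entry(entry, path):
    """Write one entry to the given .md file (the path is chosen by the caller)."""
    Path(path).write_text(to_markdown(entry), encoding="utf-8")


if __name__ == "__main__":
    # Stand-in for the dictionary returned by get_data(url).
    sample = {
        "tags": ["Tagged: testing, test2, test3, ."],
        "iframe": '<iframe id="youtube_iframe" width="500"></iframe>',
    }
    print(to_markdown(sample))
```

When looping over several pages, something like `save_entry(get_data(url), f"post_{i}.md")` inside an `enumerate` loop would create one file per result; the `post_{i}.md` naming scheme is just an assumption for illustration.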