I've been trying to scrape values from multiple URLs with BeautifulSoup.
I can do this successfully for a single URL, but what I'd like to do is pull in a list of around 300 URLs from a separate text file.
How do I reference a text file of URLs instead of the single URL in source = requests.get('https://notmyrealurl.com').text?
What I have written so far is:
import requests
from bs4 import BeautifulSoup
source = requests.get('https://notmyrealurl.com').text
soup = BeautifulSoup(source, features="html.parser")
title = soup.find("meta", attrs={'itemprop': 'acquia_lift:content_keywords'})
print(title["content"] if title is not None else "No meta title given")
You can use open and iterate over each line of URLs.
import requests
from bs4 import BeautifulSoup
with open("your_txt.txt") as file:
    for line in file:
        url = line.rstrip()
        source = requests.get(url).text
        soup = BeautifulSoup(source, features="html.parser")
        title = soup.find("meta", attrs={'itemprop': 'acquia_lift:content_keywords'})
        print(title["content"] if title is not None else "No meta title given")
This assumes your txt file looks like:
url1.com
url2.com
url3.com
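With around 300 URLs, the file may well contain blank lines or stray whitespace. A minimal sketch of a loader that skips them (the read_urls function and the demo file are my own, not from the answer above):

```python
import tempfile

def read_urls(path):
    """Return the non-empty, stripped lines from a text file of URLs."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Quick demo: a temporary file stands in for your real URL list.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("url1.com\n\nurl2.com \nurl3.com\n")
    path = tmp.name

print(read_urls(path))  # → ['url1.com', 'url2.com', 'url3.com']
```

You can then loop over the returned list exactly as in the answer above.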
Your text file needs to be a list: ['url1', 'url2', 'url3', '...']. Here is the code configured to read the text file:
import ast
import requests
from bs4 import BeautifulSoup

# Read your text file
with open('yourtextfile.txt') as f:
    yourtextfile = f.read()

# Parse it into a list (ast.literal_eval is safer than eval on file contents)
listurl = ast.literal_eval(yourtextfile)

for url in listurl:
    source = requests.get(url).text
    soup = BeautifulSoup(source, features="html.parser")
    title = soup.find("meta", attrs={'itemprop': 'acquia_lift:content_keywords'})
    print(title["content"] if title is not None else "No meta title given")
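Whichever way you load the list, some of ~300 requests will likely fail (dead links, timeouts, malformed URLs). A hedged sketch of a small wrapper that returns None on failure instead of crashing the loop (the fetch function and its timeout value are my own choices):

```python
import requests

def fetch(url):
    """Return the page text, or None if the request fails for any reason."""
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()  # treat HTTP 4xx/5xx as failures too
        return resp.text
    except requests.RequestException:
        return None

# A malformed URL raises inside requests, which the wrapper swallows.
print(fetch("not a valid url"))  # → None
```

Inside the loop you would then write source = fetch(url) and skip the URL when it is None.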