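Invalid argument in open method for web scraping

I'm trying to collect some data from Ancestry. I have a .NET background, but I thought I'd try some Python for a project. I'm falling at the first step: I'm trying to open this page and then print out the rows.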

from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
raw_html = open('https://www.ancestry.co.uk/search/collections/britisharmyservice/?birth=_merthyr+tydfil-wales-united+kingdom_1651442').read()
html = BeautifulSoup(raw_html, 'html.parser')
for p in html.select('tblrow record'):
    print(p)

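I'm getting an invalid argument error on the open call.

According to the documentation, open is used to:

Open [a] file and return a corresponding file object.

It works on files in the local filesystem, so you cannot use it to download the HTML content of a web page. You probably meant to use requests.get, as follows: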

raw_html = get('https://www.ancestry.co.uk/search/collections/britisharmyservice/?birth=_merthyr+tydfil-wales-united+kingdom_1651442').text
# .text gets the raw text of the response
# (http://docs.python-requests.org/en/master/api/#requests.Response.text)

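Here are some suggestions for improving the code:

requests.get accepts a number of useful arguments. One of them is params, which lets you supply the URL query parameters as a Python dictionary.
If you need to verify that the request succeeded before accessing its text, check whether response.status_code == requests.codes.ok. That only covers status code 200; if you need to handle other codes, response.raise_for_status() should help, since it raises an exception for error responses.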
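The two suggestions above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the helper names preview_url, is_ok and fetch_html are hypothetical, the 10-second timeout is an arbitrary choice, and the dictionary value assumes that the + characters in the original query string stand for spaces.

```python
import requests

BASE_URL = "https://www.ancestry.co.uk/search/collections/britisharmyservice/"

# Assumption: the "+" signs in the question's URL are encoded spaces,
# so the dictionary holds the decoded value and requests re-encodes it.
PARAMS = {"birth": "_merthyr tydfil-wales-united kingdom_1651442"}


def preview_url(url, params):
    """Build the final URL without sending anything over the network."""
    return requests.Request("GET", url, params=params).prepare().url


def is_ok(response):
    """True only for a plain 200 response."""
    return response.status_code == requests.codes.ok


def fetch_html(url, params=None, timeout=10):
    """Download a page and return its text.

    raise_for_status() raises requests.exceptions.HTTPError
    for 4xx and 5xx responses, so bad responses are not parsed.
    """
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.text


# Show the URL that requests would ask for; no request is sent here.
print(preview_url(BASE_URL, PARAMS))
```

Calling fetch_html(BASE_URL, PARAMS) would then return the page text, which can be passed to BeautifulSoup exactly as in the question.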
