BeautifulSoup find().text raises "'NoneType' object has no attribute 'text'" inside a for loop

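I have a web page that uses pagination, and I'm looping over all of its pages. I'm trying to store the current page number using soup_page_number = soup.find("li", {"class":"page-item active"}).text, which works as expected as long as soup was built from a URL that includes a page number, e.g. https://www.url.com/?p=1.

However, when I try to get the page number for every page on the site by looping over the pages (for the first 10 pages), e.g.: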

i=1
for i in range(10):
    url = "https://www.url.com?p="
    url = url + str(i)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content,"html.parser")
    soup_page_number = soup.find("li", {"class":"page-item active"}).text
    i+=1

it produces the following result:

AttributeError: 'NoneType' object has no attribute 'text'

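This is strange, because moving soup_page_number outside the for loop produces the correct result (but only for one page, in this case page 10). What is causing the for loop to fail?

In case it's needed, the HTML I'm trying to access looks like this: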

<li class = 'page-item active'>
<a class='page-link'>9</a>
</li>

Thanks!

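I believe accessing "page 10" would work; the error only appears because the very first iteration fails. The problem is that you define "i=1", but when you then call "for i in range(10):", "i" is reset to the first int in "range(10)", which is 0.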
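To see the reassignment in isolation, here is a minimal sketch with no network access (the list name seen is only for illustration). It records the value of i at the top of each iteration:

```python
i = 1                  # overwritten as soon as the loop starts
seen = []
for i in range(10):
    seen.append(i)     # value the loop assigned for this iteration
    i += 1             # discarded: the next iteration reassigns i

print(seen[0], seen[-1])   # 0 9
```

Neither the initial i = 1 nor the i += 1 inside the body changes which values the loop visits.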

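So in effect you are trying to access pages 0-9 rather than 1-10. The page for p=0 presumably has no active page item, so find() returns None and the .text lookup fails. For what you want, you can do either of the following: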

for i in range(1, 11):
    (code goes here)

or

i = 1
while i <= 10:
    (code goes here)
    i += 1
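Putting it together, here is a rough sketch of the corrected loop that also guards against find() returning None. The helper names active_page_number and collect_page_numbers, and the fetch_html callable, are my own assumptions for illustration; in your case fetch_html would wrap something like requests.get(url, headers=headers).content.

```python
from bs4 import BeautifulSoup


def active_page_number(html):
    """Return the text of the active pagination item, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("li", {"class": "page-item active"})
    # find() returns None when nothing matches, so check before reading text
    if item is None:
        return None
    return item.get_text(strip=True)


def collect_page_numbers(fetch_html, first=1, last=10):
    """Map each page index to its active page number (or None).

    fetch_html is a hypothetical callable taking a page index and
    returning that page's HTML.
    """
    results = {}
    for page in range(first, last + 1):  # first..last inclusive
        results[page] = active_page_number(fetch_html(page))
    return results
```

With this shape, a page that lacks the active item shows up as None in the result instead of raising AttributeError, which makes it easier to spot which URL is the odd one out.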
