Webscraping - Beautifulsoup4 - Accessing indexed item in a find_all loop

How do I select an item from a list inside a for loop?

When I print it without the brackets, I get the full list, and each index seems to be the correct item I need:

for h3 in soup.find_all('h3', itemprop="name"):
    bookname = h3.a.text
    bookname = bookname.split('\n')
    print(bookname)

However, when I print it with a specific index, whether inside or outside the loop, it returns "list index out of range":

for h3 in soup.find_all('h3', itemprop="name"):
    bookname = h3.a.text
    bookname = bookname.split('\n')
    print(bookname[2])
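To see why the index fails, here is a minimal offline sketch. It assumes, hypothetically, that each h3 with itemprop="name" holds exactly one title containing no newline; the markup below is made up and is not taken from ca1lib.org. Under that assumption split("\n") returns a one-element list, so index 2 never exists.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search page: one title per <h3>.
HTML = """
<h3 itemprop="name"><a href="#">Ginger Book One</a></h3>
<h3 itemprop="name"><a href="#">Ginger Book Two</a></h3>
<h3 itemprop="name"><a href="#">Ginger Book Three</a></h3>
"""

soup = BeautifulSoup(HTML, "html.parser")

split_results = []
for h3 in soup.find_all("h3", itemprop="name"):
    # The title contains no newline, so split("\n") yields a one-element list.
    parts = h3.a.text.split("\n")
    split_results.append(parts)
    print(parts)

# Asking for index 2 of any of those one-element lists raises the error.
try:
    split_results[0][2]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

Each iteration sees only one tag, which is why printing the whole list "looks right" while any index above 0 fails.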

What is my problem? How can I change the code so that I can scrape all the h3 names and still select a specific h3 name by index when I need to?

The full code is below:

import requests
from bs4 import BeautifulSoup
source = requests.get("https://ca1lib.org/s/ginger") #gets the source of the site and returns it
soup = BeautifulSoup(source.text, 'html5lib')
for h3 in soup.find_all('h3', itemprop="name"):
    bookname = h3.a.text
    bookname = bookname.split('\n')
    print(bookname[2])

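At first glance, assuming an h3 element contains several book names ("book1"\n"book2"\n"book3"), your problem may be that some h3 elements contain fewer than three names, so bookname[2] cannot reach an element in the shorter lists. On the other hand, if each h3 element holds only one item (<h3>book1</h3>), then because you iterate over all the h3 tags you are handling one tag at a time (on the first iteration you have "<h3>book1</h3>", on the second "<h3>book2</h3>", and so on). In that case you should build a list of all the h3.a.text values first and then access the value you want. Hope this helps!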
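The answer's suggestion (collect every h3.a.text first, then index) can be sketched as below. The markup is hypothetical, and name_at is a hypothetical helper added only to show a bounds-checked lookup; neither comes from the original thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search results page.
HTML = """
<h3 itemprop="name"><a href="#">Ginger Book One</a></h3>
<h3 itemprop="name"><a href="#">Ginger Book Two</a></h3>
<h3 itemprop="name"><a href="#">Ginger Book Three</a></h3>
"""


def book_names(soup):
    """Collect the link text of every <h3 itemprop="name"> that contains an <a>."""
    return [
        h3.a.get_text(strip=True)
        for h3 in soup.find_all("h3", itemprop="name")
        if h3.a is not None
    ]


def name_at(names, index, default=None):
    """Hypothetical helper: return names[index], or default if it is out of range."""
    return names[index] if -len(names) <= index < len(names) else default


soup = BeautifulSoup(HTML, "html.parser")
names = book_names(soup)
print(names)              # ['Ginger Book One', 'Ginger Book Two', 'Ginger Book Three']
print(names[2])           # Ginger Book Three
print(name_at(names, 7))  # None
```

Building the list once outside the loop means every title is available afterwards, and indexing works as long as the page actually returned that many results.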

I forgot to append. I've got it now.

Here is my final code:

import requests
from bs4 import BeautifulSoup
source = requests.get("https://ca1lib.org/s/ginger") #gets the source of the site and returns it
soup = BeautifulSoup(source.text, 'html.parser')
liste = []
for h3_tag in soup.find_all('h3', itemprop="name"):
    liste.append(h3_tag.a.text.split("\n"))
    #bookname = h3.a.text #string
    #bookname = bookname.split('\n') #becomes list
print(liste[5])
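Note that appending the result of split() stores a list per title, so liste is a list of lists and liste[5] is itself a list. This offline sketch uses hypothetical markup with six titles (so that index 5 exists) to show the difference between that and appending the plain string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with six titles so that index 5 exists.
HTML = "".join(
    f'<h3 itemprop="name"><a href="#">Ginger Book {n}</a></h3>' for n in range(1, 7)
)

soup = BeautifulSoup(HTML, "html.parser")

liste = []
for h3_tag in soup.find_all("h3", itemprop="name"):
    # append() stores the whole list returned by split(), giving a list of lists.
    liste.append(h3_tag.a.text.split("\n"))

print(liste[5])     # ['Ginger Book 6']
print(liste[5][0])  # Ginger Book 6

# Keeping the plain string instead gives a flat list of titles.
titles = [h3_tag.a.text for h3_tag in soup.find_all("h3", itemprop="name")]
print(titles[5])    # Ginger Book 6
```

If each title really is a single line, the flat list is usually easier to work with; the split() version only pays off when one h3 holds several newline-separated names.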
