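Basic web crawler gives no output

So I made this web crawler in Python 3, but it has no effect and produces no output. I've tried several things and nothing worked. However, if I leave out {'class': 'product-thumb '}, it works and gives me all the links on the page.

Here is my code: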

import requests
from bs4 import BeautifulSoup

def spider(maxpage):
    page = 1
    while page <= maxpage:
        url = 'https://www.startech.com.bd/product/search?&search=headphone&category_id=0&page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll('div', {'class': 'product-thumb '}):
            href = link.get('href')
            print(href)
        page += 1

spider(5)

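Your code is looking for an a tag with the class product-thumb, but on the page the element with class product-thumb is a div. If you change your code to something like the following, you should see results: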

def spider():
    url = 'https://basketball.realgm.com/'
    source_code = requests.get(url)
    text = source_code.text
    soup = BeautifulSoup(text, 'html.parser')
    divs = soup.find_all('div', {'class': 'lead-story'})
    for div in divs:
        print('text : {}'.format(div.text))

for link in soup.findAll('div', {'class': 'product-thumb '}):
    href = link.get('href')

<div> elements are not links. You can assign them to a variable named link, but they will not have an href attribute.

The link is a grandchild of the product-thumb div.

I think you can do link.find('a'), but I haven't used Beautiful Soup in about ten years.
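To illustrate the idea of looking up the anchor inside each div, here is a minimal sketch. It runs against a small inline HTML sample rather than the live site, so the markup (including the img-holder wrapper) is a hypothetical stand-in and the real page may be structured differently. The function name extract_links is also just for illustration. Note that the sketch matches the class as product-thumb without the trailing space used in the question, since that extra space may itself prevent a match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<div class="product-thumb">
  <div class="img-holder"><a href="/headphone-a"><img src="a.jpg"></a></div>
</div>
<div class="product-thumb">
  <div class="img-holder"><a href="/headphone-b"><img src="b.jpg"></a></div>
</div>
<div class="product-thumb"><p>no link here</p></div>
"""


def extract_links(html):
    """Return the href of the first <a> found inside each product-thumb div."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for div in soup.find_all("div", class_="product-thumb"):
        # The div itself has no href, so search its descendants for a link.
        anchor = div.find("a", href=True)
        if anchor is not None:  # skip divs that contain no link at all
            hrefs.append(anchor["href"])
    return hrefs


links = extract_links(SAMPLE_HTML)
print(links)
```

In the original spider, the same lookup would go inside the for loop, replacing link.get('href') with a search for the nested a tag and a check that one was actually found.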
