Python bs4 module


import requests
from bs4 import BeautifulSoup


def ebay_spider(max_pages):
    """Web crawler for eBay search results that collects the data of every single item."""
    page = 1
    while page <= max_pages:
        # Parentheses let the URL expression continue on the next line
        url = ('http://www.ebay.co.uk/sch/Apple-Laptops/111422/i.html?_pgn='
               + str(page))
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'vip'}):
            href = 'http://www.ebay.co.uk' + link.get('href')
            title = link.string
            # Visit every item link found on this results page
            get_single_item_data(href)
        # Move on to the next results page so the while loop terminates
        page += 1


def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for item_name in soup.findAll('h1', {'id': "itemTitle"}):
        print(item_name.string)


ebay_spider(3)

Error message: https://i.stack.imgur.com/zbJ6y.jpg
I tried to fix it, but it doesn't seem to work. Any hints or answers on how to solve it?

Edit: Sorry, everyone, for the wrong title and tags. Everything has been fixed.

In the lines where you create the BeautifulSoup object, instead of:

soup = BeautifulSoup(plain_text)

write:

soup = BeautifulSoup(plain_text, 'html.parser')
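
To see the change in isolation, here is a minimal sketch that needs no network access. It assumes the message in the screenshot is bs4's warning about no parser being explicitly specified. The SAMPLE_HTML string and the extract_items helper are hypothetical stand-ins for illustration, not real eBay markup or part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded results page (not real eBay markup).
SAMPLE_HTML = """
<html><body>
  <a class="vip" href="/itm/1">MacBook Air</a>
  <a class="vip" href="/itm/2">MacBook Pro</a>
  <a class="other" href="/help">Help</a>
</body></html>
"""


def extract_items(html):
    """Return (href, title) pairs for every <a class="vip"> link in html."""
    # Naming the parser explicitly means bs4 does not have to pick one itself.
    soup = BeautifulSoup(html, 'html.parser')
    return [(a.get('href'), str(a.string))
            for a in soup.find_all('a', {'class': 'vip'})]


items = extract_items(SAMPLE_HTML)
for href, title in items:
    print(href, title)
```

'html.parser' ships with Python's standard library, so no extra install is required. Other parsers such as 'lxml' can be named the same way if they are installed.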

Note: your question is about the bs4 module, not requests.

This has nothing to do with the requests module. As Jean-Francois said, do what the message tells you and move on.

soup = BeautifulSoup(plain_text, "html.parser", markup_type=markup_type)
