Web scraping - New to Python, what am I doing wrong? BS4 is not returning the <a> tags (links)



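I've just started learning Python. Basically, I'm trying to collect the links to all the products in my e-commerce store; they are stored in the HTML shown below. I'm not getting any results back, though, and I can't figure out why.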

<h3 class="two-lines-name">
    <a title="APPLE IPOD IPOD A1199 2GB" target="_self" href="/Item/Details/APPLE-IPOD-IPOD-A1199-2GB/d1003297dbe7443c8953750f0c96c62a/400">
        APPLE IPOD IPOD A1199 2GB
    </a>
</h3>

Here is my Python code:

import requests
from bs4 import BeautifulSoup
def my_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'www.buya.com/Store/SAM-S-LOCKER/400?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'h3 class': "two-lines-name"}):
            href = link.get('href')
            print(href)
        page += 1
    my_spider(5)

The result is no output at all:

Process finished with exit code 0

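The call to my_spider is indented inside the function body, so you never actually run the function. Once you fix that, you will get an error, because the string you pass to requests is not a valid URL (it has no http:// scheme). Finally, soup.findAll('a', {'h3 class': "two-lines-name"}) will not find anything: the class is on the h3 tag, not on the a tag, and no attribute is named 'h3 class'. A corrected version: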

import requests
from bs4 import BeautifulSoup

def my_spider(max_pages):
    # iterate over pages 1..max_pages
    for i in range(1, max_pages + 1):
        # requests needs the scheme (http://) in the URL
        url = 'http://www.buya.com/Store/SAM-S-LOCKER/400?page={}'.format(i)
        source_code = requests.get(url)
        plain_text = source_code.content
        # name the parser explicitly so bs4 does not have to guess one
        soup = BeautifulSoup(plain_text, 'html.parser')
        # the class is on the h3 tags; the href comes from the a tag inside each one
        for link in soup.find_all("h3", {'class': "two-lines-name"}):
            href = link.a["href"]
            print(href)

my_spider(5)  # called at module level, outside the function
Output:

/Item/Details/12-FT-CHAIN-W-HOOK/cbb1eb65b100459283d15102606208c2/400
/Item/Details/12-INCH-FUSION-SUBWOOFER/534c4d677b2547fb814668b7d061df5d/400
/Item/Details/18-Gold-Chain-14K-Yellow-Gold-2-03g/0aaf2e1e5532461884cb44e786329e80/400
/Item/Details/1820-HANDMADE-STRAIGHT-RAZOR/ed0ba44f98224067b595b726bf01f5ab/400
/Item/Details/2-PAIRS-OF-POCKET-PLIERS-LEATHERMAN/410bcb9e4321426487bee7639b3cb96e/400
/Item/Details/20TH-CENTURY-FOX-Motorcycle-Helmet-RACING-HELMET/e12a75dc7e004e5aa43698c1edf87773/400
/Item/Details/30-CLUBS/a65f1cbff00c4d59ac998dee96eed98b/400
/Item/Details/30-STEEL-CHAINSAW-BLADE/daaca24ede1341c58bb0d0cd32051646/400
/Item/Details/5-GALLON-GLASS-JUG-BREWING-JUG-CHANGE-JAR/dde9b1bfea2a4a23ad93da098ffc674d/400
/Item/Details/5150-SNOWBOARDS-Snowboard-5150-155CM/bcaa07c71c8c4b499a70d34459244f75/400
/Item/Details/6-FT-STEEL-CHAIN/7c24fb1a16ac46e7b9e91f99883652f6/400
/Item/Details/6-5-CUSTOM-HUNTING-KNIFE/ffda1685b2324abe96e3fb7cb6f7f265/400
/Item/Details/95150/39cb080edd474eb6b770b26b40e3dc6b/400
/Item/Details/ACER-Monitor-P201W/ff03d9c33ca747e08e4646d2c3d5143e/400
/Item/Details/ACOUSTIC-RESEARCH-Monitor-Speakers-RESEARCH-AW825/856ff1d8beb9480d893f94d9d49a8642/400
/Item/Details/ACTIVISION-Microsoft-XBOX-360-CALL-OF-DUTY-BLACK-OPS-2-XBOX-360/aef62055b4f14e379f2eea154d162551/400
/Item/Details/ACTIVISION-Video-Game-Accessory-DJ-HERO-95837809/41e3c7f0114e497caf23d8a50fe1f547/400
/Item/Details/ACTIVISION-Video-Game-Accessory-WII-FIT/7daee2a759a54dd7a4e2b6acd37b9c3e/400
/Item/Details/AIMTECH-1911-SCOPE-MOUNT/ac69ae1c40fe4d7db8c53a8ebf842d7d/400
/Item/Details/AIRCO-TIG-WELDING-TUNGSTEN-Arc-Welder-ELECTRODE/70b9b35db0c547c29eb90e02ef60d91a/400
/Item/Details/AIWA-Portable-CD-Player-XP-SP911/75761bfff9a44093be51e4d70410bd85/400
/Item/Details/ALESSI-Gent-s-Wristwatch-KARIM-RASHID/251c3f95173f49078722b301e1d920fe/400
/Item/Details/ALL-AMERICAN-RIDER-Motorcycle-Part-SADDLE-BAGS/87634c0c08d2458ba5b84fa39c9bc3fc/400
/Item/Details/ALL-AMERICAN-RIDER-Motorcycle-Part-SADDLE-BAGS/803f6dfdc9f44326a5a52b63681779ad/400
/Item/Details/ALLY-SKATEBAORD-USED/716cec1588d9408e859718f5961e1ec6/400
/Item/Details/ALPINE-ARCHERY-Bow-FRONTIER/e73dda8034cf4cdb8ebeeebc9683b55d/400
/Item/Details/AMAZON-Tablet-KINDLE-D01100/ea9ac5b291ef487ea6f75ca328e05750/400
/Item/Details/AMAZON-Tablet-KINDLE-FIRE-D01400/ebe0e7001ac744ffa030fd153942a548/400
/Item/Details/APPLE-Computer-Accessories-A1023/6a38f60d2e034dc597043cc42282246e/400
/Item/Details/APPLE-Cell-Phone-Smart-Phone-IPHONE-5C-A1532-AT-T/cc65c513e848475c8000b6e10b6855e5/400
/Item/Details/APPLE-IPOD-IPOD-A1199-2GB/d1003297dbe7443c8953750f0c96c62a/400
...................................................
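
To see the selector difference without hitting the site, here is a minimal offline sketch that parses only the HTML snippet from the question. It also turns the relative href into an absolute URL with urllib.parse.urljoin, which is an addition of mine and not part of the answer above; the base URL is assumed to be the store URL used in the answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# HTML copied from the question
html = """
<h3 class="two-lines-name">
    <a title="APPLE IPOD IPOD A1199 2GB" target="_self" href="/Item/Details/APPLE-IPOD-IPOD-A1199-2GB/d1003297dbe7443c8953750f0c96c62a/400">
        APPLE IPOD IPOD A1199 2GB
    </a>
</h3>
"""

soup = BeautifulSoup(html, "html.parser")

# The question's filter asks for <a> tags that have an attribute literally
# named "h3 class"; no tag has such an attribute, so nothing matches.
wrong = soup.find_all("a", {"h3 class": "two-lines-name"})

# The class sits on the <h3>, so match the <h3> and read the <a> inside it.
hrefs = [
    h3.a["href"]
    for h3 in soup.find_all("h3", class_="two-lines-name")
    if h3.a is not None  # skip an <h3> that has no link inside
]

# Assumption: relative paths are resolved against the store URL from the answer.
base = "http://www.buya.com/Store/SAM-S-LOCKER/400"
absolute = [urljoin(base, href) for href in hrefs]

print(len(wrong))
print(absolute)
```

Because the href starts with a slash, urljoin keeps only the scheme and host of the base URL and replaces the path.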

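Your indentation is wrong: you are calling my_spider inside the my_spider function. Remove the indentation from the last line so that the call sits at module level and the function actually runs.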
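
One way to keep the call site unambiguous is to put it under a __main__ guard at the bottom of the file. The sketch below is only an illustration of that layout and makes no network requests; build_page_urls and has_scheme are hypothetical helper names of mine, and the base URL is the one used elsewhere in this thread. It also shows, with the standard library only, that the URL from the question has no scheme.

```python
from urllib.parse import urlsplit

# Store URL as used in this thread, including the http:// scheme
BASE = "http://www.buya.com/Store/SAM-S-LOCKER/400"


def build_page_urls(max_pages):
    """Hypothetical helper: listing URLs for pages 1..max_pages."""
    return ["{}?page={}".format(BASE, page) for page in range(1, max_pages + 1)]


def has_scheme(url):
    """Return True if the URL starts with an http or https scheme."""
    return urlsplit(url).scheme in ("http", "https")


# The call lives at module level, not inside either function body.
if __name__ == "__main__":
    for url in build_page_urls(3):
        print(url, has_scheme(url))
    # The URL from the question has no scheme
    print(has_scheme("www.buya.com/Store/SAM-S-LOCKER/400?page=1"))
```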
