Problems with my first Python web scraper

I'm trying to scrape information from a site called supremecommunity for a personal project. They keep an archive of items from different seasons, and I'm trying to get that information into a CSV. My code includes a loop I took from this GitHub page: https://github.com/CharlieAIO/Supreme-Community-Scraper/blob/master/sup.py

Below is the code I'm using. It runs without errors, but the CSV stays empty apart from the headers I set. Am I doing something wrong here? Is the website rejecting my request? Any help or pointers are appreciated.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']

open("SupremeData.csv","w")
filename = "SupremeData.csv"
headers = "Item,Image,Price,Upvotes,Downvotes"

f = open(filename, "w")
f.write(headers)

for link in urls:
    r = requests.get(link)
    soup = bs(r.text, "html.parser")
    cards = soup.find_all('div', {'class': 'card card-2'})
    for card in cards:
        item = card.find("div", {"class": "card-details"})["data-itemname"]
        img = card.find("img", {"class": "prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        price = card.find("span", {"class": "label-price"}).text
        price = price.replace(" ", "")
        price = price.replace("\n", "")
        upvotes = card.find("p", {"class": "upvotes hidden"}).text
        downvotes = card.find("p", {"class": "downvotes hidden"}).text
        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")

f.close()
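
A quick way to narrow this down is to check whether the request itself succeeds and whether the selector matches anything at all. A minimal diagnostic sketch, reusing the URL and selector above (the User-Agent header is an assumption worth trying; some sites serve different markup to non-browser clients):

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.supremecommunity.com/season/fall-winter2011/overview/'
# Assumed browser-like User-Agent; some sites respond differently without one.
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
print(r.status_code)  # 200 means the request was accepted, so a block is not the issue

soup = bs(r.text, "html.parser")
# If this prints 0, the request worked but the class selector matches nothing.
print(len(soup.find_all('div', {'class': 'card card-2'})))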

This code saves a CSV file containing some of the product details.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']

open("SupremeData.csv","w")
filename = "SupremeData.csv"
headers = "Item,Image,Price,Upvotes,Downvotes"

f = open(filename, "w")
f.write(headers)

for link in urls:
    r = requests.get(link)
    soup = bs(r.content, "html.parser")
    cards = soup.find_all('div', {'class': 'card-2'})
    for card in cards:
        item = card.find("div", {"class": "card__top"})["data-itemname"]
        img = card.find("img", {"class": "prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        try:
            price = card.find("span", {"class": "label-price"}).text
            price = price.replace(" ", "")
            price = price.replace("\n", "")
        except:
            # Some cards have no price element, so fall back to a placeholder.
            price = 'Not Available'
        try:
            upvotes = card.find("p", {"class": "upvotes hidden"}).text
            downvotes = card.find("p", {"class": "downvotes hidden"}).text
        except:
            upvotes = 'Not Found'
            downvotes = 'Not Found'
        print(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")
        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")

f.close()
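
One caveat with building CSV rows by string concatenation, as above: an item name that itself contains a comma will shift the columns. The standard csv module quotes such fields automatically; a minimal sketch of the same row-writing step (the field values here are made-up examples):

import csv

with open("SupremeData.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Item", "Image", "Price", "Upvotes", "Downvotes"])
    # Hypothetical example row: the comma inside the item name is quoted, not split.
    writer.writerow(["Box Logo Tee, Red", "https://supremecommunity.com/img.jpg", "$48", "120", "5"])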

This is one way to get all of the above fields from that page except the price, which isn't available there yet. To check this, you can click on each image and toggle the price tab. I used .select() instead of .find_all() to keep it concise. After running the script, you should get a data-rich CSV file with the required fields.

import csv
import requests
from bs4 import BeautifulSoup

base = 'https://www.supremecommunity.com{}'
link = 'https://www.supremecommunity.com/season/fall-winter2011/overview/'

r = requests.get(link, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(r.text, "lxml")

with open("supremecommunity.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name', 'item_image', 'upvote', 'downvote'])
    for card in soup.select('[class$="d-card"]'):
        item_name = card.select_one('.card__top')['data-itemname']
        item_image = base.format(card.select_one('img.prefill-img').get('data-src'))
        upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
        downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
        writer.writerow([item_name, item_image, upvote, downvote])
        print(item_name, item_image, upvote, downvote)
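
The selector [class$="d-card"] is a CSS attribute selector: it matches any element whose class attribute string ends with "d-card". A tiny self-contained illustration (the HTML here is invented for demonstration):

from bs4 import BeautifulSoup

# Invented markup: only the first div's class attribute ends with "d-card".
html = '<div class="catalog-item-d-card"></div><div class="card-details"></div>'
soup = BeautifulSoup(html, "html.parser")
print(len(soup.select('[class$="d-card"]')))  # 1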
