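I'm trying to scrape data from a website, but I'm stuck on how to handle the "index out of range" error without ending up with two rows per product in the .csv file. The "index out of range" error happens because some records on this site can have empty values, and I don't know how to put the right condition into the loop. I followed some guides, but they didn't help.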
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import ssl
import certifi

my_url = uReq('website', context=ssl.create_default_context(cafile=certifi.where()))
uClient = my_url
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.select('div.header__title, div.info__cta')
container = containers[0]
filename = "products.csv"
f = open(filename,"w")
headers="Product_Name, PriceWithVAT, PriceWithoutVAT, Stock\n"
f.write(headers)
for container in containers:
    productName = container.findAll("span", {"class":"sku"})
    name = productName[0].text if container.findAll("span", {"class":"sku"}) else "lack name"
    priceWithVAT = container.findAll("span", {"class":"price-intax"})
    price = priceWithVAT[0].text if container.findAll("span", {"class":"price-intax"}) else "lack price"
    priceWithoutVAT = container.findAll("span", {"class":"price-extax"})
    priceNot = priceWithoutVAT[0].text if container.findAll("span", {"class":"price-extax"}) else "lack price2"
    stock = container.findAll("p", {"class":"stock in-stock"})
    stock = stock[0].text if container.findAll("p", {"class":"stock in-stock"}) else "lack on stock"
    f.write(name + "," + price + "," + priceNot + "," + stock + "\n" + "\n")
f.close()
In the .csv file I then get results for the whole page, but each product is split across two rows, like this:
CORRECT,lack price,lack price2,lack on stock
lack name,CORRECT,CORRECT,CORRECT
My expected output:
CORRECT, CORRECT, CORRECT, CORRECT
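(CORRECT means the data was scraped correctly from the website.)
When I remove if container.findAll("span", {"class":"sku"}) else "lack name" and the similar conditions, I get the index out of range error, which is expected, because some values are empty.
Can you help me change the code?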
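The logic here needs to change slightly. Instead of treating each container as a product name and then fetching the product details, grab the whole container that holds all of the information for a product. You'll notice that each product sits in an <li> tag under the <ul class="products ..."> tag.
So first get the <ul> tag whose class starts with 'products', then get all of the <li> tags from it. We then iterate over each of them and extract the data we need.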
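To see why the original selector produced two rows per product, here is a minimal sketch on a made-up HTML fragment. The markup is an assumption modeled on the class names in the question; the real page may nest things differently. A selector such as 'div.header__title, div.info__cta' returns each matching div as its own container, so one product yields two loop iterations, while iterating over the <li> elements yields one.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one product whose name and prices live in separate divs.
html = """
<ul class="products columns-4">
  <li>
    <div class="header__title"><span class="sku">ZZ 90*105*4 VAY</span></div>
    <div class="info__cta">
      <span class="price-intax">14.86zł/szt.</span>
      <span class="price-extax">12.08 zł bez VAT</span>
    </div>
  </li>
</ul>
"""
page = BeautifulSoup(html, "html.parser")

# The question's selector: each div becomes a separate container.
containers = page.select("div.header__title, div.info__cta")
# Only the first container holds the name; the second holds the prices.
has_name = [c.find("span", {"class": "sku"}) is not None for c in containers]

# The per-product approach: a single <li> holds everything.
items = page.find("ul", {"class": re.compile("^products")}).find_all("li")

print(len(containers), has_name, len(items))
```

Each of the two containers is missing the fields that live in the other div, which is exactly the "CORRECT, lack price, ..." followed by "lack name, CORRECT, ..." pattern from the question.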
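As you said, some tags are missing for some products, so we wrap each lookup in try/except. It tries to get the data and, if that fails, falls back to the default value set in the except branch.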
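If the repeated try/except blocks feel noisy, the same idea can be folded into one small helper. This is only a sketch and not part of the approach described here: text_or_default is a hypothetical name, and the sample product is invented. It relies on find() returning None when nothing matches.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, name, css_class, default):
    """Return the text of the first matching tag, or `default` if it is absent."""
    tag = parent.find(name, {"class": css_class})
    # get_text(strip=True) also trims surrounding whitespace, unlike plain .text
    return tag.get_text(strip=True) if tag is not None else default


# Hypothetical product that has a name but no price tag.
product = BeautifulSoup(
    '<li><span class="sku">HTF O 45-7 A G5 N C3</span></li>', "html.parser"
).li

name = text_or_default(product, "span", "sku", "lack name")
price = text_or_default(product, "span", "price-intax", "lack price")
print(name, "|", price)
```

Checking for None explicitly also avoids catching unrelated exceptions by accident.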
Also, pandas is a very good and useful library to use and learn, so I went with it here rather than writing the csv file by hand as before.
Code:
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://specjal.com/sklep/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Each product is an <li> inside the <ul> whose class starts with 'products'
products = soup.find('ul', {'class': re.compile('^products')}).find_all('li')

rows = []
for product in products:
    # find() returns None when the tag is missing, so .text raises AttributeError
    try:
        productName = product.find('span', {'class': 'sku'}).text
    except AttributeError:
        productName = 'lack name'
    try:
        priceWithVAT = product.find('span', {'class': 'price-intax'}).text
    except AttributeError:
        priceWithVAT = 'lack price'
    try:
        priceWithoutVAT = product.find('span', {'class': 'price-extax'}).text
    except AttributeError:
        priceWithoutVAT = 'lack price2'
    try:
        # The first word of the stock text is the quantity
        stock = int(product.find('p', {'class': 'stock in-stock'}).text.split()[0])
    except (AttributeError, IndexError, ValueError):
        stock = 'lack on stock'
        # consider changing the above line to stock = 0
    row = {
        'productName': productName,
        'priceWithVAT': priceWithVAT,
        'priceWithoutVAT': priceWithoutVAT,
        'stock': stock}
    rows.append(row)

df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)
Output:
print(df)
productName priceWithVAT priceWithoutVAT stock
0 ZZ 90*105*4 VAY 14.86zł/szt. 12.08 zł bez VAT 10
1 ZZ 85*100*5 VAY 13.76zł/szt. 11.19 zł bez VAT 10
2 ZZ 80*95*4 VAY 12.66zł/szt. 10.29 zł bez VAT 20
3 ZZ 75*90*4 VAY 11.01zł/szt. 8.95 zł bez VAT 20
4 ZZ 70*85*4 VAY 9.91zł/szt. 8.06 zł bez VAT 20
5 ZZ 65*80*5 VAY 9.36zł/szt. 7.61 zł bez VAT 20
6 ZZ 65*80*4 VAY 9.36zł/szt. 7.61 zł bez VAT 20
7 ZZ 60*75*5 VAY 8.25zł/szt. 6.71 zł bez VAT 14
8 ZZ 55*65*4 VAY 7.71zł/szt. 6.27 zł bez VAT 10
9 ZZ 50*60*4 VAY 6.61zł/szt. 5.37 zł bez VAT 20
10 ZZ 45*55*4 VAY 6.05zł/szt. 4.92 zł bez VAT 20
11 ZZ 40*50*4 VAY 5.39zł/szt. 4.38 zł bez VAT 17
12 ZZ 35*45*4 VAY 4.8zł/szt. 3.9 zł bez VAT 30
13 ZZ 30*40*4 VAY 4.26zł/szt. 3.46 zł bez VAT 20
14 XPA 710 CT 39.61zł/szt. 32.2 zł bez VAT lack on stock
15 UCP 202 KBF 19.7zł/szt. 16.02 zł bez VAT lack on stock
16 U298/U291 SET9 188.04zł/szt. 152.88 zł bez VAT lack on stock
17 U 64*80*8 11.8zł/szt. 9.59 zł bez VAT 2
18 U 6*10*3 2.51zł/szt. 2.04 zł bez VAT 4
19 U 45*53*10 RSB 7.55zł/szt. 6.14 zł bez VAT lack on stock
20 U 30*40*7 K21 NBR 8zł/szt. 6.5 zł bez VAT 5
21 U 180*200*14 K50 37.74zł/szt. 30.68 zł bez VAT lack on stock
22 U 16*24*5,5 NI300 8.56zł/szt. 6.96 zł bez VAT 13
23 U 140*160*14 K50 21.92zł/szt. 17.82 zł bez VAT lack on stock
24 U 140*160*14 K23 23.71zł/szt. 19.28 zł bez VAT 3
25 TR16*4*540MM 38.27zł/szt. 31.11 zł bez VAT lack on stock
26 TP 600 8M/20 156.7zł/szt. 127.4 zł bez VAT lack on stock
27 TP 15*1,5 27.56zł/szt. 22.41 zł bez VAT lack on stock
28 ST 3568 LFT 94.34zł/szt. 76.7 zł bez VAT lack on stock
29 SC07A87CS32 47.32zł/szt. 38.47 zł bez VAT lack on stock
30 SC04B19CS31PX2 46.3zł/szt. 37.64 zł bez VAT 3
31 R28-9 96.05zł/szt. 78.09 zł bez VAT 2
32 R 2-6 ZZ SS 13.47zł/szt. 10.95 zł bez VAT lack on stock
33 QJ 213 MPA C3 412.06zł/szt. 335.01 zł bez VAT lack on stock
34 PJ 1219 5.97zł/szt. 4.85 zł bez VAT lack on stock
35 OW1 115*94*8,1 15.72zł/szt. 12.78 zł bez VAT 2
36 OGNIWO 08B-3 CL 7.23zł/szt. 5.88 zł bez VAT 7
37 NU 2311 ETVP2 C3 408.34zł/szt. 331.98 zł bez VAT lack on stock
38 NJ 2210 ET C4 195.19zł/szt. 158.69 zł bez VAT 4
39 NJ 209 ETVP 101.89zł/szt. 82.84 zł bez VAT 2
40 NA 4901 CZH 11.64zł/szt. 9.46 zł bez VAT lack on stock
41 MR 16277 2RS 32zł/szt. 26.02 zł bez VAT 4
42 ŁAŃCUCH 08 B-3 76.38zł/szt. 62.1 zł bez VAT 20
43 KP 16 L100 33.86zł/szt. 27.53 zł bez VAT lack on stock
44 K 81130 SRBF 132.45zł/szt. 107.68 zł bez VAT 2
45 JL 68145/111 NAF 17.59zł/szt. 14.3 zł bez VAT lack on stock
46 HTF O 45-7 A G5 N C3 lack price lack price2 lack on stock
47 HRC 35*45*45 37.08zł/szt. 30.15 zł bez VAT 6
48 HK 3520 B 22.39zł/szt. 18.2 zł bez VAT lack on stock
49 HGY 15*21*1 0.74zł/szt. 0.6 zł bez VAT 8
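One thing the output above makes visible: some product names contain commas (for example U 16*24*5,5 NI300), so joining fields with "," by hand would shift the columns in the file. If you would rather not depend on pandas, the standard library's csv module quotes such fields too. A minimal sketch, with a hard-coded sample row standing in for the scraped data:

```python
import csv
import io

rows = [
    {"productName": "U 16*24*5,5 NI300", "priceWithVAT": "8.56zł/szt.",
     "priceWithoutVAT": "6.96 zł bez VAT", "stock": 13},
]
fieldnames = ["productName", "priceWithVAT", "priceWithoutVAT", "stock"]

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("products.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the name as a single field.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[0]["productName"])
```

The name is written inside double quotes, so a reader sees four columns rather than five.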