How to merge two rows of results in a .csv file in Python (BeautifulSoup)



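I'm trying to scrape data from a website, but I'm stuck: I either get an "index out of range" error, or each record ends up split across two rows in the .csv file. The "index out of range" error happens because some records on this site have empty values, and I don't know how to put the right condition into the loop. I followed a few guides, but they didn't help.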

import ssl
from urllib.request import urlopen as uReq

import certifi
from bs4 import BeautifulSoup as soup

my_url = uReq('website', context=ssl.create_default_context(cafile=certifi.where()))
uClient = my_url
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
# The comma selects div.header__title and div.info__cta as separate elements,
# so each product is visited twice in the loop below, once per div.
containers = page_soup.select('div.header__title, div.info__cta')
container = containers[0]
filename = "products.csv"
f = open(filename,"w")
headers="Product_Name, PriceWithVAT, PriceWithoutVAT, Stock\n"
f.write(headers)
for container in containers:

    productName = container.findAll("span", {"class":"sku"})
    name = productName[0].text if container.findAll("span", {"class":"sku"}) else "lack name"

    priceWithVAT = container.findAll("span", {"class":"price-intax"})
    price = priceWithVAT[0].text if container.findAll("span", {"class":"price-intax"}) else "lack price"

    priceWithoutVAT = container.findAll("span", {"class":"price-extax"})
    priceNot = priceWithoutVAT[0].text if container.findAll("span", {"class":"price-extax"}) else "lack price2"

    stock = container.findAll("p", {"class":"stock in-stock"})
    stock = stock[0].text if container.findAll("p", {"class":"stock in-stock"}) else "lack on stock"

    f.write(name + "," + price + "," + priceNot + "," + stock + "\n" + "\n")

f.close()

The .csv file then contains the results for the whole page, but every product is split across two rows, like this:

CORRECT,lack price,lack price2,lack on stock
lack name,CORRECT,CORRECT,CORRECT

My expected output:

CORRECT, CORRECT, CORRECT, CORRECT

(CORRECT means the correct data scraped from the website.)

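When I remove the guard if container.findAll("span", {"class":"sku"}) else "lack name" (and the similar ones for the other fields), I get the "index out of range" error, which is expected, because some values are empty.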
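Where the error comes from: findAll (the older alias of find_all) returns a list, and an empty list when nothing matches, so indexing it with [0] is what raises the IndexError. A small sketch, assuming a hypothetical product fragment that has a name but no price:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a product block with a name but no price span.
html = '<div><span class="sku">ZZ 90*105*4 VAY</span></div>'
block = BeautifulSoup(html, "html.parser").div

prices = block.find_all("span", {"class": "price-intax"})
print(len(prices))  # 0: no match gives an empty list, not an error

try:
    prices[0].text
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

# The guard from the question: search once, then reuse the result.
price = prices[0].text if prices else "lack price"
print(price)  # lack price
```

Keeping the result of find_all in a variable, as above, also avoids running every search twice, which the original one-liners do.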

Can you help me figure out how to change the code?

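The logic here needs to change a little. Rather than treating each container as the product name and then fetching the product info separately, I'd grab the whole container that holds all of the information. You'll notice that each product sits in an <li> tag under the <ul class="products ..."> tag.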
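To see why the original loop produces two rows per product, here is a small reproduction. The markup is an assumption modelled on the selectors in the question (a div.header__title and a div.info__cta inside each <li>); the real page may differ. The comma in the CSS selector returns both divs as separate elements, so each pass of the loop sees only half of a product:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each product has a title div and a separate info div.
html = """
<ul class="products">
  <li>
    <div class="header__title"><span class="sku">A-1</span></div>
    <div class="info__cta"><span class="price-intax">10zł</span></div>
  </li>
  <li>
    <div class="header__title"><span class="sku">B-2</span></div>
    <div class="info__cta"><span class="price-intax">20zł</span></div>
  </li>
</ul>
"""
page = BeautifulSoup(html, "html.parser")

containers = page.select("div.header__title, div.info__cta")
print(len(containers))  # 4: two separate elements per product

rows = []
for container in containers:
    name = container.find("span", {"class": "sku"})
    price = container.find("span", {"class": "price-intax"})
    rows.append((name.text if name else "lack name",
                 price.text if price else "lack price"))

print(rows)
# [('A-1', 'lack price'), ('lack name', '10zł'),
#  ('B-2', 'lack price'), ('lack name', '20zł')]
```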

So first, get the <ul> tag whose class starts with 'products'. From there, get all the <li> tags. Then iterate over each one and extract the data you need.
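The same steps on hypothetical markup (the div class names and the extra columns-4 class are assumptions, not taken from the real page). Because both spans are searched inside the same <li>, they end up in the same row:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: one <li> per product inside <ul class="products ...">.
html = """
<ul class="products columns-4">
  <li>
    <div class="header__title"><span class="sku">A-1</span></div>
    <div class="info__cta"><span class="price-intax">10zł</span></div>
  </li>
  <li>
    <div class="header__title"><span class="sku">B-2</span></div>
    <div class="info__cta"></div>
  </li>
</ul>
"""
page = BeautifulSoup(html, "html.parser")

# The regex is tried against each class value, so "products" matches here.
product_list = page.find("ul", {"class": re.compile("^products")})

rows = []
for product in product_list.find_all("li"):
    # Both lookups are scoped to the same <li>, so they land in one row.
    name = product.find("span", {"class": "sku"})
    price = product.find("span", {"class": "price-intax"})
    rows.append((name.text if name else "lack name",
                 price.text if price else "lack price"))

print(rows)  # [('A-1', '10zł'), ('B-2', 'lack price')]
```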

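As you said, some tags don't exist, so we'll use try/except: it tries to get the data, and if that fails, it falls back to the default value set in the except block.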
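Unlike find_all, find returns None when nothing matches, and None has no .text, so the failure shows up as an AttributeError. A minimal sketch of that fallback, using a hypothetical helper text_or_default and a fragment modelled on the row in the output that has no price:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a product with a name but no price spans.
html = '<li><span class="sku">HTF O 45-7 A G5 N C3</span></li>'
product = BeautifulSoup(html, "html.parser").li

print(product.find("span", {"class": "price-intax"}))  # None


def text_or_default(tag, name, css_class, default):
    """Hypothetical helper: text of the first matching child, or `default`."""
    try:
        return tag.find(name, {"class": css_class}).text
    except AttributeError:  # find() returned None, and None has no .text
        return default


product_name = text_or_default(product, "span", "sku", "lack name")
price_with_vat = text_or_default(product, "span", "price-intax", "lack price")
print(product_name, "|", price_with_vat)  # HTF O 45-7 A G5 N C3 | lack price
```

Catching AttributeError rather than using a bare except keeps unrelated bugs (a misspelled variable raising NameError, for example) from being silently turned into "lack ..." values. The stock lookup can also fail inside split()[0] or int(), so it needs IndexError and ValueError as well.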

Also, pandas is a very good and useful library to use and learn, so I used it instead of writing the csv file by hand as you did before.
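pandas takes care of CSV quoting for you. If you'd rather stay in the standard library, csv.DictWriter (a swap-in for pandas, not what the answer uses) does the same job with the same list of row dicts. This matters here: several product names in the output below contain a comma (for example "U 16*24*5,5 NI300"), and joining fields with "," by hand, as the question's code does, would shift those rows by a column. A sketch using two rows copied from that output, written to an in-memory buffer instead of a file:

```python
import csv
import io

# Two rows in the shape built by the answer's loop (values from its output).
rows = [
    {"productName": "U 16*24*5,5 NI300", "priceWithVAT": "8.56zł/szt.",
     "priceWithoutVAT": "6.96 zł bez VAT", "stock": 13},
    {"productName": "HTF O 45-7 A G5 N C3", "priceWithVAT": "lack price",
     "priceWithoutVAT": "lack price2", "stock": "lack on stock"},
]

buffer = io.StringIO()  # stands in for the real output file
fieldnames = ["productName", "priceWithVAT", "priceWithoutVAT", "stock"]
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)  # fields containing commas are quoted automatically

print(buffer.getvalue())
```

For a real file, open it with open("products.csv", "w", newline="", encoding="utf-8"): the csv module documentation recommends newline="" so the writer controls line endings, and UTF-8 keeps characters such as ł and Ł intact.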

Code:

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://specjal.com/sklep/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Each product is an <li> inside the <ul> whose class starts with "products"
products = soup.find('ul', {'class':re.compile('^products')}).find_all('li')

rows = []
for product in products:
    # find() returns None for a missing tag, so .text raises AttributeError
    try:
        productName = product.find('span',{'class':'sku'}).text
    except AttributeError:
        productName = 'lack name'

    try:
        priceWithVAT = product.find('span',{'class':'price-intax'}).text
    except AttributeError:
        priceWithVAT = 'lack price'

    try:
        priceWithoutVAT = product.find('span',{'class':'price-extax'}).text
    except AttributeError:
        priceWithoutVAT = 'lack price2'

    try:
        # The first word of the stock text is taken as the quantity
        stock = int(product.find('p',{'class':'stock in-stock'}).text.split()[0])
    except (AttributeError, IndexError, ValueError):
        stock = 'lack on stock'
        # consider changing the above line to stock = 0

    row = {
        'productName':productName,
        'priceWithVAT':priceWithVAT,
        'priceWithoutVAT':priceWithoutVAT,
        'stock':stock}

    rows.append(row)


df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)

Output:

print(df)
productName   priceWithVAT    priceWithoutVAT          stock
0        ZZ 90*105*4 VAY   14.86zł/szt.   12.08 zł bez VAT             10
1        ZZ 85*100*5 VAY   13.76zł/szt.   11.19 zł bez VAT             10
2         ZZ 80*95*4 VAY   12.66zł/szt.   10.29 zł bez VAT             20
3         ZZ 75*90*4 VAY   11.01zł/szt.    8.95 zł bez VAT             20
4         ZZ 70*85*4 VAY    9.91zł/szt.    8.06 zł bez VAT             20
5         ZZ 65*80*5 VAY    9.36zł/szt.    7.61 zł bez VAT             20
6         ZZ 65*80*4 VAY    9.36zł/szt.    7.61 zł bez VAT             20
7         ZZ 60*75*5 VAY    8.25zł/szt.    6.71 zł bez VAT             14
8         ZZ 55*65*4 VAY    7.71zł/szt.    6.27 zł bez VAT             10
9         ZZ 50*60*4 VAY    6.61zł/szt.    5.37 zł bez VAT             20
10        ZZ 45*55*4 VAY    6.05zł/szt.    4.92 zł bez VAT             20
11        ZZ 40*50*4 VAY    5.39zł/szt.    4.38 zł bez VAT             17
12        ZZ 35*45*4 VAY     4.8zł/szt.     3.9 zł bez VAT             30
13        ZZ 30*40*4 VAY    4.26zł/szt.    3.46 zł bez VAT             20
14            XPA 710 CT   39.61zł/szt.    32.2 zł bez VAT  lack on stock
15           UCP 202 KBF    19.7zł/szt.   16.02 zł bez VAT  lack on stock
16        U298/U291 SET9  188.04zł/szt.  152.88 zł bez VAT  lack on stock
17             U 64*80*8    11.8zł/szt.    9.59 zł bez VAT              2
18              U 6*10*3    2.51zł/szt.    2.04 zł bez VAT              4
19        U 45*53*10 RSB    7.55zł/szt.    6.14 zł bez VAT  lack on stock
20     U 30*40*7 K21 NBR       8zł/szt.     6.5 zł bez VAT              5
21      U 180*200*14 K50   37.74zł/szt.   30.68 zł bez VAT  lack on stock
22     U 16*24*5,5 NI300    8.56zł/szt.    6.96 zł bez VAT             13
23      U 140*160*14 K50   21.92zł/szt.   17.82 zł bez VAT  lack on stock
24      U 140*160*14 K23   23.71zł/szt.   19.28 zł bez VAT              3
25          TR16*4*540MM   38.27zł/szt.   31.11 zł bez VAT  lack on stock
26          TP 600 8M/20   156.7zł/szt.   127.4 zł bez VAT  lack on stock
27             TP 15*1,5   27.56zł/szt.   22.41 zł bez VAT  lack on stock
28           ST 3568 LFT   94.34zł/szt.    76.7 zł bez VAT  lack on stock
29           SC07A87CS32   47.32zł/szt.   38.47 zł bez VAT  lack on stock
30        SC04B19CS31PX2    46.3zł/szt.   37.64 zł bez VAT              3
31                 R28-9   96.05zł/szt.   78.09 zł bez VAT              2
32           R 2-6 ZZ SS   13.47zł/szt.   10.95 zł bez VAT  lack on stock
33         QJ 213 MPA C3  412.06zł/szt.  335.01 zł bez VAT  lack on stock
34               PJ 1219    5.97zł/szt.    4.85 zł bez VAT  lack on stock
35       OW1 115*94*8,1    15.72zł/szt.   12.78 zł bez VAT              2
36       OGNIWO 08B-3 CL    7.23zł/szt.    5.88 zł bez VAT              7
37      NU 2311 ETVP2 C3  408.34zł/szt.  331.98 zł bez VAT  lack on stock
38         NJ 2210 ET C4  195.19zł/szt.  158.69 zł bez VAT              4
39           NJ 209 ETVP  101.89zł/szt.   82.84 zł bez VAT              2
40           NA 4901 CZH   11.64zł/szt.    9.46 zł bez VAT  lack on stock
41          MR 16277 2RS      32zł/szt.   26.02 zł bez VAT              4
42        ŁAŃCUCH 08 B-3   76.38zł/szt.    62.1 zł bez VAT             20
43            KP 16 L100   33.86zł/szt.   27.53 zł bez VAT  lack on stock
44          K 81130 SRBF  132.45zł/szt.  107.68 zł bez VAT              2
45      JL 68145/111 NAF   17.59zł/szt.    14.3 zł bez VAT  lack on stock
46  HTF O 45-7 A G5 N C3     lack price        lack price2  lack on stock
47          HRC 35*45*45   37.08zł/szt.   30.15 zł bez VAT              6
48             HK 3520 B   22.39zł/szt.    18.2 zł bez VAT  lack on stock
49           HGY 15*21*1    0.74zł/szt.     0.6 zł bez VAT              8
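The price columns in the output above are still strings such as 14.86zł/szt. and 12.08 zł bez VAT. If you need numbers (to sort or sum them, say), one option is to pull out the leading number with a regular expression. parse_price is a hypothetical helper, and it assumes every price starts with the number and uses no thousands separators, which holds for the rows shown but may not hold for the whole shop:

```python
import re

# Leading number, optionally with a decimal part written with "." or ",".
PRICE_RE = re.compile(r"\s*(\d+(?:[.,]\d+)?)")


def parse_price(text):
    """Hypothetical helper: return the leading number of a price string as a
    float, or None for placeholders such as 'lack price'."""
    match = PRICE_RE.match(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


samples = ["14.86zł/szt.", "12.08 zł bez VAT", "8zł/szt.", "lack price"]
values = [parse_price(s) for s in samples]
print(values)  # [14.86, 12.08, 8.0, None]
```

With the DataFrame from the answer, the helper could be applied per column, for example df["priceWithVAT"].map(parse_price).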
