pandas raises "Index contains duplicate entries, cannot reshape"



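I am trying to append data to a DataFrame, but an error is raised:

Index contains duplicate entries

I want the data in a CSV file. The error is raised on the following line: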

df = df.pivot(index=['v1','v2','v3'], columns='image', values='link').reset_index().fillna('')

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://twillmkt.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://twillmkt.com/collections/denim')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='ProductItem__Wrapper')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True)[:1]:
        comp = baseurl + link['href']
        productlinks.append(comp)
data = []
u = []
k = []
w = []
n = []
for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    up = soup.find('div', class_='Product__SlideshowNavScroller')
    for e, pro in enumerate(up):
        t = pro.find('img').get('src')
        data.append({'image': 'Image '+str(e)+' UI', 'link': t})
        dup = soup.find_all('div', class_='OptionSelector list-options')
        for ro in dup:
            m = [k.text.strip() for k in ro.find_all('button')]
            variant1 = m[0]
            variant2 = m[1]
            variant3 = m[2]
            data.append({'image': 'Image '+str(e)+' UI', 'link': t, 'v1': variant1, 'v2': variant2, 'v3': variant3})
df = pd.DataFrame(data)
df.image = pd.Categorical(df.image, categories=df.image.unique(), ordered=True)
df = df.pivot(index=['v1','v2','v3'], columns='image', values='link').reset_index().fillna('')
df.to_csv('yt.csv')

What happens?

One cause of the duplicates is that you append rows to data that you probably do not need:

data.append({'image':'Image '+str(e)+' UI','link':t})

because you also append:

data.append({'image':'Image '+str(e)+' UI','link':t,'v1':variant1,'v2':variant2,'v3':variant3})

But the main problem is that you build the index with index=['v1','v2','v3'], and that combination is not unique across rows.
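To illustrate the failure in isolation, here is a minimal sketch with made-up rows (the links and variant values are hypothetical, not scraped data). Two rows share the same v1, v2, v3 and image label, so pivot has two candidate values for one cell and raises the error. `DataFrame.duplicated` can be used beforehand to list the colliding rows.

```python
import pandas as pd

# Hypothetical rows: both share the same variants and the same image label,
# so the pair (index, column) is not unique.
rows = [
    {'image': 'Image 0 UI', 'link': 'a.jpg', 'v1': 'S', 'v2': 'M', 'v3': 'L'},
    {'image': 'Image 0 UI', 'link': 'b.jpg', 'v1': 'S', 'v2': 'M', 'v3': 'L'},
]
df = pd.DataFrame(rows)

# List every row that collides on the keys used by pivot
dupes = df[df.duplicated(subset=['v1', 'v2', 'v3', 'image'], keep=False)]
print(dupes)

# pivot cannot decide which link belongs in the cell and raises ValueError
try:
    df.pivot(index=['v1', 'v2', 'v3'], columns='image', values='link')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If `dupes` is empty for your real data, the pivot keys are already unique and the error has a different cause.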

How to fix

  1. Skip this line: data.append({'image':'Image '+str(e)+' UI','link':t})

  2. Also add an id for the product: data.append({'id':t.split('=')[-1],'image':'Image '+str(e)+' UI','link':t,'v1':variant1,'v2':variant2,'v3':variant3})

  3. Use the id to build a unique index: index=['id','v1','v2','v3']
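The steps above can be sketched on a small made-up data set. The image URLs below are hypothetical; the sketch assumes, as `t.split('=')[-1]` implies, that each src ends in a `?v=<number>` query parameter that can serve as an id.

```python
import pandas as pd

# Hypothetical image URLs that end in "?v=<number>"
links = ['//cdn.example.com/a.jpg?v=111', '//cdn.example.com/b.jpg?v=222']

rows = []
for link in links:
    rows.append({
        'id': link.split('=')[-1],   # text after the last "=" becomes the id
        'image': 'Image 0 UI',
        'link': link,
        'v1': 'S', 'v2': 'M', 'v3': 'L',
    })
df = pd.DataFrame(rows)

# With 'id' in the index, the rows no longer collide even though the
# variants and the image label are identical
wide = (
    df.pivot(index=['id', 'v1', 'v2', 'v3'], columns='image', values='link')
      .reset_index()
      .fillna('')
)
print(wide)
```

Without the 'id' key the same two rows would trigger the duplicate-entries error, because only the variants and the image label would distinguish them.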

Example

import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://twillmkt.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://twillmkt.com/collections/denim')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='ProductItem__Wrapper')

# Collect the first link of every product tile
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True)[:1]:
        comp = baseurl + link['href']
        productlinks.append(comp)

data = []
for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    up = soup.find('div', class_='Product__SlideshowNavScroller')
    for e, pro in enumerate(up):
        t = pro.find('img').get('src')
        dup = soup.find_all('div', class_='OptionSelector list-options')
        for ro in dup:
            m = [k.text.strip() for k in ro.find_all('button')]
            variant1 = m[0]
            variant2 = m[1]
            variant3 = m[2]
            # One row per image and variant set; the id comes from the image URL
            data.append({'id': t.split('=')[-1], 'image': 'Image '+str(e)+' UI', 'link': t, 'v1': variant1, 'v2': variant2, 'v3': variant3})

df = pd.DataFrame(data)
df.image = pd.Categorical(df.image, categories=df.image.unique(), ordered=True)
# The id makes the index unique, so pivot can reshape
df = df.pivot(index=['id','v1','v2','v3'], columns='image', values='link').reset_index().fillna('')
df
