我想与 excel 文件进行比较并输出到一个新的 excel 文件以导入我们愚蠢的 ERP,以改善我的工作量,但我有一些问题
我的 excel 文件像这样,大约 17 列NAME DESCRIPTION Our REF TEAM ADDRESS LINE1 ... ........
只有地址行 1 地址行 2 和 tel 需要比较'因为地址和电话将由第三方更改
excel有2个工作表,名为cust代码和供应商代码,如何将两个excel与2个工作表进行比较
#compare
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,index=False,inplace=True)
todaySet = pd.read_excel('today.xlsx', sheet_name =0,index=False, inplace=True)
difference = todaySet[todaySet!=lastdaySet]
print (difference)
#Output excel
....
但有错误 输出误差
> Traceback (most recent call last): File
> "C:/Users/SupportTeamA/PycharmProjects/untitled2/excelhandel.py", line 74,
> in <module>
> difference = todaySet[todaySet != lastdaySet] File "C:UsersSupportTeamAPycharmProjectsuntitled2venvlibsite-packagespandascoreops__init__.py",
> line 838, in f
> raise ValueError( ValueError: Can only compare identically-labeled DataFrame objects
它不会比较两张纸中的行数是否不同并给出相同的错误。 当 todaySet 包含的行数多于上一天集时,下面是解决方法:
import pandas as pd
lastdaySet = pd.read_excel('lastday.xlsx',sheet_name =0,index=False,inplace=True)
todaySet = pd.read_excel('today.xlsx', sheet_name =0,index=False, inplace=True)
lenToday = len (todaySet.index)
lenLastday = len(lastdaySet.index)
if lenToday > lenLastday:
temp = todaySet.iloc[:lenLastday]
difference = temp[temp!=lastdaySet]
remaining = todaySet.iloc[lenLastday:]
difference = pd.concat([difference, remaining]) #.drop_duplicates().reset_index(drop=True)
difference = difference.reset_index(drop=True)
else:
difference = todaySet[todaySet!=lastdaySet]
print ('Final Diff:')
print (difference)
要比较 Sheet2,您需要设置sheet_name = 1
。