我得到的抓取的 CSV 熊猫数据帧:值错误("值长度与索引长度不匹配")



我在代码下运行并获取:

raise ValueError('Length of values does not match length of ' 'index')

valueerror:值的长度与索引的长度

不匹配

for:

print(df3)
zz = df1.WE=np.where(df1.EW.isin(df3.AL),df1.WE,np.nan)
df3['O2'] = zz  #Error Line

我已经在CSV中可以使用该数据,该数据具有相同的列结构,除此之外,该数据被删除了数据,而其他CSV是手动创建的。

df1

                                                 EW      WE  
0                                            can v can   1.900   
1                       Lanus U20 v Argentinos Jrs U20   2.100   
                                                    DA  
0    https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  
1    https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  

df2

                AA                      AB                       AC      AD  
0      Today 19:00           Indonesia U23           Kyrgyzstan U23  650.00   
1      Today 23:00   Brunei Darussalam U23             Mongolia U23    3.30   
2      Today 19:30                     can                      can  110.00     
                                                   AE  
0   https://www.betfair.com.au/exchange/plus/footb...  
1   https://www.betfair.com.au/exchange/plus/footb...  
2   https://www.betfair.com.au/exchange/plus/footb...  

df3

df3 = pd.DataFrame(columns=['DAT', 'G', 'TN', 'O1', 'L1', 'TN2', 'O2', 'L2', 'D', 'AJ', 'AK', 'AL'])

所需的:

df3:

               DAT                      G  
0      Today 19:00                 can v can   

         TN                           O1  
0   can v can                        1.900   
                         L1  
0   https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...   
        TN2                         O2  
0   can v can                       110.00

                                                   L2    D     
0   https://www.betfair.com.au/exchange/plus/footb...  108.1  

代码:

   import pandas as pd
    import time
    import numpy as np
    import pandas as pd
    print('####CURRIES###')
    df1 = pd.read_csv('C:\O\df1.csv',
                       index_col=False,
                       usecols=[0,1,2],
                       names=["EW", "WE", "DA"],
                       header=None)
    print(df1)
    print('####Mangoes###')
    df2 = pd.read_csv('C:\O\df2.csv',
                       index_col=False,
                       usecols=[0,1,2, 3, 4],
                       names=["AA", "AB", "AC", "AD", "AE"],
                       header=None)
    print(df2)
    df3 = pd.DataFrame(columns=['DAT', 'G', 'TN', 'O1', 'L1', 'TN2', 'O2', 'L2', 'D', 'AJ', 'AK', 'AL'])
    df3['DAT'] = df2['AA']




    df3['G'] = df2['AB'] + ' v ' + df2['AC']
    df3['AK'] = df2['AB'] + ' v ' + df2['AC']
    df3['AL'] = df2['AB'] + ' v ' + df2['AC']

    print(df3)

    #TN
    print('Vlookup')
    aa = df3.AK=np.where(df3.AK.isin(df1.EW),df3.AK,np.nan)
    print(aa)
    df3['TN'] = aa
    print('Vlookup output')
    print(df3)
    ## Not working
    #TN
    print('Vlookup')
    aa = df3.AK=np.where(df3.AK.isin(df1.EW),df3.AK,np.nan)
    print(aa)
    df3['TN2'] = aa
    ## Not working
    #L
    df3['L1'] = df2['AE']
    df3['L2'] = df1['DA']
    print('dataframe complete')
    print(df3)
    ##O  <-- Get value beside T


    print('done')
    #O1 Mangoes
    aa = df2.AD=np.where(df3.AL.isin(df1.EW),df2.AD,np.nan)
    df3['O1'] = aa

    #O2 Curries
    #zz = df1.WE=np.where(df3.AL.isin(df1.EW),df1.WE,np.nan)
    #zz = df1.WE=np.where(df1.EW.isin(df3.AL),df1.WE,np.nan)
    ###############ERROR B#  ###############ERROR B#
    print(df3)
    zz = df1.WE=np.where(df1.EW.isin(df3.AL),df1.WE,np.nan)
    df3['O2'] = zz

###############ERROR ABOVE#
    print('output')
    df = df3[pd.notnull(df3['TN'])]
    #df3['D'] = df3['O2'] - df3['O1']

    print('tt')
    #df3.dropna(how='all')
    #df3.columns = df3.columns.droplevel(3)
    print(df3)
    #df3['AA'] = df1[zz]
    print('zza')
    df3.pop('AK')
    df3.pop('AL')
    df3['D'] = df3['O2'].sub(df3['O1'], fill_value=0)
    df3.pop('AJ')
    print(df3.dtypes)
    NaNs = df3 = df3.dropna(subset=['O1','O2'])
    print(NaNs)
    import pandas as pd
    print('test')
    print(df3)
    writer = pd.ExcelWriter('C:\O\dx.xlsx')
    df3.to_excel(writer, startcol=0, index = False) 
    writer.save()
    print(df3)

    #df3.pop('AK')
    #df3['T1'] = df1[zz]
    #Delete Columns
    #put O in a column value
    #

刮擦代码以在此处创建数据框。

i然后写入can v can在DF1列A第1行中。

i然后在DF2列B行3

中写入CAN

i然后在DF2列C行3

中写入CAN

或简单地是我正在使用的确切文件(刮擦)。

df1

df2

上面代码下方的这些文件完美效果。我不明白为什么它不适用于刮擦数据。

工作代码

df2

df1

您需要 merge inner join

print('####CURRIES###')
df1 = pd.read_csv('C:\O\df1.csv',
                   index_col=False,
                   usecols=[0,1,2],
                   names=["EW", "WE", "DA"],
                   header=None)
print(df1.head())
####CURRIES###
                                 EW    WE  
0                         can v can  1.90   
1    Lanus U20 v Argentinos Jrs U20  2.10   
2      Botafogo RJ U20 v Toluca U20  1.83   
3  Atletico Mineiro U20 v Bahia U20  2.10   
4                 FC Porto v Monaco  1.36   
                                                  DA  
0  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  
1  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  
2  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  
3  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  
4  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  
print('####Mangoes###')
df2 = pd.read_csv('C:\O\df2.csv',
                   index_col=False,
                   usecols=[0,1,2, 3, 4],
                   names=["AA", "AB", "AC", "AD", "AE"],
                   header=None)

print(df2.head())
####Mangoes###
            AA                     AB                      AC      AD  
0  Today 19:00          Indonesia U23          Kyrgyzstan U23  650.00   
1  Today 23:00  Brunei Darussalam U23            Mongolia U23    3.30   
2  Today 19:30                    can                     can  110.00   
3  Today 20:00  FC Zbrojovka Brno U19       Sparta Prague U19    1.97   
4  Today 20:30         Rahmatgonj MFS  Sheikh Jamal Dhanmondi    1.51   
                                                  AE  
0  https://www.betfair.com.au/exchange/plus/footb...  
1  https://www.betfair.com.au/exchange/plus/footb...  
2  https://www.betfair.com.au/exchange/plus/footb...  
3  https://www.betfair.com.au/exchange/plus/footb...  
4  https://www.betfair.com.au/exchange/plus/footb...  

#crewte new column called EW in df2
df2['EW'] = df2['AB'] + ' v ' + df2['AC']   
#merge on columns EW  
df3 = pd.merge(df2, df1, on='EW')
#change columns names
df3 = df3.rename(columns={'WE':'O1','AD':'O2'})
#subtract O columns
df3['D'] = df3['O2'].sub(df3['O1'], fill_value=0)
print (df3)
            AA             AB              AC     O2  
0  Today 19:00  Indonesia U23  Kyrgyzstan U23  650.0   
1  Today 19:30            can             can  110.0   
                                                  AE  
0  https://www.betfair.com.au/exchange/plus/footb...   
1  https://www.betfair.com.au/exchange/plus/footb...   
                               EW   O1  
0  Indonesia U23 v Kyrgyzstan U23  2.0   
1                       can v can  1.9   
                                                  DA      D  
0  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  648.0  
1  https://www.bet365.com.au/#/AC/B1/C1/D13/E40/F...  108.1  

正确的语法是:

df3['O2'] = pd.Series(za)

用孤立的python脚本调查并简化了刮擦问题。