Numeric data inserted into a DataFrame becomes NaNs



Simplified case:

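I have a file with a list of countries, which I load into the DataFrame df. I then have data on these countries (and more) in many .xls files. I read each of these files into df_f, subset the data I'm interested in, and look for the countries from the original file. If any of them are present, I copy their data into the DataFrame df.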

The problem is that only some of the values are assigned correctly. Most of them are inserted as NaN (see below).

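import os
import pandas as pd

# df (loaded earlier) holds the country list; quarters is the list of
# quarter column names, e.g. ["Q1 2000", "Q2 2000", ...]
for filename in os.listdir(os.getcwd()):
    # Old pandas keyword names; newer versions use sheet_name, usecols, skipfooter
    df_f = pd.read_excel(filename, sheetname = 'Data', parse_cols = "D,F,H,J:BS", skiprows = 2, skip_footer = 2)
    df_f = df_f.fillna(0)
    df_ss = [SUBSETTING df_f here]
    countries = df_ss['Country']
    for c in countries:
        # Copy the quarterly values only for countries that also appear in df
        if (c in df['Country'].values):
            row_idx = df[df['Country'] == c].index
            df_h = df_ss[quarters][df_ss.Country == c]
            df.loc[row_idx, quarters] = df_h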
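For comparison, here is a loop-free sketch that is not from the original thread: it matches rows on the Country column rather than on row numbers by using pandas' DataFrame.update. The helper name copy_by_country and the toy frames are hypothetical. It assumes each country appears at most once per file, because update has to reindex the source on Country and cannot do that when labels are duplicated.

```python
import pandas as pd


def copy_by_country(df, df_ss, quarters):
    """Return a copy of df whose quarter columns are overwritten with the
    values df_ss holds for the same country; other rows are left unchanged.
    Hypothetical helper; assumes Country is unique in both frames."""
    target = df.set_index("Country")
    # update() aligns on the index (Country) and on column names, not on row
    # numbers; countries that appear only in df_ss are ignored
    target.update(df_ss.set_index("Country")[quarters])
    return target.reset_index()


# Toy data (hypothetical): Austria sits at label 2 in df but at label 10 in df_ss
quarters = ["Q1 2000", "Q2 2000"]
df = pd.DataFrame({"Country": ["Albania", "Algeria", "Austria"],
                   "Q1 2000": [0.0, 0.0, 0.0],
                   "Q2 2000": [0.0, 0.0, 0.0]})
df_ss = pd.DataFrame({"Country": ["Guam", "Austria"],
                      "Q1 2000": [10.0, 20.0],
                      "Q2 2000": [30.0, 40.0]}, index=[5, 10])

df = copy_by_country(df, df_ss, quarters)
print(df)
```

Because update skips NaN values in the source, countries missing from a given file keep whatever df already held, so the function can be called once per file inside the os.listdir loop.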

The result I get is:

Country  Q1 2000  Q2 2000  Q3 2000  Q4 2000  Q1 2001  Q2 2001  Q3 2001  
0     Albania      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1     Algeria      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2   Argentina      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
3     Armenia      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
4   Australia      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
5     Austria  4547431  5155839  5558963  6079089  6326217  6483130  6547780   
6  Azerbaijan      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
etc...

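The loading and subsetting work correctly and the data are not corrupted: I print df_h on every iteration and it shows regular numbers. The problem is that once they are assigned to the df DataFrame, they turn into NaN...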
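The symptom can be reproduced with two toy frames (hypothetical data; copy_rows is a hypothetical helper that mirrors the inner loop above). The sketch suggests the cause: when the right-hand side of df.loc[...] = ... is a DataFrame, pandas aligns it on its row labels. df_h keeps the row label it had in df_ss, so the copy only succeeds when that label happens to equal the target row in df. That would presumably explain why Austria is the only country that came through.

```python
import pandas as pd

quarters = ["Q1 2000", "Q2 2000"]

# Target frame: countries at row labels 0..2, quarter columns initialised to 0.0
# (float, so that NaN can be stored without a dtype change)
df = pd.DataFrame({"Country": ["Albania", "Algeria", "Austria"],
                   "Q1 2000": [0.0, 0.0, 0.0],
                   "Q2 2000": [0.0, 0.0, 0.0]})


def copy_rows(df, df_ss, country):
    """Copy one country's quarter values from df_ss into df, as in the question's loop."""
    row_idx = df[df["Country"] == country].index
    df_h = df_ss.loc[df_ss["Country"] == country, quarters]
    # The right-hand DataFrame is reindexed to row_idx before assignment
    df.loc[row_idx, quarters] = df_h


# Algeria sits at label 1 in df but at label 7 in the source -> cells become NaN
src_a = pd.DataFrame({"Country": ["Algeria"], "Q1 2000": [100.0], "Q2 2000": [200.0]},
                     index=[7])
copy_rows(df, src_a, "Algeria")

# Austria happens to sit at label 2 in both frames -> the values are copied
src_b = pd.DataFrame({"Country": ["Austria"], "Q1 2000": [300.0], "Q2 2000": [400.0]},
                     index=[2])
copy_rows(df, src_b, "Austria")

print(df)
```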

Any ideas?

Edit: sample data

df:
          Country Country group  Population  Development coefficient  Q1 2000  
0     Albania      group II     2981000                       -1        0   
1     Algeria       group I    39106000                       -1        0   
2   Argentina     group III    42669000                       -1        0   
3     Armenia      group II     3013000                       -1        0   
4   Australia      group IV    23520000                       -1        0   
5     Austria      group IV     8531000                       -1        0   
6  Azerbaijan      group II     9538000                       -1        0   
7  Bangladesh       group I   158513000                       -1        0   
8     Belarus     group III     9470000                       -1        0   
9     Belgium     group III    11200000                       -1        0   
 (...)
   Q2 2013  Q3 2013  Q4 2013  Q1 2014  Q2 2014  Q3 2014  Q4 2014  Q1 2015  
0        0        0        0        0        0        0        0        0  
1        0        0        0        0        0        0        0        0  
2        0        0        0        0        0        0        0        0  
3        0        0        0        0        0        0        0        0  
4        0        0        0        0        0        0        0        0  
5        0        0        0        0        0        0        0        0  
6        0        0        0        0        0        0        0        0  
7        0        0        0        0        0        0        0        0  
8        0        0        0        0        0        0        0        0  
9        0        0        0        0        0        0        0        0

and df_ss from one of the files:

    Country  Q1 2000  Q2 2000  Q3 2000  Q4 2000  Q1 2001  
5                       Guam    11257    17155    23063    29150    37098   
10                  Kiribati      323      342      361      380      398   
15          Marshall Islands      425      428      433      440      449   
17                Micronesia        0        0        0        0        0   
19                     Nauru        0        0        0        0        0   
22  Northern Mariana Islands     2560     3386     4499     6000     8037   
27                     Palau     1513     1672     1828     1980     2130   
(...) 
    Q3 2013  Q4 2013  Q1 2014  Q2 2014  Q3 2014  Q4 2014  Q1 2015  
5    150028   151152   152244   153283   154310   155333   156341  
10    19933    20315    20678    21010    21329    21637    21932  
15    17536    19160    20827    22508    24253    26057    27904  
17    18646    17939    17513    17232    17150    17233    17438  
19     7894     8061     8227     8388     8550     8712     8874  
22    27915    28198    28481    28753    29028    29304    29578  
27    17602    17858    18105    18337    18564    18785    19001  

I also tried setting the values like this (see this post):

df.ix[quarters,...] = 10

Comment from @joris:

Could you try df.loc[row_idx, quarters] = df_h.values for the last line (note the extra .values at the end)?

That did it, thanks :-)
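A small check of why the .values fix works, on the same kind of toy frames (hypothetical data): .values (spelled .to_numpy() in newer pandas) returns a plain NumPy array with no row labels, so the assignment is positional and the label mismatch no longer matters. This assumes each country occurs exactly once in both df and df_ss; otherwise the array shape will not match the selected cells and pandas raises an error instead of silently aligning.

```python
import pandas as pd

quarters = ["Q1 2000", "Q2 2000"]

# Algeria is at row label 1 in df but at row label 7 in df_ss
df = pd.DataFrame({"Country": ["Albania", "Algeria"],
                   "Q1 2000": [0.0, 0.0],
                   "Q2 2000": [0.0, 0.0]})
df_ss = pd.DataFrame({"Country": ["Algeria"], "Q1 2000": [100.0], "Q2 2000": [200.0]},
                     index=[7])

row_idx = df[df["Country"] == "Algeria"].index
df_h = df_ss.loc[df_ss["Country"] == "Algeria", quarters]

# A bare array carries no index, so it is written into the selected cells by position
df.loc[row_idx, quarters] = df_h.values
print(df)
```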
