简化情况:
我有一个包含一些国家列表的文件,我将其加载到数据帧df中。然后,我在许多.xls文件中获得了有关这些国家(以及更多)的数据。我尝试将这些文件中的每一个读取到df_f,对我感兴趣的数据进行子集处理,然后从原始文件中找到国家,如果其中任何一个存在,则将数据复制到dataframe df。
问题是只有部分值被正确分配。它们中的大多数以NaN的形式插入。(见下文)
for filename in os.listdir(os.getcwd()):
df_f = pd.read_excel(filename, sheetname = 'Data', parse_cols = "D,F,H,J:BS", skiprows = 2, skip_footer = 2)
df_f = df_f.fillna(0)
df_ss = [SUBSETTING df_f here]
countries = df_ss['Country']
for c in countries:
if (c in df['Country'].values):
row_idx = df[df['Country'] == c].index
df_h = df_ss[quarters][df_ss.Country == c]
df.loc[row_idx, quarters] = df_h
我得到的结果是:
Country Q1 2000 Q2 2000 Q3 2000 Q4 2000 Q1 2001 Q2 2001 Q3 2001
0 Albania NaN NaN NaN NaN NaN NaN NaN
1 Algeria NaN NaN NaN NaN NaN NaN NaN
2 Argentina NaN NaN NaN NaN NaN NaN NaN
3 Armenia NaN NaN NaN NaN NaN NaN NaN
4 Australia NaN NaN NaN NaN NaN NaN NaN
5 Austria 4547431 5155839 5558963 6079089 6326217 6483130 6547780
6 Azerbaijan NaN NaN NaN NaN NaN NaN NaN
etc...
加载和子设置正确,数据没有损坏——我为每次迭代打印df_h,它显示规则数字。关键是,在将它们分配给df数据帧后,它们就变成了NaN。。。
知道吗?
编辑:样本数据
df:
Country Country group Population Development coefficient Q1 2000
0 Albania group II 2981000 -1 0
1 Algeria group I 39106000 -1 0
2 Argentina group III 42669000 -1 0
3 Armenia group II 3013000 -1 0
4 Australia group IV 23520000 -1 0
5 Austria group IV 8531000 -1 0
6 Azerbaijan group II 9538000 -1 0
7 Bangladesh group I 158513000 -1 0
8 Belarus group III 9470000 -1 0
9 Belgium group III 11200000 -1 0
(...)
Q2 2013 Q3 2013 Q4 2013 Q1 2014 Q2 2014 Q3 2014 Q4 2014 Q1 2015
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0
和其中一个文件的df_ss:
Country Q1 2000 Q2 2000 Q3 2000 Q4 2000 Q1 2001
5 Guam 11257 17155 23063 29150 37098
10 Kiribati 323 342 361 380 398
15 Marshall Islands 425 428 433 440 449
17 Micronesia 0 0 0 0 0
19 Nauru 0 0 0 0 0
22 Northern Mariana Islands 2560 3386 4499 6000 8037
27 Palau 1513 1672 1828 1980 2130
(...)
Q3 2013 Q4 2013 Q1 2014 Q2 2014 Q3 2014 Q4 2014 Q1 2015
5 150028 151152 152244 153283 154310 155333 156341
10 19933 20315 20678 21010 21329 21637 21932
15 17536 19160 20827 22508 24253 26057 27904
17 18646 17939 17513 17232 17150 17233 17438
19 7894 8061 8227 8388 8550 8712 8874
22 27915 28198 28481 28753 29028 29304 29578
27 17602 17858 18105 18337 18564 18785 19001
尝试设置如下值(请参阅本文):
df.ix[quaters,...] = 10
作者@joris:
你能试试吗df.loc[row_idx, quarters] = df_h.values
对于最后一行(注意末尾的额外.values
)?
这一次成功了,谢谢:-)