我的第一个数据帧如下:
d = {'Name' : pd.Series(['A', 'A', 'A','B', 'B', 'B', 'C', 'D', 'D']),
'ID' : pd.Series(['10', '11', '12','13',
'10', '12','14','10','12']),
'Jan' : pd.Series([100,200,300,200,400,600,20,50,100]),
'Feb' : pd.Series ([100,150,200,100,200,300,200,50,100]),
'Mar' : pd.Series([100,200,300,200,400,400,20,40,50]),
'Apr' : pd.Series([10,250,200,100,200,400,210,105,100]),
'May' : pd.Series([100,100,100,200,200,300,200,60,50]),
'Jun' : pd.Series([100,200,200,200,100,200,200,410,50]),
'Jul' : pd.Series([50,100,240,200,200,450,200,410,50]),
'Aug' : pd.Series([100,100,200,100,200,100,120,140,150]),
'Sep' : pd.Series([100,100,200,100,200,100,100,140,150]),
'YES' : pd.Series(['Y','N','Y','N','Y','Y','N','N','N'])}
newdf = pd.DataFrame(d)
我的第二个数据帧如下:
d = {'ID' : pd.Series(['10', '11', '12','13', '14','17','20','21','50']),
'Jan' : pd.Series([10,20,30,20,40,60,2,5,10]),
'Feb' : pd.Series([10,15,20,10,20,30,20,5,10]),
'Mar' : pd.Series([10,20,30,20,40,40,2,4,5]),
'Apr' : pd.Series([15,10,20,10,10,10,12,14,15]),
'May' : pd.Series([10,25,20,10,20,40,21,15,10]),
'Jun' : pd.Series([15,10,20,10,10,10,12,14,15]),
'Jul' : pd.Series([2,2,2,0,2,2,3,2,2]),
'Aug' : pd.Series( [2,2,2,20,2,1,2,2,2]),
'Sep' : pd.Series( [3,0,2,2,2,1,2,5,2]),
'YES' : pd.Series(['Y','N','Y','N','Y','','','NS'])}
newdf1 = pd.DataFrame(d)
第二个数据帧有额外的id我需要根据匹配的id值
将第一个数据帧从1月到9月使用第二个df从1月到9月示例预期结果:(第一个数据帧的第5个索引如下)
Name ID Jan Feb Mar Apr May Jun Jul Aug Sep YES RJan RFeb RMar RApr RMay RJun RJul RAug RSep
B 12 600 300 400 400 300 200 450 100 100 Y 20. 15. 13.33 20. 15. 10. 225. 50. 50.
也是否有任何方法可以捕获输出列,如名称,ID, Jan…9月,是的,DJan . .Dsep(从newdf1划分值),RJan…Rsep?
D表示除R表示结果
您可以使用merge
来对齐数据框:
df2 = newdf[['ID']].merge(newdf1, on='ID').drop(columns=['ID', 'YES'])
out = newdf.join(newdf[df2.drop(columns='sep').columns].div(df2.drop(columns='sep')).add_prefix('R')).join(df2['sep'])
输出;
Name ID Jan Feb Mar Apr May Jun Jul Aug ... YES RJan
0 A 10 100 100 100 10 100 100 50 100 ... Y 10.000000
1 A 11 200 150 200 250 100 200 100 100 ... N 20.000000
2 A 12 300 200 300 200 100 200 240 200 ... Y 30.000000
3 B 13 200 100 200 100 200 200 200 100 ... N 10.000000
4 B 10 400 200 400 200 200 100 200 200 ... Y 13.333333
5 B 12 600 300 400 400 300 200 450 100 ... Y 20.000000
6 C 14 20 200 20 210 200 200 200 120 ... N 0.666667
7 D 10 50 50 40 105 60 410 410 140 ... N 2.500000
8 D 12 100 100 50 100 50 50 50 150 ... N 2.500000
RFeb RMar RApr RMay RJun RJul RAug sep
0 10.000000 10.000000 0.666667 10.0 6.666667 25.0 50.0 3
1 15.000000 20.000000 16.666667 10.0 13.333333 50.0 50.0 3
2 20.000000 30.000000 13.333333 10.0 13.333333 120.0 100.0 3
3 6.666667 10.000000 10.000000 8.0 20.000000 100.0 50.0 0
4 10.000000 13.333333 10.000000 10.0 5.000000 100.0 100.0 2
5 15.000000 13.333333 20.000000 15.0 10.000000 225.0 50.0 2
6 10.000000 0.666667 10.500000 10.0 10.000000 100.0 60.0 2
7 5.000000 2.000000 10.500000 6.0 41.000000 inf 7.0 2
8 5.000000 1.250000 10.000000 2.5 5.000000 25.0 75.0 2
[9 rows x 21 columns]