如何在python中划分多行值与匹配ID行值?



我的第一个数据帧如下:

d = {'Name' : pd.Series(['A', 'A', 'A','B', 'B', 'B', 'C', 'D', 'D']), 
'ID' : pd.Series(['10', '11', '12','13', 
'10', '12','14','10','12']),
'Jan' : pd.Series([100,200,300,200,400,600,20,50,100]),
'Feb' : pd.Series ([100,150,200,100,200,300,200,50,100]),
'Mar' : pd.Series([100,200,300,200,400,400,20,40,50]),
'Apr' : pd.Series([10,250,200,100,200,400,210,105,100]),
'May' : pd.Series([100,100,100,200,200,300,200,60,50]),
'Jun' : pd.Series([100,200,200,200,100,200,200,410,50]),
'Jul' : pd.Series([50,100,240,200,200,450,200,410,50]),
'Aug' : pd.Series([100,100,200,100,200,100,120,140,150]),
'Sep' :  pd.Series([100,100,200,100,200,100,100,140,150]),
'YES' : pd.Series(['Y','N','Y','N','Y','Y','N','N','N'])}
newdf = pd.DataFrame(d) 

我的第二个数据帧如下:

d = {'ID' : pd.Series(['10', '11', '12','13', '14','17','20','21','50']), 
'Jan' : pd.Series([10,20,30,20,40,60,2,5,10]),
'Feb' : pd.Series([10,15,20,10,20,30,20,5,10]),
'Mar' : pd.Series([10,20,30,20,40,40,2,4,5]),
'Apr' : pd.Series([15,10,20,10,10,10,12,14,15]),   
'May' : pd.Series([10,25,20,10,20,40,21,15,10]),
'Jun' : pd.Series([15,10,20,10,10,10,12,14,15]),
'Jul' : pd.Series([2,2,2,0,2,2,3,2,2]),
'Aug' : pd.Series( [2,2,2,20,2,1,2,2,2]),
'Sep' : pd.Series( [3,0,2,2,2,1,2,5,2]),   
'YES' : pd.Series(['Y','N','Y','N','Y','','','NS'])}
newdf1 = pd.DataFrame(d) 

第二个数据帧有额外的id我需要根据匹配的id值

将第一个数据帧从1月到9月使用第二个df从1月到9月示例预期结果:(第一个数据帧的第5个索引如下)

Name    ID  Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     YES      RJan   RFeb    RMar    RApr    RMay    RJun    RJul    RAug    RSep
B      12   600     300     400     400     300     200     450     100     100     Y       20.       15.    13.33  20.    15.      10.     225.      50.      50.

也是否有任何方法可以捕获输出列,如名称,ID, Jan…9月,是的,DJan . .Dsep(从newdf1划分值),RJan…Rsep?

D表示除R表示结果

您可以使用merge来对齐数据框:

df2 = newdf[['ID']].merge(newdf1, on='ID').drop(columns=['ID', 'YES'])
out = newdf.join(newdf[df2.drop(columns='sep').columns].div(df2.drop(columns='sep')).add_prefix('R')).join(df2['sep'])

输出;

Name  ID  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  ...  YES       RJan  
0    A  10  100  100  100   10  100  100   50  100  ...    Y  10.000000   
1    A  11  200  150  200  250  100  200  100  100  ...    N  20.000000   
2    A  12  300  200  300  200  100  200  240  200  ...    Y  30.000000   
3    B  13  200  100  200  100  200  200  200  100  ...    N  10.000000   
4    B  10  400  200  400  200  200  100  200  200  ...    Y  13.333333   
5    B  12  600  300  400  400  300  200  450  100  ...    Y  20.000000   
6    C  14   20  200   20  210  200  200  200  120  ...    N   0.666667   
7    D  10   50   50   40  105   60  410  410  140  ...    N   2.500000   
8    D  12  100  100   50  100   50   50   50  150  ...    N   2.500000   
RFeb       RMar       RApr  RMay       RJun   RJul   RAug  sep  
0  10.000000  10.000000   0.666667  10.0   6.666667   25.0   50.0    3  
1  15.000000  20.000000  16.666667  10.0  13.333333   50.0   50.0    3  
2  20.000000  30.000000  13.333333  10.0  13.333333  120.0  100.0    3  
3   6.666667  10.000000  10.000000   8.0  20.000000  100.0   50.0    0  
4  10.000000  13.333333  10.000000  10.0   5.000000  100.0  100.0    2  
5  15.000000  13.333333  20.000000  15.0  10.000000  225.0   50.0    2  
6  10.000000   0.666667  10.500000  10.0  10.000000  100.0   60.0    2  
7   5.000000   2.000000  10.500000   6.0  41.000000    inf    7.0    2  
8   5.000000   1.250000  10.000000   2.5   5.000000   25.0   75.0    2  
[9 rows x 21 columns]

最新更新