Divide values that share a date in a DataFrame by the count of values for that date


I have two DataFrames. One holds values for different dates; the values for the first date can be reproduced as follows:

import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'], 
'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})

The other holds the count of these values for each date; it can be reproduced with:
count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'], 
'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})

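The first date, 2021-08-16, has 12 values, so I need to divide each value for 2021-08-16 by 12, and do the same for every other date using that date's count. I tried adding a new column to values_df that holds the count, so that I could compute values_df['val'] / values_df['count'], but the count is not repeated for every row of a date. Can someone guide me on how to do this? Thank you.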

If you want to reuse the count DataFrame, use Series.map:

values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(count.set_index('date')['count'].astype(int))
print (values_df)
         date          val       new
0  2021-08-16      0.38956  0.032463
1  2021-08-16     0.733198  0.061100
2  2021-08-16    0.1856580  0.015471
3  2021-08-16    -0.759235 -0.063270
4  2021-08-16     0.112378  0.009365
5  2021-08-16    0.7436924  0.061974
6  2021-08-16   0.32941900  0.027452
7  2021-08-16    0.7423981  0.061867
8  2021-08-16   0.71269175  0.059391
9  2021-08-16  0.127698146  0.010642
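
To make the lookup step easier to follow, here is a minimal sketch of the same Series.map approach split into separate steps. The small frames and the names lookup and per_row_count are made up for illustration and are not part of the question's data. It also shows that a date missing from count ends up as NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data, shaped like the frames in the question.
values_df = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-16', '2021-08-17', '2021-08-30'],
    'val': ['0.6', '1.2', '4.7', '1.0'],
})
count = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-17'],
    'count': ['12', '47'],
})

# Lookup Series: index is the date, values are the counts as integers.
lookup = count.set_index('date')['count'].astype(int)

# map() replaces every date with its count, repeating it for each row.
# '2021-08-30' is not in the lookup, so that row becomes NaN.
per_row_count = values_df['date'].map(lookup)

values_df['new'] = values_df['val'].astype(float) / per_row_count
print(values_df)
```

If rows with an unknown date should not silently become NaN, checking per_row_count.isna().any() before dividing is one simple guard.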

Or, if you want to count the values in values_df itself, use Series.map together with Series.value_counts:

values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(values_df['date'].value_counts())
print (values_df)
         date          val       new
0  2021-08-16      0.38956  0.038956
1  2021-08-16     0.733198  0.073320
2  2021-08-16    0.1856580  0.018566
3  2021-08-16    -0.759235 -0.075924
4  2021-08-16     0.112378  0.011238
5  2021-08-16    0.7436924  0.074369
6  2021-08-16   0.32941900  0.032942
7  2021-08-16    0.7423981  0.074240
8  2021-08-16   0.71269175  0.071269
9  2021-08-16  0.127698146  0.012770
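
As a possible alternative to map plus value_counts, the per-date row count can also be obtained with groupby and transform('size'), which returns the group size aligned to the original rows. This is only a sketch on made-up data; the name group_size is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data: two rows for one date, one row for another.
values_df = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-16', '2021-08-17'],
    'val': ['0.5', '1.5', '3.0'],
})

# transform('size') gives, for each row, the number of rows sharing its date.
group_size = values_df.groupby('date')['val'].transform('size')

values_df['new'] = values_df['val'].astype(float) / group_size
print(values_df)
```

Both variants count the rows actually present in values_df, so they ignore the count DataFrame entirely.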

You can use the join function on the DataFrame like this:

import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'],
'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})
count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'],
'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})

values_df = values_df.set_index('date').join(count.set_index('date'))
# Now values_df has a val and a count value for each date.
# Create the new column with a vectorized division
# (make sure every val has a defined, non-zero count).
values_df['name of your new column (var/count)'] = values_df['val'].astype(float) / values_df['count'].astype(float)
print(values_df)

This prints:

                    val count  name of your new column (var/count)
date                                                              
2021-08-16      0.38956    12                             0.032463
2021-08-16     0.733198    12                             0.061100
2021-08-16    0.1856580    12                             0.015471
2021-08-16    -0.759235    12                            -0.063270
2021-08-16     0.112378    12                             0.009365
2021-08-16    0.7436924    12                             0.061974
2021-08-16   0.32941900    12                             0.027452
2021-08-16    0.7423981    12                             0.061867
2021-08-16   0.71269175    12                             0.059391
2021-08-16  0.127698146    12                             0.010642
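
If you would rather keep date as an ordinary column instead of moving it into the index, a left merge is one way to get the same result. The following is a rough sketch on made-up data; the column name val_per_count is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, shaped like the frames in the question.
values_df = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-16', '2021-08-17'],
    'val': ['0.6', '1.2', '4.7'],
})
count = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-17'],
    'count': ['12', '47'],
})

# Left merge keeps every row of values_df and attaches the count for its date.
# validate='m:1' checks that each date appears at most once in count.
merged = values_df.merge(count, on='date', how='left', validate='m:1')

# Both columns are strings in the sample data, so convert before dividing.
merged['val_per_count'] = pd.to_numeric(merged['val']) / pd.to_numeric(merged['count'])
print(merged)
```

With validate='m:1', pandas raises a MergeError when count contains duplicate dates, which helps catch accidental row duplication.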
