I have two DataFrames. One holds values for different dates; the values for the first date can be reproduced as follows:
import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'],
'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})
The other holds the count of those values for each date. It can be reproduced with:
count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'],
'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})
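The first date, 2021-08-16, has a count of 12, so I need to divide each value for 2021-08-16 by 12, and do the same for all other values according to their date and its count. I tried to create a new column in values_df that holds the count so that I could compute values_df['val'] / values_df['count'], but the count is not repeated for every row of a date. Can someone guide me on how to achieve this? Thank you.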
If you want to reuse the count DataFrame, use Series.map:
values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(count.set_index('date')['count'].astype(int))
print (values_df)
         date          val        new
0  2021-08-16      0.38956   0.032463
1  2021-08-16     0.733198   0.061100
2  2021-08-16    0.1856580   0.015471
3  2021-08-16    -0.759235  -0.063270
4  2021-08-16     0.112378   0.009365
5  2021-08-16    0.7436924   0.061974
6  2021-08-16   0.32941900   0.027452
7  2021-08-16    0.7423981   0.061867
8  2021-08-16   0.71269175   0.059391
9  2021-08-16  0.127698146   0.010642
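One thing worth checking with the Series.map approach is what happens when a date in values_df has no entry in count. The sketch below uses small hypothetical data (the 2021-08-26 row and the names lookup, per_row_count and missing are assumptions for illustration, not part of the question). It relies on the dates in count being unique, since they become the index of the lookup Series.

```python
import pandas as pd

# Hypothetical sample data: '2021-08-26' deliberately has no entry in count.
values_df = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-16', '2021-08-17', '2021-08-26'],
    'val': ['0.38956', '0.733198', '0.47', '1.5'],
})
count = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-17'],
    'count': ['12', '47'],
})

# Build a date -> count lookup Series (dates in count must be unique),
# then look up the count for each row's date.
lookup = count.set_index('date')['count'].astype(int)
per_row_count = values_df['date'].map(lookup)

# Dates that are missing from count map to NaN, so the division yields NaN too.
values_df['new'] = values_df['val'].astype(float) / per_row_count

# Collect the dates that had no matching count.
missing = values_df.loc[per_row_count.isna(), 'date'].unique().tolist()
print(values_df)
print('dates without a count:', missing)
```

Rows with an unmatched date are kept with NaN in the new column rather than dropped, so it is probably worth inspecting missing before using the result.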
Or, if you want to count the values in values_df itself, use only Series.map and Series.value_counts:
values_df['new'] = values_df['val'].astype(float) / values_df['date'].map(values_df['date'].value_counts())
print (values_df)
         date          val        new
0  2021-08-16      0.38956   0.038956
1  2021-08-16     0.733198   0.073320
2  2021-08-16    0.1856580   0.018566
3  2021-08-16    -0.759235  -0.075924
4  2021-08-16     0.112378   0.011238
5  2021-08-16    0.7436924   0.074369
6  2021-08-16   0.32941900   0.032942
7  2021-08-16    0.7423981   0.074240
8  2021-08-16   0.71269175   0.071269
9  2021-08-16  0.127698146   0.012770
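Counting the rows per date can also be written with groupby and transform('size'). This is a swapped-in alternative to value_counts, not the method used in the answer above; the sample data and the name rows_per_date are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: two rows for one date, one row for another.
values_df = pd.DataFrame({
    'date': ['2021-08-16', '2021-08-16', '2021-08-17'],
    'val': ['0.5', '1.5', '3.0'],
})

# transform('size') returns, for every row, the number of rows in its date
# group, aligned to the original index of values_df.
rows_per_date = values_df.groupby('date')['val'].transform('size')

# Divide each value by the number of rows that share its date.
values_df['new'] = values_df['val'].astype(float) / rows_per_date
print(values_df)
```

Because the result is already aligned row by row, no separate map step is needed in this variant.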
You can use the join function on the DataFrame like this:
import pandas as pd
values_df = pd.DataFrame({'date': ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16','2021-08-16', '2021-08-16'],
'val': ['0.38956', '0.733198', '0.1856580', '-0.759235', '0.112378', '0.7436924' , '0.32941900', '0.7423981', '0.71269175', '0.127698146']})
count = pd.DataFrame({'date': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23','2021-08-24', '2021-08-25'],
'count': ['12', '47', '85', '65', '36', '126' , '75', '85', '175', '63']})
values_df = values_df.set_index('date').join(count.set_index('date'))
# Now values_df has both a val and a count value for each date.
# Create the new column with a vectorized division; make sure every val has a defined count != 0.
# (Integer lookups such as values_df['val'][i] are unreliable here because the index is now the date.)
new_column = values_df['val'].astype(float) / values_df['count'].astype(float)
values_df['name of your new column (val/count)'] = new_column
print(values_df)
This displays:
                    val  count  name of your new column (val/count)
date
2021-08-16      0.38956     12                             0.032463
2021-08-16     0.733198     12                             0.061100
2021-08-16    0.1856580     12                             0.015471
2021-08-16    -0.759235     12                            -0.063270
2021-08-16     0.112378     12                             0.009365
2021-08-16    0.7436924     12                             0.061974
2021-08-16   0.32941900     12                             0.027452
2021-08-16    0.7423981     12                             0.061867
2021-08-16   0.71269175     12                             0.059391
2021-08-16  0.127698146     12                             0.010642