对 2 个条件之间的连续行值求和和连接,并使用熊猫进行过滤



我想对两个条件之间的连续行值求和。这是我的数据帧:

df = pd.DataFrame({'A': ["yes","no","no","no","no","yes","yes","no","no","no","yes","yes","no","no","no"],'B':["no","no","no","no","yes","yes","no","no","no","yes","yes","no","no","no","yes"],'C': ["da","da","da","da","da","da","da","da","da","to","to","to","to","to","to"],'D': ['play','play','play','run','run','run','play','play','play','run','run','play','run','run','play'],'E':[2,5,1,4,6,13,7,8,3,9,1,4,3,5,7]},index=[0, 1, 2, 3,4,5,6,7,8,9,10,11,12,13,14])

事实上,当A="A"时;是";并且B=";否";,我想开始为每个D的值列添加行值,直到A="0";否";并且B=";是的";。这应该由列C过滤。此外,我想连接列A。我想得到以下结果:

df1 = pd.DataFrame({'A': ["yes","yes","yes"],'B':["no","no","no"],'C':["da","da","to"],'play':[8,18,4],'run':[4,0,8],'concatA':["yes/no/no/no","yes/no/no","yes/no/no"]},index=[0, 6,11])

如果你想用pandas的方式(没有循环(,它可能不是世界上看到的最干净的代码。

为了清晰起见,我将其分解为较小的操作(可以用较小数量的操作来完成(:

df['state_change'] = (df['A'] == 'yes') & (df['B'] == 'no')
df['state_change_end'] = (df['A'] == 'no') & (df['B'] == 'yes')
df['group'] = df['state_change'].cumsum()
df['group_end'] = df['state_change_end'].cumsum()
#Filtering for summing
df['play'] = df['E'] * (df['D'] == 'play')
df['run'] = df['E'] * (df['D'] == 'run')
#Filtering out the rows, that shouldn't be summed
df_filtered = df[df['group']> df['group_end']].copy()
#Reseting index, to use it later
df_filtered['index'] = df_filtered.index
df_gb = df_filtered.groupby(df_filtered['group'])
df1 = df_gb.first()
df1['play'] = df_gb['play'].sum()
df1['run'] = df_gb['run'].sum()
df1['concatA'] = df_gb['A'].agg(lambda x : '/'.join(x))
df1.set_index('index').rename_axis(None)
df1[['A', 'B', 'C', 'play', 'run', 'concatA']]

产生所需结果:

A   B   C   play    run concatA
0   yes no  da  8       4   yes/no/no/no
6   yes no  da  18      0   yes/no/no
11  yes no  to  4       8   yes/no/no

最新更新