我在解决以下问题时遇到了麻烦:
我有一个熊猫DataFrame类似于:
col1 col2 col3
A P S
A C W
A C W
A C M
B P S
B C M
B C M
D P W
D C M
D C S
我需要回答以下问题:
在同一组(来自col1
(中,有多少个C
(在col2
中(是W
(来自col3
(,并且有多少P
(在col2
中(是S
(来自col3
(?然后,我需要把每组的所有计数加起来。
更简单地说,有多少对C-W
在同一个col1
组中,其中有一对P-S
?然后,将所有计数相加。
到目前为止,我试图解决这个问题,如下所示:
pd_h = pd_h
.groupby(["col1", "col2", "col3"])
.filter(
lambda x: ((x["col2"].iloc[0] == 'C') & (x["col3"].iloc[0] == "W")
if ((x["col2"].iloc[0] == 'P') & (x["col3"].iloc[0] == "S")) else False)
).describe(["count"]) # here describe, because I may decide to put also "mean", "std", etc.
但是剩余的对CCD_ 13可以属于其中不存在对P-S
的组。
我该如何解决这个问题?
提前感谢!
使用:
print (df)
col1 col2 col3
0 A P S
1 A C W
2 A C W
3 A C M
4 B P S
5 B C M
6 B C M
7 D P W
8 D C W <- changed to W for see not match D because not P, S
9 D C S
#first condition, second condition
m1 = (df["col2"] == 'C') & (df["col3"] == "W")
m2 = (df["col2"] == 'P') & (df["col3"] == "S")
#check if at least one True in m2 per groups
m3 = df['col1'].isin(df.loc[m2, 'col1'])
#details
print (df.assign(m1 = m1, m2=m2, m3=m3,
m1_m3 = (m1 & m3), num = (m1 & m3).view('i1')))
col1 col2 col3 m1 m2 m3 m1_m3 num
0 A P S False True True False 0
1 A C W True False True True 1
2 A C W True False True True 1
3 A C M False False True False 0
4 B P S False True True False 0
5 B C M False False True False 0
6 B C M False False True False 0
7 D P W False False False False 0
8 D C W True False False False 0
9 D C S False False False False 0
#filter only groups if m1 and m3 with aggregate sum for count
df = (m1 & m3).view('i1').groupby(df['col1']).sum().reset_index(name='count')
print (df)
col1 count
0 A 2
1 B 0
2 D 0
df['is_CW']=[(x=='C' and y=='W') for x, y in zip(df.col2, df.col3)]
df['is_PS']=[(x=='P' and y=='S') for x, y in zip(df.col2, df.col3)]
pv=df.pivot_table(values=['is_CW','is_PS'], index = 'col1', aggfunc=['sum'])
pv.columns=['is_CW','is_PS']
pv['answer'] = pv['is_CW'] * np.sign(pv['is_PS'])
print (pv.index, pv.answer)
A 2.0
B 0.0
D 0.0