My problem is that I have a DataFrame like this:
A  B
a  1
b  1
c  3
d  1
a  2
b  2
d  2
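You can build a mask whose first part keeps only the rows where B is 1 or 2, and whose second part keeps only the groups that contain at least one row with 1 and at least one row with 2. If you prefer, you can combine the conditions with np.logical_and.reduce, which saves some typing.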
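import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})

# Keep rows with B in {1, 2} whose group in A contains both a 1 and a 2
mask = (df['B'].isin([1, 2])
        & df['B'].eq(1).groupby(df['A']).transform('any')
        & df['B'].eq(2).groupby(df['A']).transform('any'))
df[mask]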
   A  B
0  a  1
1  b  1
3  d  1
4  a  2
5  b  2
6  d  2
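To see why the mask works, it can help to look at its three pieces separately. The following is only a small sketch on the sample data from this thread; the names row_ok, has_1, has_2 and parts are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data from this thread
df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})

# Row-level condition: is this row's B one of the wanted values?
row_ok = df["B"].isin([1, 2])

# Group-level conditions: transform("any") broadcasts the per-group
# result back onto every row of that group
has_1 = df["B"].eq(1).groupby(df["A"]).transform("any")
has_2 = df["B"].eq(2).groupby(df["A"]).transform("any")

# Illustrative table placing the three conditions next to the data
parts = pd.DataFrame({"A": df["A"], "B": df["B"],
                      "row_ok": row_ok, "has_1": has_1, "has_2": has_2})
print(parts)
```

Only group c fails the group-level checks, so its single row is the only one dropped once the three conditions are combined with &.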
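For a more scalable version of the mask, build it from a list of values, so that supporting another value only means adding it to the list: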
import numpy as np

vals = [1, 2]
mask = (df['B'].isin(vals)
        & np.logical_and.reduce([df['B'].eq(val).groupby(df['A']).transform('any')
                                 for val in vals]))
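If this filter is needed in more than one place, the list-based mask could be wrapped in a small helper. This is a sketch only: the function name keep_groups_with_all and its parameters are hypothetical, not something from pandas or from the answer above.

```python
import numpy as np
import pandas as pd


def keep_groups_with_all(df, key, col, vals):
    """Keep rows whose `col` is in `vals`, restricted to groups of `key`
    that contain every value in `vals` at least once.
    (Hypothetical helper, for illustration only.)"""
    # One boolean Series per wanted value: does the row's group contain it?
    per_value = [df[col].eq(v).groupby(df[key]).transform("any") for v in vals]
    mask = df[col].isin(vals) & np.logical_and.reduce(per_value)
    return df[mask]


# Sample data from this thread
df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})

result = keep_groups_with_all(df, "A", "B", [1, 2])
print(result)
```

On the sample data this keeps the rows of groups a, b and d. Asking for [1, 2, 3] instead returns an empty frame, since no single group contains all three values.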
There are already plenty of great answers; here is how I approached the problem:
import pandas as pd

# Setup
A = ["a", "b", "c", "d", "a", "b", "d"]
B = [1, 1, 3, 1, 2, 2, 2]
df = pd.DataFrame({"A": A, "B": B})

# Keep only values less than or equal to 2, representing months 1 and 2
filtered = df[df["B"] <= 2]

# Sort the filtered rows by columns A and B
sorted_df = filtered.sort_values(by=["A", "B"])

# Group by A and count the number of appearances
groupby_count = sorted_df.groupby(["A"], as_index=False).agg(count=("A", "count"))

# Keep only counts equal to 2, which implies that both month 1 and month 2 appear
# (this assumes each (A, B) pair occurs at most once)
print(groupby_count[groupby_count["count"] == 2])

Output:

   A  count
0  a      2
1  b      2
2  d      2
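A count of 2 only implies "both months present" when each (A, B) pair appears once. If duplicates are possible, counting distinct values is one way around that. The sketch below uses hypothetical data with a duplicated (a, 1) row; the names wanted, distinct and complete are illustrative.

```python
import pandas as pd

# Hypothetical data: group "a" has month 1 twice and never month 2
df = pd.DataFrame({"A": ["a", "a", "b", "b", "c"],
                   "B": [1, 1, 1, 2, 3]})

wanted = [1, 2]
filtered = df[df["B"].isin(wanted)]

# Number of distinct wanted values seen in each group
distinct = filtered.groupby("A")["B"].nunique()

# Groups in which every wanted value appears at least once
complete = distinct[distinct == len(wanted)].index.tolist()
print(complete)  # ['b']
```

A plain row count would report 2 for both a and b here, so group a would slip through even though it never contains month 2.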
I believe you can use groupby.transform together with loc, if I have understood your request in its simplest form:
import pandas as pd

# Keep rows whose group in A has at least two rows, then sort by A
res = df.loc[df.groupby('A')['B'].transform('size') >= 2].sort_values(by='A')
print(res)
   A  B
0  a  1
4  a  2
1  b  1
5  b  2
3  d  1
6  d  2
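Note that the size-based filter looks only at how many rows a group has, not at which values of B they carry. On the sample data the two readings agree, but they can differ. A small sketch on hypothetical data, with by_size, has_both and by_value as illustrative names:

```python
import pandas as pd

# Hypothetical data: group "c" has two rows, but neither is 1 or 2
df = pd.DataFrame({"A": ["a", "a", "c", "c"],
                   "B": [1, 2, 3, 3]})

# Size-based filter: groups with at least two rows
by_size = df.loc[df.groupby("A")["B"].transform("size") >= 2]

# Value-based filter: groups containing both a 1 and a 2
has_both = (df["B"].eq(1).groupby(df["A"]).transform("any")
            & df["B"].eq(2).groupby(df["A"]).transform("any"))
by_value = df.loc[df["B"].isin([1, 2]) & has_both]

print(by_size)
print(by_value)
```

Here by_size keeps all four rows, while by_value keeps only the two rows of group a, so the size check is a safe shortcut only when a group with two rows is guaranteed to hold exactly the values 1 and 2.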