Is there a simple way to find the values in one column that appear with both of two selected values in another column?



My problem is that I have a DataFrame like this:

   A  B
0  a  1
1  b  1
2  c  3
3  d  1
4  a  2
5  b  2
6  d  2
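To make the answers below reproducible, the table above can be rebuilt like this. This sketch assumes the columns are named `A` and `B`, as the answers use them.

```python
import pandas as pd

# Rebuild the example frame from the question (column names A and B assumed)
df = pd.DataFrame({
    "A": ["a", "b", "c", "d", "a", "b", "d"],
    "B": [1, 1, 3, 1, 2, 2, 2],
})
print(df)
```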

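You can build a mask whose first part keeps only the rows where B is 1 or 2, and whose second part keeps only the groups (by A) that contain at least one row with 1 and at least one row with 2. If you want to scale this to more values, `np.logical_and.reduce` can save some typing.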

import numpy as np
# B must be 1 or 2, and the row's A-group must contain both a 1 and a 2
mask = (df['B'].isin([1, 2])
        & df['B'].eq(1).groupby(df['A']).transform('any')
        & df['B'].eq(2).groupby(df['A']).transform('any'))
df[mask]

   A  B
0  a  1
1  b  1
3  d  1
4  a  2
5  b  2
6  d  2
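If you only need the matching values of `A` rather than the rows, the same mask can be reused. A short sketch, recomputing the mask so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})

# Same mask as above: B is 1 or 2, and the A-group contains both a 1 and a 2
mask = (df["B"].isin([1, 2])
        & df["B"].eq(1).groupby(df["A"]).transform("any")
        & df["B"].eq(2).groupby(df["A"]).transform("any"))

# Distinct A values that satisfy the condition, in order of first appearance
matching = df.loc[mask, "A"].unique().tolist()
print(matching)  # ['a', 'b', 'd']
```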

A more scalable version, where you only need to add values to the list:

import numpy as np
vals = [1, 2]
# Rows whose B is in vals, restricted to A-groups that contain every value in vals
mask = (df['B'].isin(vals)
        & np.logical_and.reduce([df['B'].eq(val).groupby(df['A']).transform('any')
                                 for val in vals]))
df[mask]
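The scalable version can be wrapped in a small function so the key column, value column and list of values are parameters. The name `rows_with_all_values` is hypothetical; this is a sketch of the same mask technique, not a pandas API:

```python
import numpy as np
import pandas as pd

def rows_with_all_values(df, key, col, vals):
    """Hypothetical helper: keep rows whose `col` is in `vals`, restricted to
    `key` groups that contain every value in `vals` at least once."""
    in_vals = df[col].isin(vals)
    # One boolean Series per value: does this row's group contain that value?
    has_each = [df[col].eq(v).groupby(df[key]).transform("any") for v in vals]
    return df[in_vals & np.logical_and.reduce(has_each)]

df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})
print(rows_with_all_values(df, "A", "B", [1, 2]))
print(rows_with_all_values(df, "A", "B", [1, 2, 3]))  # empty: no group has 1, 2 and 3
```

Adding a value to the list tightens the condition, since a group must then contain every listed value.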

There are already many great answers; here is how I would approach the problem:

import pandas as pd
# Setup
A = ["a", "b", "c", "d", "a", "b", "d"]
B = [1, 1, 3, 1, 2, 2, 2]
df = pd.DataFrame({"A": A, "B": B})
# Keep only values less than or equal to 2, representing months 1 and 2
filtered = df[df["B"] <= 2]
# Sort the filtered values by columns A and B
sorted_df = filtered.sort_values(by=["A", "B"])
# Group by A and count the number of appearances
groupby_count = sorted_df.groupby(["A"], as_index=False).agg(count=("B", "count"))
# Only keep groups with exactly 2 appearances, which implies months 1 and 2
print(groupby_count[groupby_count["count"] == 2])

Output:

   A  count
0  a      2
1  b      2
2  d      2
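One caveat with counting rows: a group with two rows of month 1 and none of month 2 would also give `count == 2`. A sketch that counts distinct values instead, using `nunique` (a swap-in for `count`, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})

wanted = [1, 2]
# Keep only the wanted months, then count how many distinct months each A has
filtered = df[df["B"].isin(wanted)]
distinct = filtered.groupby("A")["B"].nunique()
# A values that have every wanted month at least once
result = distinct[distinct == len(wanted)].index.tolist()
print(result)  # ['a', 'b', 'd']
```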

I believe you can use `groupby.transform` with `loc`; if I understood your request correctly, it is as simple as:

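import pandas as pd
# Keep rows whose A-group has at least 2 rows, then order by A (stable sort keeps row order within a group)
res = df.loc[df.groupby('A')['B'].transform('size') >= 2].sort_values(by='A', kind='stable')
print(res)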
   A  B
0  a  1
4  a  2
1  b  1
5  b  2
3  d  1
6  d  2
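Note that `size >= 2` only counts rows per group; it does not check which values of `B` are present. If the goal is groups containing both 1 and 2, here is a sketch with `groupby(...).filter` that tests set containment (a different technique from the answer above):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d", "a", "b", "d"],
                   "B": [1, 1, 3, 1, 2, 2, 2]})

wanted = {1, 2}
# Keep whole groups of A whose B values include every wanted value
res = (df.groupby("A")
         .filter(lambda g: wanted.issubset(g["B"]))
         .sort_values(by="A", kind="stable"))
print(res)
```

Unlike the mask in the first answer, this keeps every row of a matching group, including rows whose B is neither 1 nor 2.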
