Filter by max date with groupby in pandas

I want to take this DataFrame:

import pandas as pd

df = pd.DataFrame({'Serial' : ['A1', 'A1', 'A1', 'B1','B1', 'B1'],'Day' : ['01.01.2022', '01.01.2022', '01.01.2021', '01.01.2019', '01.01.2019', '01.01.2020'],'Else' : ['a', 'b', 'c', 'd','e', 'f']})

group it by Serial and keep only the rows where Day equals max(Day) within each group. This is my expected output:

Serial  Day         Else
A1      01.01.2022  a
A1      01.01.2022  b
B1      01.01.2020  f
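Because Day holds DD.MM.YYYY strings, taking the max of the raw strings compares them character by character (day first), not chronologically, which is why both answers below parse the column before comparing. A minimal illustration, using two made-up dates that are not from the question's data:

```python
import pandas as pd

# Two hypothetical dates in the question's DD.MM.YYYY format
s = pd.Series(['31.01.2019', '01.02.2020'])

# String max compares character by character, so '3' > '0' wins
as_text = s.max()
print(as_text)  # 31.01.2019

# Parsing first gives the chronologically latest date
as_date = pd.to_datetime(s, format='%d.%m.%Y').max()
print(as_date)  # 2020-02-01 00:00:00
```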

Here's one way to do it:

# parse the day-first date strings into datetimes so max compares dates chronologically
df['Day2']=pd.to_datetime(df['Day'], dayfirst=True)

# group on Serial and broadcast each group's max date back to every row of that group
# keep only the rows whose date matches that max, then drop the helper column
out=df.loc[df['Day2'].eq(df.groupby('Serial')['Day2'].transform('max'))].drop(columns='Day2')
out

  Serial         Day Else
0     A1  01.01.2022    a
1     A1  01.01.2022    b
5     B1  01.01.2020    f
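The same transform-and-compare idea can be wrapped in a small helper that parses the dates into a temporary Series, so the original Day strings are left untouched and no helper column has to be added and dropped. This is a sketch assuming the DD.MM.YYYY format from the question; `keep_latest_rows` and its parameters are hypothetical names, not a pandas API:

```python
import pandas as pd

def keep_latest_rows(df, group_col, date_col, date_format='%d.%m.%Y'):
    """Hypothetical helper (not part of pandas): return every row whose
    date equals the latest date in its group. Ties are all kept."""
    # Parse into a separate Series instead of mutating the caller's frame
    dates = pd.to_datetime(df[date_col], format=date_format)
    # Broadcast each group's max date back to the original row positions
    latest = dates.groupby(df[group_col]).transform('max')
    # Boolean mask: True where the row's date is its group's maximum
    return df[dates.eq(latest)]

df = pd.DataFrame({'Serial': ['A1', 'A1', 'A1', 'B1', 'B1', 'B1'],
                   'Day': ['01.01.2022', '01.01.2022', '01.01.2021',
                           '01.01.2019', '01.01.2019', '01.01.2020'],
                   'Else': ['a', 'b', 'c', 'd', 'e', 'f']})

out = keep_latest_rows(df, 'Serial', 'Day')
print(out)
```

Keeping the parsed dates out of the DataFrame means the result still shows Day in its original string form, matching the expected output in the question.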

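Based on this answer, you should first find all rows whose date equals the maximum date of their group, then use that selection to index the DataFrame. Something like this: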

import pandas as pd

df = pd.DataFrame({'Serial' : ['A1', 'A1', 'A1', 'B1','B1', 'B1'],'Day' : ['01.01.2022', '01.01.2022', '01.01.2021', '01.01.2019', '01.01.2019', '01.01.2020'],'Else' : ['a', 'b', 'c', 'd','e', 'f']})
df['Day'] = pd.to_datetime(df['Day'], format="%d.%m.%Y")
# boolean mask: True where a row's date equals its Serial group's max date
idx = df.groupby(['Serial'])['Day'].transform('max') == df['Day']
print(df[idx])

The result is:

Serial        Day Else
0     A1 2022-01-01    a
1     A1 2022-01-01    b
5     B1 2020-01-01    f
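Selecting all matching rows, rather than one row per group, matters here because the expected output keeps both A1 rows dated 01.01.2022. For comparison, here is a sketch using `idxmax`, which returns only the label of the first row in each group that reaches the maximum and therefore drops the tie:

```python
import pandas as pd

df = pd.DataFrame({'Serial': ['A1', 'A1', 'A1', 'B1', 'B1', 'B1'],
                   'Day': ['01.01.2022', '01.01.2022', '01.01.2021',
                           '01.01.2019', '01.01.2019', '01.01.2020'],
                   'Else': ['a', 'b', 'c', 'd', 'e', 'f']})
df['Day'] = pd.to_datetime(df['Day'], format='%d.%m.%Y')

# idxmax gives one index label per group (first occurrence of the max)
one_per_group = df.loc[df.groupby('Serial')['Day'].idxmax()]
print(one_per_group)  # rows 0 and 5 only; row 1 ('b') is dropped

# The transform-based mask keeps every row that ties for the group max
mask = df.groupby('Serial')['Day'].transform('max') == df['Day']
all_ties = df[mask]
print(all_ties)  # rows 0, 1 and 5
```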
