I have a DataFrame that looks like this:
id seen year month day dayname
f907942e330ac3653f8a9bd655770872 2021-06-02 16:34:56 2021 6 1 Monday
042b60106231fa8a8e43dd750432d5bc 2021-06-02 16:13:29 2021 6 1 Monday
You can group by id and by the date (without the time) of the seen column, using pd.to_datetime + dt.normalize(), and then take the first entry of each group with GroupBy.first(), as follows:
# Optionally convert to datetime if not already in datetime format
df['seen'] = pd.to_datetime(df['seen'])
df.groupby(['id', df['seen'].dt.normalize()], as_index=False, sort=False).first()
Data input (a few rows added for a more thorough test):
df
id seen year month day dayname
0 f907942e330ac3653f8a9bd655770872 2021-06-02 16:34:56 2021 6 2 Monday
1 f907942e330ac3653f8a9bd655770872 2021-06-02 17:54:56 2021 6 2 Monday
2 042b60106231fa8a8e43dd750432d5bc 2021-06-02 16:13:29 2021 6 2 Monday
3 f907942e330ac3653f8a9bd655770872 2021-06-04 16:22:56 2021 6 4 Wednesday
4 f907942e330ac3653f8a9bd655770872 2021-06-04 17:43:56 2021 6 4 Wednesday
Output:
id seen year month day dayname
0 f907942e330ac3653f8a9bd655770872 2021-06-02 16:34:56 2021 6 2 Monday
1 042b60106231fa8a8e43dd750432d5bc 2021-06-02 16:13:29 2021 6 2 Monday
2 f907942e330ac3653f8a9bd655770872 2021-06-04 16:22:56 2021 6 4 Wednesday
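For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby idea. It rebuilds the five-row sample above and groups on a helper column. The names seen_date and first_per_day are assumptions introduced here for illustration, not part of the original answer. Using a real column as the grouping key avoids depending on how a particular pandas version treats a grouping Series that shares its name with an existing column.

```python
import pandas as pd

# Sample rows copied from the data input above.
df = pd.DataFrame({
    "id": [
        "f907942e330ac3653f8a9bd655770872",
        "f907942e330ac3653f8a9bd655770872",
        "042b60106231fa8a8e43dd750432d5bc",
        "f907942e330ac3653f8a9bd655770872",
        "f907942e330ac3653f8a9bd655770872",
    ],
    "seen": [
        "2021-06-02 16:34:56",
        "2021-06-02 17:54:56",
        "2021-06-02 16:13:29",
        "2021-06-04 16:22:56",
        "2021-06-04 17:43:56",
    ],
    "year": [2021] * 5,
    "month": [6] * 5,
    "day": [2, 2, 2, 4, 4],
    "dayname": ["Monday", "Monday", "Monday", "Wednesday", "Wednesday"],
})

# Parse the timestamps, then keep only the date part in a helper column
# (seen_date is a hypothetical name used only in this sketch).
df["seen"] = pd.to_datetime(df["seen"])
df["seen_date"] = df["seen"].dt.normalize()

# One row per (id, date); sort=False keeps groups in order of first appearance.
first_per_day = (
    df.groupby(["id", "seen_date"], as_index=False, sort=False)
      .first()
      .drop(columns="seen_date")
)
print(first_per_day)
```

Note that GroupBy.first() returns the first non-null value of each column within a group, so if the data contains missing values the result can mix values from different rows. If you need the first row exactly as it appears, groupby(...).head(1) is an alternative worth considering.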
You can also try:
#Your Data frame:
df=pd.DataFrame({'id':['f907942e330ac3653f8a9bd655770872','042b60106231fa8a8e43dd750432d5bc'],
'seen':['2021-06-02 16:34:56','2021-06-02 16:13:29'],
'year':['2021','2021'],
'month':[6,6],'day':[1,1],'dayname':['Monday','Monday']})
# Use drop_duplicates
df_nodups=df.drop_duplicates(subset=['id','year','month','day'])
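One thing to keep in mind: drop_duplicates keeps the first occurrence in row order, which is the earliest entry only if the rows are already sorted by time. The sketch below is one possible variation, not the original answer's code. It sorts by seen first and derives the date from seen rather than from separate year/month/day columns. The short ids "a" and "b" and the names seen_date and earliest are made up for this example.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample: the later 17:54 row comes first.
df = pd.DataFrame({
    "id": ["a", "a", "b", "a"],
    "seen": [
        "2021-06-02 17:54:56",
        "2021-06-02 16:34:56",
        "2021-06-02 16:13:29",
        "2021-06-04 16:22:56",
    ],
})
df["seen"] = pd.to_datetime(df["seen"])

earliest = (
    df.sort_values("seen")                                   # earliest rows first
      .assign(seen_date=lambda d: d["seen"].dt.normalize())  # date without time
      .drop_duplicates(subset=["id", "seen_date"], keep="first")
      .drop(columns="seen_date")
      .sort_index()                                          # restore original row order
)
print(earliest)
```

Here the 17:54:56 row for id "a" is dropped in favour of the 16:34:56 row from the same day, even though it appears first in the input.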