> The title may be a bit confusing, so here is an example:
From:
id | timestamp
1 | 2015-12-02 00:00:00
1 | 2015-12-03 00:00:00 <--- latest for id 1
2 | 2015-12-02 00:00:00
2 | 2015-12-04 00:00:00
2 | 2015-12-06 00:00:00 <--- latest for id 2
Into this:
id | timestamp
1 | 2015-12-03 00:00:00
2 | 2015-12-06 00:00:00
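To try the answers below, you can rebuild the sample data as a DataFrame. This is a minimal sketch that assumes the `timestamp` column should be parsed as real datetimes rather than kept as strings, so that comparisons such as `max` order the values chronologically.

```python
import pandas as pd

# Rebuild the example input; parse the timestamp strings into datetime64 values
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2015-12-02 00:00:00",
        "2015-12-03 00:00:00",
        "2015-12-02 00:00:00",
        "2015-12-04 00:00:00",
        "2015-12-06 00:00:00",
    ]),
})
print(df)
```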
Using nth:
In [599]: df.groupby('id', as_index=False).nth(-1)
Out[599]:
id timestamp
1 1 2015-12-03 00:00:00
4 2 2015-12-06 00:00:00
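Note that `nth(-1)` returns the last row of each group in the frame's current row order, so it yields the latest timestamp only when the rows are already sorted by time within each `id`, as they happen to be in the example. A minimal self-contained sketch that sorts first as a safeguard (the exact index of the result can vary between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime(["2015-12-02", "2015-12-03",
                                 "2015-12-02", "2015-12-04", "2015-12-06"]),
})

# Sort so that, within each id, the last row is the most recent one,
# then take the last row of every group
latest = df.sort_values(["id", "timestamp"]).groupby("id", as_index=False).nth(-1)
print(latest)
```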
Ideally, use max, since you want the latest date:
In [601]: df.groupby('id', as_index=False).max()
Out[601]:
id timestamp
0 1 2015-12-03 00:00:00
1 2 2015-12-06 00:00:00
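`max` is computed independently for each column, which is fine here because `timestamp` is the only non-key column. If the frame had more columns, each output row could mix values from different input rows, so selecting the `timestamp` column explicitly keeps the intent clear. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime(["2015-12-02", "2015-12-03",
                                 "2015-12-02", "2015-12-04", "2015-12-06"]),
})

# Latest timestamp per id; as_index=False keeps id as a regular column
latest = df.groupby("id", as_index=False)["timestamp"].max()
print(latest)
```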
Alternatively, use tail, as mentioned in the comments:
In [602]: df.groupby('id').tail(1)
Out[602]:
id timestamp
1 1 2015-12-03 00:00:00
4 2 2015-12-06 00:00:00
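Like `nth(-1)`, `tail(1)` keeps the last row of each group in the current order and preserves the original index (1 and 4 above). A minimal sketch that sorts by `id` and `timestamp` first, so the result does not depend on the input already being in chronological order:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime(["2015-12-02", "2015-12-03",
                                 "2015-12-02", "2015-12-04", "2015-12-06"]),
})

# Sort chronologically within each id, then keep the final row per group
latest = df.sort_values(["id", "timestamp"]).groupby("id").tail(1)
print(latest)
```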