How to find the first and last time of each day for each value



I am trying to find a way to get the first and last timestamp for each asset on each day in a DataFrame. For example, I have this DataFrame:

import pandas as pd
data = {
'Date':['2022-01-01','2022-01-01','2022-01-01','2022-01-01','2022-01-01','2022-01-01',
'2022-01-01' ,'2022-01-02','2022-01-02','2022-01-02','2022-01-02','2022-01-02','2022-01-02',
'2022-01-02','2022-01-02','2022-01-03','2022-01-03','2022-01-03','2022-01-03','2022-01-03',
'2022-01-03','2022-01-03','2022-01-03'],
'Time':['12:01','12:05','14:07','11:01','13:06','17:12','15:15',
'9:02','8:06','14:06','19:19','10:00','13:01','17:00','10:15',
'8:00','9:00','7:15','16:04','15:02','17:10','12:06','15:00'],
'Asset':[111,111,111,222,222,222,222,
111,111,111,111,111,222,222,222,
333,333,111,111,111,111,333,111]
}
df = pd.DataFrame(data)
df

which looks like:

Date    Time    Asset
0   2022-01-01  12:01   111
1   2022-01-01  12:05   111
2   2022-01-01  14:07   111
3   2022-01-01  11:01   222
4   2022-01-01  13:06   222
5   2022-01-01  17:12   222
6   2022-01-01  15:15   222
7   2022-01-02  9:02    111
8   2022-01-02  8:06    111
9   2022-01-02  14:06   111
10  2022-01-02  19:19   111
11  2022-01-02  10:00   111
12  2022-01-02  13:01   222
13  2022-01-02  17:00   222
14  2022-01-02  10:15   222
15  2022-01-03  8:00    333
16  2022-01-03  9:00    333
17  2022-01-03  7:15    111
18  2022-01-03  16:04   111
19  2022-01-03  15:02   111
20  2022-01-03  17:10   111
21  2022-01-03  12:06   333
22  2022-01-03  15:00   111

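I want to group this data by day and, for each asset on each day, drop all the duplicates, keeping only the first and last timestamp for each value on each day. My ideal result would look like this: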

data1 = {
'Date':['2022-01-01','2022-01-01','2022-01-01','2022-01-01',
'2022-01-02','2022-01-02','2022-01-02','2022-01-02',
'2022-01-03','2022-01-03','2022-01-03','2022-01-03',],
'Time':['12:01','14:07','11:01','17:12',
'8:06','19:19','10:15','17:00',
'8:00','12:06','7:15','17:10'],
'Asset':[111,111,222,222,
111,111,222,222,
333,333,111,111]
}
df1 = pd.DataFrame(data1)
df1

which looks like:

Date    Time    Asset
0   2022-01-01  12:01   111
1   2022-01-01  14:07   111
2   2022-01-01  11:01   222
3   2022-01-01  17:12   222
4   2022-01-02  8:06    111
5   2022-01-02  19:19   111
6   2022-01-02  10:15   222
7   2022-01-02  17:00   222
8   2022-01-03  8:00    333
9   2022-01-03  12:06   333
10  2022-01-03  7:15    111
11  2022-01-03  17:10   111

Ideally I would like to solve this in Python, but I can use R or SQL if there is a simpler solution there. Any help would be appreciated. Thanks in advance!

import pandas as pd
data = {
'Date':['2022-01-01','2022-01-01','2022-01-01','2022-01-01','2022-01-01','2022-01-01',
'2022-01-01' ,'2022-01-02','2022-01-02','2022-01-02','2022-01-02','2022-01-02','2022-01-02',
'2022-01-02','2022-01-02','2022-01-03','2022-01-03','2022-01-03','2022-01-03','2022-01-03',
'2022-01-03','2022-01-03','2022-01-03'],
'Time':['12:01','12:05','14:07','11:01','13:06','17:12','15:15',
'9:02','8:06','14:06','19:19','10:00','13:01','17:00','10:15',
'8:00','9:00','7:15','16:04','15:02','17:10','12:06','15:00'],
'Asset':[111,111,111,222,222,222,222,
111,111,111,111,111,222,222,222,
333,333,111,111,111,111,333,111]
}
df = pd.DataFrame(data)
df_f = df.groupby(by=['Date', 'Asset']).first().reset_index()
df_l = df.groupby(by=['Date', 'Asset']).last().reset_index()
df_fl = pd.concat([df_f, df_l])[['Date', 'Time', 'Asset']]
df_fl = df_fl.sort_values(by=['Date', 'Asset', 'Time']).reset_index().drop(columns=['index'])
print(df_fl)

which prints:

Date    Time    Asset
0   2022-01-01  12:01   111
1   2022-01-01  14:07   111
2   2022-01-01  11:01   222
3   2022-01-01  15:15   222
4   2022-01-02  10:00   111
5   2022-01-02  9:02    111
6   2022-01-02  10:15   222
7   2022-01-02  13:01   222
8   2022-01-03  15:00   111
9   2022-01-03  7:15    111
10  2022-01-03  12:06   333
11  2022-01-03  8:00    333
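
Note that first() and last() pick rows by their position within each group, not by the value of Time, so for asset 111 on 2022-01-02 the code above returns 9:02 and 10:00 rather than 8:06 and 19:19. A minimal sketch of one way around this, assuming the goal is the earliest and latest time per day and asset: parse Time into a sortable helper column (`_t` is a hypothetical name, and the small frame below is a subset of the question's data), sort on it, and only then take the first and last row of each group.

```python
import pandas as pd

# Subset of the question's data, enough to show the ordering problem
df = pd.DataFrame({
    'Date': ['2022-01-02'] * 5 + ['2022-01-03'] * 3,
    'Time': ['9:02', '8:06', '14:06', '19:19', '10:00', '8:00', '9:00', '12:06'],
    'Asset': [111] * 5 + [333] * 3,
})

# Helper column (hypothetical name): parsed times sort chronologically,
# whereas the raw strings would put '10:00' before '9:02'
df['_t'] = pd.to_datetime(df['Time'], format='%H:%M')
ordered = df.sort_values(['Date', 'Asset', '_t'])

# After sorting, head(1)/tail(1) are the earliest/latest row of each group
grouped = ordered.groupby(['Date', 'Asset'], sort=False)
picked = pd.concat([grouped.head(1), grouped.tail(1)])

# A group with a single row would be picked twice; keep it once
picked = picked[~picked.index.duplicated()]

result = (
    picked.sort_values(['Date', 'Asset', '_t'])
          .drop(columns='_t')
          .reset_index(drop=True)
)
print(result)
```

Because whole rows are selected, the Time column keeps its original strings (for example 8:06 rather than 08:06).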

Here is one way to do it.

Use groupby to get the maximum and minimum time for each date and asset, then stack the result and reset the index.

# Convert Time (a string) to a zero-padded HH:MM string so that string min/max compare correctly
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M').dt.strftime('%H:%M')
# Use a list (not a set) of aggregations so the column order is deterministic
out = df.groupby(['Date', 'Asset'])['Time'].agg(['max', 'min']).stack().reset_index()
out.rename(columns={0: 'Time'}).drop(columns='level_2').sort_values(['Date', 'Asset', 'Time'])
Date  Asset   Time
1   2022-01-01  111     12:01
0   2022-01-01  111     14:07
3   2022-01-01  222     11:01
2   2022-01-01  222     17:12
5   2022-01-02  111     08:06
4   2022-01-02  111     19:19
7   2022-01-02  222     10:15
6   2022-01-02  222     17:00
9   2022-01-03  111     07:15
8   2022-01-03  111     17:10
11  2022-01-03  333     08:00
10  2022-01-03  333     12:06
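
If one row per day and asset is more convenient than two, the same min/max idea can be sketched with named aggregation. This is only an illustration under a few assumptions: the `Timestamp`, `first_time` and `last_time` column names are hypothetical, the small frame is a subset of the question's data, and Time is zero-padded with `str.zfill` before being combined with Date so that parsing does not depend on single-digit hours.

```python
import pandas as pd

# Subset of the question's data
df = pd.DataFrame({
    'Date': ['2022-01-01'] * 3 + ['2022-01-02'] * 3,
    'Time': ['12:01', '12:05', '14:07', '9:02', '8:06', '19:19'],
    'Asset': [111] * 6,
})

# Build a real timestamp so that min/max compare chronologically;
# zfill(5) turns '9:02' into '09:02'
df['Timestamp'] = pd.to_datetime(
    df['Date'] + ' ' + df['Time'].str.zfill(5),
    format='%Y-%m-%d %H:%M',
)

# One row per (Date, Asset) with the earliest and latest timestamp
summary = (
    df.groupby(['Date', 'Asset'])['Timestamp']
      .agg(first_time='min', last_time='max')
      .reset_index()
)
print(summary)
```

The wide layout also makes it easy to derive further values, such as the span between the two timestamps via `summary['last_time'] - summary['first_time']`.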
