我有一个数据表,如下所示:
ID ARRIVAL_DATE_TIME DISPOSITION_DATE
1 2021-11-07 08:35:00 2021-11-07 17:58:00
2 2021-11-07 13:16:00 2021-11-08 02:52:00
3 2021-11-07 15:12:00 2021-11-07 21:08:00
我希望能够按日期/小时和小时统计我们所在地的患者人数。我想我最终必须将这些数据转换成下面所示的格式,然后创建一个数据透视表,但我不确定如何首先转换这些数据。例如,ID 1在"2021-11-07 08:35:00"one_answers"2021-111-07 17:58:00"之间的每个日期/小时和小时都有一行。
ID DATE_HOUR_IN_ED HOUR_IN_ED
1 2021-11-07 08:00:00 8:00
1 2021-11-07 09:00:00 9:00
1 2021-11-07 10:00:00 10:00
1 2021-11-07 11:00:00 11:00
...
2 2021-11-07 13:00:00 13:00
2 2021-11-07 14:00:00 14:00
2 2021-11-07 15:00:00 15:00
....
使用to_datetime
和Series.dt.floor
移除次数,然后使用concat
和重复date_range
,最后通过构造函数创建DataFrame
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
s = pd.concat([pd.Series(r.ID,pd.date_range(r.ARRIVAL_DATE_TIME,
r.DISPOSITION_DATE, freq='H'))
for r in df.itertuples()])
df1 = pd.DataFrame({'ID':s.to_numpy(),
'DATE_HOUR_IN_ED':s.index,
'HOUR_IN_ED': s.index.strftime('%H:%M')})
print (df1)
ID DATE_HOUR_IN_ED HOUR_IN_ED
0 1 2021-11-07 08:00:00 08:00
1 1 2021-11-07 09:00:00 09:00
2 1 2021-11-07 10:00:00 10:00
3 1 2021-11-07 11:00:00 11:00
4 1 2021-11-07 12:00:00 12:00
5 1 2021-11-07 13:00:00 13:00
6 1 2021-11-07 14:00:00 14:00
7 1 2021-11-07 15:00:00 15:00
8 1 2021-11-07 16:00:00 16:00
9 1 2021-11-07 17:00:00 17:00
10 2 2021-11-07 13:00:00 13:00
11 2 2021-11-07 14:00:00 14:00
12 2 2021-11-07 15:00:00 15:00
13 2 2021-11-07 16:00:00 16:00
14 2 2021-11-07 17:00:00 17:00
15 2 2021-11-07 18:00:00 18:00
16 2 2021-11-07 19:00:00 19:00
17 2 2021-11-07 20:00:00 20:00
18 2 2021-11-07 21:00:00 21:00
19 2 2021-11-07 22:00:00 22:00
20 2 2021-11-07 23:00:00 23:00
21 2 2021-11-08 00:00:00 00:00
22 2 2021-11-08 01:00:00 01:00
23 2 2021-11-08 02:00:00 02:00
24 3 2021-11-07 15:00:00 15:00
25 3 2021-11-07 16:00:00 16:00
26 3 2021-11-07 17:00:00 17:00
27 3 2021-11-07 18:00:00 18:00
28 3 2021-11-07 19:00:00 19:00
29 3 2021-11-07 20:00:00 20:00
30 3 2021-11-07 21:00:00 21:00
替代解决方案:
df['ARRIVAL_DATE_TIME'] = pd.to_datetime(df['ARRIVAL_DATE_TIME']).dt.floor('H')
L = [pd.date_range(s,e, freq='H')
for s, e in df[['ARRIVAL_DATE_TIME','DISPOSITION_DATE']].to_numpy()]
df['DATE_HOUR_IN_ED'] = L
df = (df.drop(['ARRIVAL_DATE_TIME','DISPOSITION_DATE'], axis=1)
.explode('DATE_HOUR_IN_ED')
.reset_index(drop=True)
.assign(HOUR_IN_ED = lambda x: x['DATE_HOUR_IN_ED'].dt.strftime('%H:%M')))
试试这个:
import pandas as pd
import numpy as np
df = pd.read_excel('test.xls')
df1 = (df.set_index(['ID'])
.assign(DATE_HOUR_IN_ED=lambda x: [pd.date_range(s,d, freq='H')
for s,d in zip(x.ARRIVAL_DATE_TIME, x.DISPOSITION_DATE)])
['DATE_HOUR_IN_ED'].explode()
.reset_index()
)
df1['DATE_HOUR_IN_ED'] = df1['DATE_HOUR_IN_ED'].dt.floor('H')
df1['HOUR_IN_ED'] = df1['DATE_HOUR_IN_ED'].dt.strftime('%H:%M')
print(df1)
输出:
ID DATE_HOUR_IN_ED HOUR_IN_ED
0 1 2021-11-07 08:00:00 08:00
1 1 2021-11-07 09:00:00 09:00
2 1 2021-11-07 10:00:00 10:00
3 1 2021-11-07 11:00:00 11:00
4 1 2021-11-07 12:00:00 12:00
5 1 2021-11-07 13:00:00 13:00
6 1 2021-11-07 14:00:00 14:00
7 1 2021-11-07 15:00:00 15:00
8 1 2021-11-07 16:00:00 16:00
9 1 2021-11-07 17:00:00 17:00
10 2 2021-11-07 13:00:00 13:00
11 2 2021-11-07 14:00:00 14:00
12 2 2021-11-07 15:00:00 15:00
13 2 2021-11-07 16:00:00 16:00
14 2 2021-11-07 17:00:00 17:00
15 2 2021-11-07 18:00:00 18:00
16 2 2021-11-07 19:00:00 19:00
17 2 2021-11-07 20:00:00 20:00
18 2 2021-11-07 21:00:00 21:00
19 2 2021-11-07 22:00:00 22:00
20 2 2021-11-07 23:00:00 23:00
21 2 2021-11-08 00:00:00 00:00
22 2 2021-11-08 01:00:00 01:00
23 2 2021-11-08 02:00:00 02:00
24 3 2021-11-07 15:00:00 15:00
25 3 2021-11-07 16:00:00 16:00
26 3 2021-11-07 17:00:00 17:00
27 3 2021-11-07 18:00:00 18:00
28 3 2021-11-07 19:00:00 19:00
29 3 2021-11-07 20:00:00 20:00