I have a column of pandas datetime64 elements:
df['time']
0 2019-10-04 12:03:53+00:00
1 2019-10-04 11:21:23+00:00
2 2019-10-04 12:23:11+00:00
3 2019-10-04 18:04:52+00:00
4 2019-10-04 12:22:21+00:00
...
2889974 2019-10-11 10:53:19+00:00
2889975 2019-10-11 10:58:38+00:00
2889976 2019-10-10 10:36:47+00:00
2889977 2019-10-10 10:36:47+00:00
2889978 2019-07-08 04:36:45+00:00
Name: time, Length: 2889979, dtype: datetime64[ns, UTC]
and a corresponding column of time-of-day values, df['time_full'], stored as strings (dtype object):
df['time_full']
0 12:03:53
1 11:21:23
2 12:23:11
3 18:04:52
4 12:22:21
...
2889974 10:53:19
2889975 10:58:38
2889976 10:36:47
2889977 10:36:47
2889978 04:36:45
Name: time_full, Length: 2889979, dtype: object
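I want to divide the day into 30-minute slots (48 slots in total) and assign a slot to every value in the df['time'] column, i.e. create a categorical variable from the timestamps. Something like this (just an example):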
df['time'] df['slot']
0 2019-10-04 12:03:53+00:00 4
1 2019-10-04 11:21:23+00:00 2
2 2019-10-04 12:23:11+00:00 32
3 2019-10-04 18:04:52+00:00 40
4 2019-10-04 12:22:21+00:00 5
I tried binning into slots with pandas' pd.cut() method, and ended up with this:
pd.cut(df['time'].astype(np.int64)//10**9,
bins=pd.date_range("00:00", "23:59", freq="30min"))
But the output looks like this:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
2889974 NaN
2889975 NaN
2889976 NaN
2889977 NaN
2889978 NaN
Name: time, Length: 2889979, dtype: category
Categories (47, interval[int64]): [(1575331200000000000, 1575333000000000000] < (1575333000000000000, 1575334800000000000] < (1575334800000000000, 1575336600000000000] < (1575336600000000000, 1575338400000000000] ... (1575408600000000000, 1575410400000000000] < (1575410400000000000, 1575412200000000000] < (1575412200000000000, 1575414000000000000] < (1575414000000000000, 1575415800000000000]]
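I also tried using df['time_full'] as the bins, but that threw an error because it is a list of strings. I suspect the problem is that df['time'] carries both a date and a time, which makes it unsuitable for binning by time of day, but I'm not sure. Any help would be appreciated.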
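Judging from the category labels in the output above, two mismatches likely explain the all-NaN result. First, pd.date_range("00:00", "23:59", freq="30min") fills in the date the code was run, so the edges (starting at 1575331200000000000 ns, which is 2019-12-03 00:00 UTC) never overlap the dates in df['time']. Second, the edges are in nanoseconds while the values were divided down to seconds by //10**9. That range also yields 48 edges and therefore only 47 intervals, which is why the last half hour is missing. A minimal sketch of one workaround, assuming only the time of day matters: convert each timestamp to seconds since midnight and cut with integer edges. The sample rows are copied from the question; `secs` and `edges` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's df['time']
df = pd.DataFrame({"time": pd.to_datetime([
    "2019-10-04 12:03:53+00:00",
    "2019-10-04 11:21:23+00:00",
    "2019-10-04 18:04:52+00:00",
    "2019-07-08 04:36:45+00:00",
])})

# Seconds elapsed since midnight, ignoring the calendar date
t = df["time"]
secs = t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second

# 49 edges (0, 1800, ..., 86400) -> 48 half-open intervals [0, 1800), [1800, 3600), ...
edges = np.arange(0, 24 * 3600 + 1, 30 * 60)

# right=False closes intervals on the left; labels=False returns integer bin numbers
df["slot"] = pd.cut(secs, bins=edges, right=False, labels=False)
print(df)
```

With right=False, both 12:00:00 and 12:29:59 land in slot 24, and labels=False gives plain integer slot numbers instead of Interval categories.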
If you want the slots to range from 0 to 47, you can use:
df['slots'] = df['time'].apply(lambda x: x.hour*2 if x.minute <= 29 else x.hour*2+1)
df
time slots
0 2019-10-04 12:03:53+00:00 24
1 2019-10-04 11:21:23+00:00 22
2 2019-10-04 12:23:11+00:00 24
3 2019-10-04 18:04:52+00:00 36
4 2019-10-04 12:22:21+00:00 24
2889974 2019-10-11 10:53:19+00:00 21
2889975 2019-10-11 10:58:38+00:00 21
2889976 2019-10-10 10:36:47+00:00 21
2889977 2019-10-10 10:36:47+00:00 21
2889978 2019-07-08 04:36:45+00:00 9
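The apply/lambda above calls a Python function once per row. A vectorized equivalent, sketched here with the .dt accessor (not part of the original answer), computes the same 0-based slot as hour * 2 plus minute // 30 and should typically be much faster on ~2.9 million rows. The sample rows are taken from the question.

```python
import pandas as pd

# Sample rows taken from the question's df['time']
df = pd.DataFrame({"time": pd.to_datetime([
    "2019-10-04 12:03:53+00:00",
    "2019-10-04 11:21:23+00:00",
    "2019-10-04 12:23:11+00:00",
    "2019-10-04 18:04:52+00:00",
    "2019-10-04 12:22:21+00:00",
    "2019-07-08 04:36:45+00:00",
])})

t = df["time"]
# Two slots per hour: minutes 0-29 add 0, minutes 30-59 add 1
df["slots"] = t.dt.hour * 2 + t.dt.minute // 30
print(df)
```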
Further testing, with half-hour timestamps across one day:
date slots
0 2019-10-04 00:00:00 0
1 2019-10-04 00:30:00 1
2 2019-10-04 01:00:00 2
3 2019-10-04 01:30:00 3
4 2019-10-04 02:00:00 4
5 2019-10-04 02:30:00 5
6 2019-10-04 03:00:00 6
7 2019-10-04 03:30:00 7
8 2019-10-04 04:00:00 8
9 2019-10-04 04:30:00 9
10 2019-10-04 05:00:00 10
11 2019-10-04 05:30:00 11
12 2019-10-04 06:00:00 12
13 2019-10-04 06:30:00 13
14 2019-10-04 07:00:00 14
15 2019-10-04 07:30:00 15
16 2019-10-04 08:00:00 16
17 2019-10-04 08:30:00 17
18 2019-10-04 09:00:00 18
19 2019-10-04 09:30:00 19
20 2019-10-04 10:00:00 20
21 2019-10-04 10:30:00 21
22 2019-10-04 11:00:00 22
23 2019-10-04 11:30:00 23
24 2019-10-04 12:00:00 24
25 2019-10-04 12:30:00 25
26 2019-10-04 13:00:00 26
27 2019-10-04 13:30:00 27
28 2019-10-04 14:00:00 28
29 2019-10-04 14:30:00 29
30 2019-10-04 15:00:00 30
31 2019-10-04 15:30:00 31
32 2019-10-04 16:00:00 32
33 2019-10-04 16:30:00 33
34 2019-10-04 17:00:00 34
35 2019-10-04 17:30:00 35
36 2019-10-04 18:00:00 36
37 2019-10-04 18:30:00 37
38 2019-10-04 19:00:00 38
39 2019-10-04 19:30:00 39
40 2019-10-04 20:00:00 40
41 2019-10-04 20:30:00 41
42 2019-10-04 21:00:00 42
43 2019-10-04 21:30:00 43
44 2019-10-04 22:00:00 44
45 2019-10-04 22:30:00 45
46 2019-10-04 23:00:00 46
47 2019-10-04 23:30:00 47
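The test above can be reproduced as a quick self-check: generate 48 half-hour timestamps for one day, apply the same rule, and confirm slots 0 through 47 come out in order. The helper name `half_hour_slot` is hypothetical, and the two boundary timestamps are made-up values chosen to probe the minute 29/30 cutoff.

```python
import pandas as pd

def half_hour_slot(x):
    # Hypothetical helper: same rule as the lambda in the answer (0-based)
    return x.hour * 2 if x.minute <= 29 else x.hour * 2 + 1

# 48 timestamps at 30-minute spacing covering 2019-10-04
test = pd.DataFrame({"date": pd.date_range("2019-10-04", periods=48, freq="30min")})
test["slots"] = test["date"].apply(half_hour_slot)

# Made-up boundary values: last second of one slot, first second of the next
edge = pd.Series(pd.to_datetime(["2019-10-04 12:29:59", "2019-10-04 12:30:00"]))
edge_slots = edge.apply(half_hour_slot)
print(test["slots"].tolist())
print(edge_slots.tolist())
```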
If you want the slots to range from 1 to 48:
df['slots'] = df['time'].apply(lambda x: x.hour*2+1 if x.minute <= 29 else x.hour*2+2)
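Since the question asks for a categorical variable, one option (a sketch, not from the original answer) is to wrap the 1-based slot numbers in an ordered pd.Categorical with all 48 slots declared as categories, so that slots with no rows still show up in value_counts(). The sample rows come from the question.

```python
import pandas as pd

# Sample rows from the question's df['time']
df = pd.DataFrame({"time": pd.to_datetime([
    "2019-10-04 12:03:53+00:00",
    "2019-10-04 18:04:52+00:00",
    "2019-07-08 04:36:45+00:00",
])})

t = df["time"]
# 1-based slot: same result as the lambda above, computed column-wise
slot_numbers = t.dt.hour * 2 + t.dt.minute // 30 + 1

# Ordered categorical declaring all 48 possible slots, even unobserved ones
df["slots"] = pd.Categorical(slot_numbers, categories=list(range(1, 49)), ordered=True)
print(df)
print(df["slots"].value_counts(sort=False).head())
```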
It depends on how you want to obtain the values. See the question "How to convert a DatetimeIndex resampler to a DataFrame?"
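import pandas as pd
# 15 timestamps at 15-minute spacing, starting 2000-01-02
df = pd.DataFrame(pd.date_range('2000-01-02', freq='15min', periods=15), columns=['time'])
# Use the timestamps as a DatetimeIndex so the frame can be resampled
df.set_index(df['time'], inplace=True)
# Resample onto a 30-minute grid, interpolating any missing values
df = df.resample('30min').interpolate()
df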
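A related idea to the resampling approach, swapped in here rather than taken from the answer: Series.dt.floor("30min") snaps each timestamp to the start of its 30-minute window without touching the index, and dividing the distance from midnight (dt.normalize()) by a 30-minute Timedelta turns that window into a 0-based slot number. A minimal sketch using sample rows from the question; `slot_start` is an illustrative column name.

```python
import pandas as pd

# Sample rows from the question's df['time']
df = pd.DataFrame({"time": pd.to_datetime([
    "2019-10-04 12:03:53+00:00",
    "2019-10-04 11:21:23+00:00",
    "2019-10-04 18:04:52+00:00",
    "2019-07-08 04:36:45+00:00",
])})

t = df["time"]
# Start of the 30-minute window each timestamp falls in
df["slot_start"] = t.dt.floor("30min")

# Whole 30-minute steps between midnight of that day and the window start
df["slot"] = (df["slot_start"] - t.dt.normalize()) // pd.Timedelta("30min")
print(df)
```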