Creating bins for a timestamp column



I am trying to create suitable bins for a column of timestamp intervals, using code such as:

df['Bin'] = pd.cut(df['interval_length'], bins=pd.to_timedelta(['00:00:00','00:10:00','00:20:00','00:30:00','00:40:00','00:50:00','00:60:00']))

The resulting df looks like this:

time_interval  |           bin
00:17:00        (0 days 00:10:00, 0 days 00:20:00]
01:42:00                NaN
00:15:00        (0 days 00:10:00, 0 days 00:20:00]
00:00:00                NaN
00:06:00        (0 days 00:00:00, 0 days 00:10:00]

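This is slightly off: I want the bins to show only time values, without the days, and I would like the last bin to run from 60 minutes to inf (or more).

Desired output: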

time_interval  |           bin
00:17:00        (00:10:00,00:20:00]
01:42:00        (00:60:00,inf]
00:15:00        (00:10:00,00:20:00]
00:00:00        (00:00:00,00:10:00]
00:06:00        (00:00:00,00:10:00]

Thanks for looking!

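pandas has no inf for timedeltas, so the maximum value, pd.Timedelta.max, is used as the last bin edge instead. To include the lowest value (00:00:00) in the first bin, pass include_lowest=True. This keeps the bins as timedelta intervals: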

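import pandas as pd

# Bin edges every 10 minutes; '00:60:00' is parsed as 1 hour.
b = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00',
                     '00:30:00', '00:40:00',
                     '00:50:00', '00:60:00'])
# There is no timedelta inf, so append the largest representable Timedelta.
b = b.append(pd.Index([pd.Timedelta.max]))
# include_lowest=True puts 00:00:00 into the first bin.
df['Bin'] = pd.cut(df['time_interval'], include_lowest=True, bins=b)
print(df)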
time_interval                                             Bin
0      00:17:00              (0 days 00:10:00, 0 days 00:20:00]
1      01:42:00  (0 days 01:00:00, 106751 days 23:47:16.854775]
2      00:15:00              (0 days 00:10:00, 0 days 00:20:00]
3      00:00:00     (-1 days +23:59:59.999999, 0 days 00:10:00]
4      00:06:00     (-1 days +23:59:59.999999, 0 days 00:10:00]
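If a true inf edge matters more than keeping the bins as timedeltas, one alternative (not part of the answer above) is to bin on a plain number of minutes, because float edges can include np.inf. The following is a minimal sketch under that assumption; the sample data is copied from the question, and the names minutes and edges are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'time_interval': pd.to_timedelta(
    ['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00'])})

# Convert each timedelta to a float number of minutes.
minutes = df['time_interval'].dt.total_seconds() / 60

# Float edges may contain np.inf, which timedelta edges cannot.
edges = [0, 10, 20, 30, 40, 50, 60, np.inf]

# include_lowest=True keeps the 0-minute row in the first bin.
df['Bin'] = pd.cut(minutes, bins=edges, include_lowest=True)
print(df)
```

The trade-off is that the intervals are displayed in minutes rather than as time values, so this fits best when the bins are only used for grouping or counting.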

If you need strings instead of timedeltas, use zip to build the labels, with 'inf' appended as the last edge:

vals = ['00:00:00', '00:10:00', '00:20:00',
        '00:30:00', '00:40:00', '00:50:00', '00:60:00']
# Timedelta edges, with Timedelta.max as the open-ended upper bound.
b = pd.to_timedelta(vals).append(pd.Index([pd.Timedelta.max]))
# Label strings: each edge paired with the next one, ending in 'inf'.
vals.append('inf')
labels = ['{}-{}'.format(i, j) for i, j in zip(vals[:-1], vals[1:])]
df['Bin'] = pd.cut(df['time_interval'], include_lowest=True, bins=b, labels=labels)
print(df)
time_interval                Bin
0      00:17:00  00:10:00-00:20:00
1      01:42:00       00:60:00-inf
2      00:15:00  00:10:00-00:20:00
3      00:00:00  00:00:00-00:10:00
4      00:06:00  00:00:00-00:10:00
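The labels above use a start-end form, while the question asked for interval notation such as (00:10:00,00:20:00]. Only the format string needs to change for that. A small sketch, in which edges is an illustrative name and the strings are the same edge values as above:

```python
vals = ['00:00:00', '00:10:00', '00:20:00', '00:30:00',
        '00:40:00', '00:50:00', '00:60:00']

# Pair each edge with the next one; 'inf' stands in for the open upper end.
edges = vals + ['inf']
labels = ['({},{}]'.format(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
print(labels)
```

The resulting list has one entry per bin and can be passed as labels=labels in the pd.cut call in place of the start-end labels.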

You can solve it with labels, using 24:00:00 as a stand-in for the open upper bound:

bins = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00', '00:30:00',
                        '00:40:00', '00:50:00', '00:60:00', '24:00:00'])
labels = ['(00:00:00,00:10:00]', '(00:10:00,00:20:00]', '(00:20:00,00:30:00]',
          '(00:30:00,00:40:00]', '(00:40:00,00:50:00]', '(00:50:00,00:60:00]',
          '(00:60:00,inf]']
# include_lowest=True keeps 00:00:00 in the first bin, as in the desired output.
df['Bin'] = pd.cut(df['interval_length'], bins=bins, labels=labels,
                   include_lowest=True)
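One thing to keep in mind with a finite upper edge: the last label says inf, but values above the cap still fall outside every bin and come back as NaN. A minimal sketch with made-up data (the series s and the two-bin layout are assumptions chosen only to keep the example short):

```python
import pandas as pd

# Three made-up intervals: 6 minutes, 1 h 42 min, and 30 hours.
s = pd.Series(pd.to_timedelta(['00:06:00', '01:42:00', '1 days 06:00:00']))

# Two bins only, capped at 24 hours.
bins = pd.to_timedelta(['00:00:00', '01:00:00', '24:00:00'])
out = pd.cut(s, bins=bins, labels=['(00:00:00,01:00:00]', '(01:00:00,inf]'])

# The 30-hour value lies beyond the 24-hour cap, so it is not assigned a bin.
print(out)
print(out.isna().tolist())
```

If intervals longer than a day can occur in the data, the cap needs to be raised accordingly, or replaced with pd.Timedelta.max.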
