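With the code below, I expect these timestamps to be binned into the periods provided through an IntervalIndex. Unfortunately, I only get NaN back. What is going wrong?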
import pandas as pd
# Test data
ts = [pd.Timestamp('2022/03/01 09:00'),
      pd.Timestamp('2022/03/01 10:00'),
      pd.Timestamp('2022/03/01 10:30'),
      pd.Timestamp('2022/03/01 15:00')]
df = pd.DataFrame({'a':range(len(ts)), 'ts': ts})
# Test
bins = pd.interval_range(pd.Timestamp('2022/03/01 08:00'),
                         pd.Timestamp('2022/03/01 16:00'),
                         freq='2H',
                         closed="left")
row_labels = pd.cut(df["ts"], bins)
I expected the result to be:
[2022-03-01 08:00:00, 2022-03-01 10:00:00)
[2022-03-01 10:00:00, 2022-03-01 12:00:00)
[2022-03-01 10:00:00, 2022-03-01 12:00:00)
[2022-03-01 14:00:00, 2022-03-01 16:00:00)
But I only get NaN.
row_labels
Out[37]:
0 NaN
1 NaN
2 NaN
3 NaN
Name: ts, dtype: category
Categories (4, interval[datetime64[ns], left]): [
[2022-03-01 08:00:00, 2022-03-01 10:00:00) <
[2022-03-01 10:00:00, 2022-03-01 12:00:00) <
[2022-03-01 12:00:00, 2022-03-01 14:00:00) <
[2022-03-01 14:00:00, 2022-03-01 16:00:00)]
What is going wrong? Thanks for your help. Best,
Very interesting.
pd.cut(df['ts'].to_list(), bins)
produces the expected result:
[[2022-03-01 08:00:00, 2022-03-01 10:00:00),
[2022-03-01 10:00:00, 2022-03-01 12:00:00),
[2022-03-01 10:00:00, 2022-03-01 12:00:00),
[2022-03-01 14:00:00, 2022-03-01 16:00:00)]
Categories (4, interval[datetime64[ns], left]): [
[2022-03-01 08:00:00, 2022-03-01 10:00:00) <
[2022-03-01 10:00:00, 2022-03-01 12:00:00) <
[2022-03-01 12:00:00, 2022-03-01 14:00:00) <
[2022-03-01 14:00:00, 2022-03-01 16:00:00)]
but
pd.cut(df['ts'].to_numpy(), bins)
[NaN, NaN, NaN, NaN]
Categories (4, interval[datetime64[ns], left]): [
[2022-03-01 08:00:00, 2022-03-01 10:00:00) <
[2022-03-01 10:00:00, 2022-03-01 12:00:00) <
[2022-03-01 12:00:00, 2022-03-01 14:00:00) <
[2022-03-01 14:00:00, 2022-03-01 16:00:00)]
What? Why does it work for a list, but not for an np.ndarray or a pd.Series?
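One possible way around the difference, offered as a minimal sketch rather than an explanation of the cause: skip pd.cut and ask the IntervalIndex directly which interval each timestamp falls into with IntervalIndex.get_indexer, then rebuild the categorical from those positions. This assumes the bins do not overlap. The freq is given as a pd.Timedelta instead of the '2H' alias purely for portability across pandas versions; that is my choice and not part of the original code.

```python
import pandas as pd

ts = [pd.Timestamp('2022-03-01 09:00'),
      pd.Timestamp('2022-03-01 10:00'),
      pd.Timestamp('2022-03-01 10:30'),
      pd.Timestamp('2022-03-01 15:00')]
df = pd.DataFrame({'a': range(len(ts)), 'ts': ts})

# Assumption: a Timedelta step replaces the '2H' alias used in the post.
bins = pd.interval_range(pd.Timestamp('2022-03-01 08:00'),
                         pd.Timestamp('2022-03-01 16:00'),
                         freq=pd.Timedelta(hours=2),
                         closed='left')

# Position of the interval that contains each timestamp; -1 means "no bin".
positions = bins.get_indexer(df['ts'])

# Turn the positions into an ordered categorical of intervals.
# A code of -1 becomes NaN, mirroring what pd.cut does for unmatched values.
row_labels = pd.Series(
    pd.Categorical.from_codes(positions, categories=bins, ordered=True),
    index=df.index,
    name='ts',
)
print(row_labels)
```

Because the lookup works on the datetime Series itself, no conversion to a list is needed.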
Also:
bins_dt_index = pd.date_range(pd.Timestamp('2022/03/01 08:00'),
                              pd.Timestamp('2022/03/01 16:00'),
                              freq='2H')
bins_dt_index
DatetimeIndex(['2022-03-01 08:00:00', '2022-03-01 10:00:00',
'2022-03-01 12:00:00', '2022-03-01 14:00:00',
'2022-03-01 16:00:00'],
dtype='datetime64[ns]', freq='2H')
pd.cut(df['ts'].to_list(), bins_dt_index, right=False)
raises
TypeError: '<' not supported between instances of 'int' and 'Timestamp'
while
pd.cut(df['ts'], bins_dt_index, right=False)
produces the expected result!
0 [2022-03-01 08:00:00, 2022-03-01 10:00:00)
1 [2022-03-01 10:00:00, 2022-03-01 12:00:00)
2 [2022-03-01 10:00:00, 2022-03-01 12:00:00)
3 [2022-03-01 14:00:00, 2022-03-01 16:00:00)
Name: ts, dtype: category
Categories (4, interval[datetime64[ns], left]): [
[2022-03-01 08:00:00, 2022-03-01 10:00:00) <
[2022-03-01 10:00:00, 2022-03-01 12:00:00) <
[2022-03-01 12:00:00, 2022-03-01 14:00:00) <
[2022-03-01 14:00:00, 2022-03-01 16:00:00)]
So DatetimeIndex bins work with an np.ndarray and a pd.Series, but not with a list! With IntervalIndex bins, it is the other way around!
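To check which combinations work on a given installation, a small harness along these lines may help. It is only a sketch: try_cut and report are hypothetical names introduced here, and the step is a pd.Timedelta rather than the '2H' alias so that it runs on different pandas versions. No expected output is shown, because the outcome may depend on the installed version.

```python
import pandas as pd

ts = [pd.Timestamp('2022-03-01 09:00'), pd.Timestamp('2022-03-01 10:00'),
      pd.Timestamp('2022-03-01 10:30'), pd.Timestamp('2022-03-01 15:00')]
series = pd.Series(ts, name='ts')

start = pd.Timestamp('2022-03-01 08:00')
end = pd.Timestamp('2022-03-01 16:00')
step = pd.Timedelta(hours=2)  # assumption: Timedelta instead of '2H'

# The two kinds of bins from the post, each with the keyword arguments used there.
bin_kinds = {
    'IntervalIndex': (pd.interval_range(start, end, freq=step, closed='left'), {}),
    'DatetimeIndex': (pd.date_range(start, end, freq=step), {'right': False}),
}

# The three kinds of input from the post.
inputs = {
    'list': series.to_list(),
    'ndarray': series.to_numpy(),
    'Series': series,
}


def try_cut(x, bins, kwargs):
    """Return 'ok', 'all NaN', 'some NaN', or the name of the raised exception."""
    try:
        result = pd.Series(pd.cut(x, bins, **kwargs))
    except Exception as exc:  # report the failure instead of stopping
        return type(exc).__name__
    missing = int(result.isna().sum())
    if missing == 0:
        return 'ok'
    return 'all NaN' if missing == len(result) else 'some NaN'


report = {
    (input_name, bins_name): try_cut(x, bins, kwargs)
    for input_name, x in inputs.items()
    for bins_name, (bins, kwargs) in bin_kinds.items()
}

print('pandas', pd.__version__)
for (input_name, bins_name), outcome in report.items():
    print(f'{input_name:8s} x {bins_name:13s} -> {outcome}')
```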
Shouldn't they all behave the same? I mean, the pd.cut documentation clearly states that x can be a 1-dimensional array-like.
It would be great if someone could explain why this happens.
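Until that is explained, one workaround consistent with the observations above is to always use the combination reported as working: wrap the input in a pd.Series and pass the bin edges as a DatetimeIndex with right=False. The sketch below does that. cut_timestamps is a hypothetical helper name, and it assumes the intervals are contiguous, sorted, and closed on the left, as in the example.

```python
import pandas as pd


def cut_timestamps(values, intervals):
    """Hypothetical helper: bin timestamps into contiguous left-closed intervals.

    Assumes `intervals` is a sorted IntervalIndex whose intervals touch
    (each right edge is the next left edge) and are closed on the left.
    """
    x = pd.Series(values)
    # Rebuild the edges: every left edge plus the final right edge.
    edges = pd.DatetimeIndex(intervals.left.append(intervals.right[-1:]))
    return pd.cut(x, edges, right=False)


ts = [pd.Timestamp('2022-03-01 09:00'), pd.Timestamp('2022-03-01 10:00'),
      pd.Timestamp('2022-03-01 10:30'), pd.Timestamp('2022-03-01 15:00')]

# Assumption: a Timedelta step replaces the '2H' alias used in the post.
bins = pd.interval_range(pd.Timestamp('2022-03-01 08:00'),
                         pd.Timestamp('2022-03-01 16:00'),
                         freq=pd.Timedelta(hours=2),
                         closed='left')

out = cut_timestamps(ts, bins)
print(out)
```

The same call accepts a list, an np.ndarray, or a pd.Series, since everything is converted to a Series first.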