How do I get the element-wise minimum of a pandas datetime Series and the (constant) maximum of another datetime Series?

Suppose I have two datetime Series:

import numpy as np
import pandas as pd

foo = pd.to_datetime(pd.Series([
'2020-01-01 12:00:00',
'2020-02-02 23:12:00'
]))
bar = pd.to_datetime(pd.Series([
'2020-01-20 01:02:03',
'2020-01-30 03:02:01'
]))

Both are datetime64[ns]:

>>> foo
0   2020-01-01 12:00:00
1   2020-02-02 23:12:00
dtype: datetime64[ns]

>>> bar
0   2020-01-20 01:02:03
1   2020-01-30 03:02:01
dtype: datetime64[ns]

For each element of foo, I want the minimum of that element and the (constant) maximum of bar.

But this raises a TypeError:

>>> np.minimum(foo, bar.max())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: '<=' not supported between instances of 'int' and 'Timestamp'

If I pass just the two Series themselves, it works:

>>> np.minimum(foo, bar)
0   2020-01-01 12:00:00
1   2020-01-30 03:02:01
dtype: datetime64[ns]

For some reason bar.max() returns a Timestamp rather than a datetime64, but even an explicit Python datetime object doesn't work. Why does numpy think foo is an int? Is there a way around this?

Use pandas.Series.where:

foo.where(foo < bar.max(), bar.max())

Wherever the condition (foo < bar.max()) is False, the value from foo is replaced with bar.max().
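As a minimal, self-contained sketch of the approach above: the snippet below computes bar.max() once, applies Series.where, and also shows Series.clip(upper=...) as an alternative that should give the same result here. The names cap, capped_where and capped_clip are illustrative and do not appear in the original post.

```python
import pandas as pd

foo = pd.to_datetime(pd.Series(['2020-01-01 12:00:00', '2020-02-02 23:12:00']))
bar = pd.to_datetime(pd.Series(['2020-01-20 01:02:03', '2020-01-30 03:02:01']))

# Compute the constant upper bound once (hypothetical name: cap).
cap = bar.max()

# Keep foo where it is below the cap; otherwise substitute the cap.
capped_where = foo.where(foo < cap, cap)

# Alternative sketch: clip every value to at most the cap.
capped_clip = foo.clip(upper=cap)

print(capped_where)
print(capped_clip)
```

One thing to keep in mind with where: a missing value (NaT) compares as False, so it would be replaced by the cap as well.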

>>> barmax = bar.max()
>>> barmax
Timestamp('2020-01-30 03:02:01')
>>> foo.map(lambda x: np.minimum(x, barmax))
0   2020-01-01 12:00:00
1   2020-01-30 03:02:01
dtype: datetime64[ns]
