Unable to subtract datetime64 columns with DataFrame.eval()

Given a DataFrame with a couple of timestamp columns:

In [88]: df.dtypes
Out[88]:
Time             datetime64[ns]
uniqstime        datetime64[ns]
dtype: object

If I call eval(), I get an error about an unknown type:

In [91]: df.eval('since = Time - uniqstime')
...
ValueError: unkown type timedelta64[ns]

(By the way, "unknown" is misspelled in the error message.)

But I can do the same thing with plain Python notation:

In [92]: df['since'] = df.Time - df.uniqstime

Is there a problem with assigning a timedelta in numexpr?

There is already a GitHub issue for this (now closed); see https://github.com/pydata/pandas/issues/5007

This is not currently supported. However, there is no real advantage at the moment, because these calculations are done in Python space anyway.

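Unless you just want to make your code shorter and more readable (a laudable goal), numexpr would have to support timedelta64 operations for there to be any performance benefit. As @Jeff said, these (and datetime64 operations) are evaluated in Python space, because numexpr does not support pandas NaT (Not-a-Time). Non-timedelta64 operations are still evaluated with numexpr, though, so you would probably need a very large timedelta64 array before it became the bottleneck.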

As of pandas 0.23, you can do this by setting the engine argument to 'python', for example:

df.eval('since = Time - uniqstime', engine='python')
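To check that the two spellings agree, here is a minimal sketch. The sample timestamps are made up for illustration; only the column names come from the question. It assumes a reasonably recent pandas, where eval() with an assignment returns a new DataFrame rather than modifying df in place.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2013-10-01 12:00:00", "2013-10-02 08:30:00"]),
    "uniqstime": pd.to_datetime(["2013-10-01 11:00:00", "2013-10-01 08:30:00"]),
})

# Plain Python notation, as in the question.
expected = df["Time"] - df["uniqstime"]

# Same expression through eval(), forcing the Python engine so that
# numexpr is never asked to handle timedelta64.
# Assumption: inplace defaults to False, so a new DataFrame is returned.
result = df.eval("since = Time - uniqstime", engine="python")

# Both approaches should produce the same timedelta values.
print((result["since"] == expected).all())
```

If you want the column added to df itself, assign the result back (df = df.eval(...)) rather than relying on in-place behaviour.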

From the pandas documentation for pandas.eval:

engine : string or None, default 'numexpr', {'python', 'numexpr'}
    The engine used to evaluate the expression. Supported engines are
    - None         : tries to use ``numexpr``, falls back to ``python``
    - ``'numexpr'``: This default engine evaluates pandas objects using
                     numexpr for large speed ups in complex expressions
                     with large frames.
    - ``'python'``: Performs operations as if you had ``eval``'d in top
                    level python. This engine is generally not that useful.
    More backends may be available in the future.

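I disagree with the claim that it is "not that useful". In my opinion, it can shorten the code needed for certain operations, which sometimes comes in handy.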
