Pandas mangles a high-resolution integer on read_csv

>Edit: This turned out to be an error caused by Excel changing the data type, not by Pandas.

When I read a CSV with pd.read_csv(file), a column of very long integers is converted to low-resolution floats. These integers are datetimes in microseconds.

Example: some values from the CSV column:

15555071095204000
15555071695202000
15555072295218000
15555072895216000
15555073495207000
15555074095206000
15555074695212000
15555075295202000
15555075895210000
15555076495216000
15555077095230000
15555077695206000
15555078295212000
15555078895218000
15555079495209000
15555080095208000
15555080530515000
15555086531880000
15555092531889000
15555098531886000
15555104531886000
15555110531890000
15555116531876000
15555122531873000
15555128531884000
15555134531884000
15555140531887000
15555146531874000

pd.read_csv produces: 1.55551e+16

How do I get it to report the exact int?

I have tried using: float_precision='high'
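For context, here is a small sketch of why a float column loses information for values of this size. float64 has a 53-bit significand, so not every integer above 2**53 can be stored exactly, while int64 holds these values without loss. The odd value below is hypothetical, chosen only for illustration. As far as I know, float_precision only selects how decimal text is parsed into floats, so it would not help once the column has already become float.

```python
# Hypothetical 17-digit value, chosen to be odd for illustration
n = 15555071095204001

fits_in_int64 = n < 2**63            # int64 can hold it exactly
above_exact_float_range = n > 2**53  # beyond this, float64 skips some integers
survives_float = int(float(n)) == n  # round trip through float64

print(fits_in_int64, above_exact_float_range, survives_float)
```

If the last value printed is False, the float round trip has changed the integer, which is the kind of loss the question is trying to avoid.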

This is probably caused by the way Pandas handles missing values: your column is imported as float so that missing values can be encoded as NaN.
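To illustrate the point, here is a minimal sketch using two small in-memory CSVs. The column names id and col1 and the data are made up for the example. The only difference between the two inputs is one empty field in col1.

```python
import io
import pandas as pd

# Hypothetical CSV text; the second one has an empty col1 field in row 2
csv_complete = "id,col1\n1,15555071095204000\n2,15555071695202000\n"
csv_with_gap = "id,col1\n1,15555071095204000\n2,\n3,15555071695202000\n"

complete = pd.read_csv(io.StringIO(csv_complete))
with_gap = pd.read_csv(io.StringIO(csv_with_gap))

# Compare the inferred dtypes of the same column
print(complete["col1"].dtype)
print(with_gap["col1"].dtype)
print(with_gap["col1"].isna().sum())  # number of missing values found
```

Checking df.dtypes and df['col1'].isna().sum() on the real file is a quick way to confirm whether this is what happened.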

A simple solution is to force the column to be imported as str, then impute or drop the missing values, and convert to int:

import pandas as pd
# The keyword is dtype (not dtypes); edit 'col1' to use the appropriate column reference
df = pd.read_csv(file, dtype={'col1': str})
# If you want to just remove rows with missing values, something like:
# (empty fields are read as NaN by default, even with dtype=str)
df = df.dropna(subset=['col1'])
# Then convert to integer
df['col1'] = df['col1'].astype('int64')
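If you would rather impute than drop rows, a sketch along the same lines follows. It assumes a sentinel of -1 is acceptable for missing entries, which is my assumption and not something from the question. The CSV text and column names are made up so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV text with one empty col1 field
csv_text = "id,col1\n1,15555071095204001\n2,\n3,15555071695202003\n"

# Read col1 as text so no digits are lost to float conversion
df = pd.read_csv(io.StringIO(csv_text), dtype={"col1": str})

# Replace missing entries with the sentinel, then convert to 64-bit integers
df["col1"] = df["col1"].fillna("-1").astype("int64")

print(df["col1"].tolist())
print(df["col1"].dtype)
```

Because the digits are parsed straight from text to int64, odd values like the ones above should come through unchanged.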

With a minimal, complete, and verifiable example, we could pinpoint the problem and update the code to address it precisely.
