Is there a more efficient way to find and downcast int64 columns with to_numeric() in Python Pandas?



tl;dr: I need help cleaning up the downcast_int(df) function below.

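Hello, I'm trying to write my own downcasting function to reduce memory usage. I'm curious about alternatives to my code (which is, frankly, quite messy but functional) that would make it more readable and perhaps faster.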

The downcast function modifies my dataframe in place, and I'm not sure I should be doing that.

Thanks for your help.

Example df

import pandas as pd

df = pd.DataFrame({
    'first': [1_000, 200_000],
    'second': [-30, -40_000],
    'third': ["some", "string"],
    'fourth': [4.5, 6.1],
    'fifth': [-6, -8]
})
    first  second   third  fourth  fifth
0    1000     -30    some     4.5     -6
1  200000  -40000  string     6.1     -8

df.info()

#   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
0   first   2 non-null      int64  
1   second  2 non-null      int64  
2   third   2 non-null      object 
3   fourth  2 non-null      float64
4   fifth   2 non-null      int64  
dtypes: float64(1), int64(3), object(1)

Downcast function

def downcast_int(df):
    """Select all int columns. Convert them to unsigned or signed types."""
    cols = df.select_dtypes(include=['int64']).columns
    cols_unsigned = None

    # There is at least one negative number in a column.
    if (df[cols] < 0).any().any():
        df_unsigned = (df[cols] < 0).any()
        cols_unsigned = df_unsigned[df_unsigned == True].index
        df[cols_unsigned] = df[cols_unsigned].apply(pd.to_numeric, downcast='signed')

    # If there were any changed columns, remove them.
    if cols_unsigned is not None:
        cols = cols.drop(cols_unsigned)

    # Turn the remaining columns into unsigned integers.
    df[cols] = df[cols].apply(pd.to_numeric, downcast='unsigned')

df.info() after downcasting

#   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
0   first   2 non-null      uint32 
1   second  2 non-null      int32  
2   third   2 non-null      object 
3   fourth  2 non-null      float64
4   fifth   2 non-null      int8   
dtypes: float64(1), int32(1), int8(1), object(1), uint32(1)

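Just apply to_numeric() twice: once to get each column down to the smallest signed type, then a second time to reduce it to unsigned. The second pass leaves columns that contain negative values unchanged.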

import numpy as np

df2 = df.select_dtypes(include=[np.number]).apply(pd.to_numeric, downcast='signed')
df2 = df2.select_dtypes(include=[np.number]).apply(pd.to_numeric, downcast='unsigned')
df[df2.columns] = df2

Same output as your approach:

#   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
0   first   2 non-null      uint32 
1   second  2 non-null      int32  
2   third   2 non-null      object 
3   fourth  2 non-null      float64
4   fifth   2 non-null      int8   
dtypes: float64(1), int32(1), int8(1), object(1), uint32(1)
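
On the question of modifying the dataframe in place: one option is to return a downcast copy and leave the caller's dataframe alone. The following is only a sketch of that idea, not part of the approach above. The name `downcast_ints` is hypothetical, and it assumes you want to restrict the work to int64 columns, as the function in the question does.

```python
import pandas as pd


def downcast_ints(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with its int64 columns downcast; df itself is untouched."""
    out = df.copy()
    int_cols = out.select_dtypes(include=["int64"]).columns
    for col in int_cols:
        # A column holding any negative value needs a signed type;
        # the others can use the unsigned types.
        kind = "signed" if (out[col] < 0).any() else "unsigned"
        out[col] = pd.to_numeric(out[col], downcast=kind)
    return out


df = pd.DataFrame({
    "first": [1_000, 200_000],
    "second": [-30, -40_000],
    "third": ["some", "string"],
    "fourth": [4.5, 6.1],
    "fifth": [-6, -8],
})
small = downcast_ints(df)

# Compare per-column memory before and after the downcast.
print(df.memory_usage(deep=True))
print(small.memory_usage(deep=True))
```

Because only int64 columns are selected here, float columns are never touched. With `select_dtypes(include=[np.number])`, a float column that happens to contain only whole numbers may, as far as I know, be downcast to an integer type by `to_numeric`, which may or may not be what you want. The copy also costs extra memory while both dataframes exist, so in-place modification can still be the better choice for very large frames.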
