See the pandas DataFrame here.
Some of my columns hold strings and others hold integers/floats. However, all columns in the dataset are currently stored with the "category" dtype.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29744 entries, 0 to 29743
Data columns (total 366 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ASBG01 29360 non-null category
1 ASBG03 28726 non-null category
2 ASBG04 28577 non-null category
3 ASBG05A 29130 non-null category
4 ASBG05B 29055 non-null category
5 ASBG05C 29001 non-null category
6 ASBG05D 28938 non-null category
7 ASBG05E 28938 non-null category
8 ASBG05F 29030 non-null category
9 ASBG05G 28745 non-null category
10 ASBG05H 28978 non-null category
11 ASBG05I 28971 non-null category
12 ASBG06A 28956 non-null category
13 ASBG06B 28797 non-null category
14 ASBG07 28834 non-null category
15 ASBG08 28955 non-null category
16 ASBG09A 28503 non-null category
17 ASBG09B 27778 non-null category
18 ASBG10A 29025 non-null category
19 ASBG10B 28940 non-null category
...
363 ATDMDAT 13133 non-null category
364 ATDMMEM 25385 non-null category
365 Target 29744 non-null float64
dtypes: category(365), float64(1)
memory usage: 60.5 MB
How can I convert every column whose underlying values are integers/floats to an actual integer/float dtype?
Thanks.
Suppose you have the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'cat_str': ['Hello', 'World'],
                   'cat_int': [0, 1],
                   'cat_float': [3.14, 2.71]}, dtype='category')
print(df.dtypes)
# Output
cat_str category
cat_int category
cat_float category
dtype: object
You can try this, which casts only the columns whose categories have a numeric dtype:
dtypes = {col: df[col].cat.categories.dtype for col in df.columns
          if np.issubdtype(df[col].cat.categories.dtype, np.number)}
df = df.astype(dtypes)
print(df.dtypes)
# Output
cat_str category
cat_int int64
cat_float float64
dtype: object
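One caveat for the DataFrame in the question: its columns contain missing values (the non-null counts are below 29744), and a NumPy integer dtype cannot hold NaN, so casting an integer-valued categorical column with gaps straight to int64 is expected to fail. Below is a minimal sketch of a more defensive variant. The helper name `decategorize_numeric` and the sample data are hypothetical, and falling back to float64 for integer columns with missing values is just one possible choice (pandas' nullable `Int64` dtype would be another).

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def decategorize_numeric(df):
    """Hypothetical helper: copy df and cast numeric categorical columns to numeric dtypes."""
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        cat_dtype = out[col].cat.categories.dtype
        if not (is_integer_dtype(cat_dtype) or is_float_dtype(cat_dtype)):
            continue  # leave string (and other non-numeric) categories untouched
        if is_integer_dtype(cat_dtype) and out[col].isna().any():
            # NumPy integers cannot represent NaN, so fall back to float64
            out[col] = out[col].astype("float64")
        else:
            out[col] = out[col].astype(cat_dtype)
    return out


# Made-up sample data: code -1 marks a missing value
sample = pd.DataFrame({
    "cat_str": pd.Categorical.from_codes([0, 1, -1], categories=["Hello", "World"]),
    "cat_int": pd.Categorical.from_codes([0, 1, -1], categories=[0, 1]),
    "cat_float": pd.Categorical.from_codes([0, 1, 1], categories=[2.71, 3.14]),
})
converted = decategorize_numeric(sample)
print(converted.dtypes)
```

The helper works on a copy, so the original DataFrame keeps its category columns until you reassign the result.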
Alternatively, to drop the category dtype from every column, use:
dtypes = {col: df[col].cat.categories.dtype for col in df.columns}
df = df.astype(dtypes)
print(df.dtypes)
# Output
cat_str object
cat_int int64
cat_float float64
dtype: object
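Both snippets above rely on the dtype of the categories. If the categories are actually stored as strings that merely look like numbers (for example '1' or '2.5'), that dtype is not numeric and the first approach would skip those columns. Whether this applies to the data in the question is an assumption; the question does not say. A minimal sketch for that case, using a hypothetical helper name `numeric_strings_to_numbers` and made-up sample data, could use `pd.to_numeric`:

```python
import pandas as pd


def numeric_strings_to_numbers(df):
    """Hypothetical helper: convert categorical columns whose categories all parse as numbers."""
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        cats = out[col].cat.categories.astype(object)
        try:
            pd.to_numeric(cats)  # raises if any category is not a number
        except (ValueError, TypeError):
            continue  # keep columns with genuine text as category
        # Convert the values themselves; missing entries become NaN
        out[col] = pd.to_numeric(out[col].astype(object))
    return out


# Made-up sample data
raw = pd.DataFrame({"num": ["1", "2", "3"],
                    "mixed": ["1.5", "x", "2.5"],
                    "text": ["Hello", "World", "Hello"]}, dtype="category")
result = numeric_strings_to_numbers(raw)
print(result.dtypes)
```

Only the categories are test-parsed before converting, which keeps the check cheap when a column has many rows but few distinct values.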