Set the dtype of a pandas column to the same category dtype as an existing column

I have a DataFrame with one category column. I add a new column and want it to have the same category dtype.

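This is the initial data:

df = pd.DataFrame(data)
df.A = df.A.astype('category')

I add the new column and want to copy the dtype of column A to it:

df['C'] = pd.NA
df.C = df.C.astype(df.A.dtype)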

At first this looks fine:

print(df.C)
0    NaN
1    NaN
2    NaN
Name: C, dtype: category
Categories (3, object): ['A', 'B', 'C']
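The reason this first step works is that df.A.dtype is not just the string 'category' but a CategoricalDtype object that carries the category list (and the ordered flag). The sketch below, which assumes the same data dict as the MWE further down, shows that casting with that object transfers the categories even to a column that contains only missing values.

```python
import pandas as pd

data = {'A': ['A', 'B', 'C'], 'B': range(3)}
df = pd.DataFrame(data)
df['A'] = df['A'].astype('category')

# The dtype object carries the category list and the ordered flag
cat_dtype = df['A'].dtype
print(type(cat_dtype).__name__)    # CategoricalDtype
print(list(cat_dtype.categories))  # ['A', 'B', 'C']

# Casting an all-missing column with that dtype object reuses the same
# categories, even though none of them occur in the column yet
df['C'] = pd.NA
df['C'] = df['C'].astype(cat_dtype)
```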

But when I assign a value...

df.C = 'A'
print(df.C)
0    A
1    A
2    A
Name: C, dtype: object

Here is the full MWE:

#!/usr/bin/env python3
import pandas as pd
data = {'A': ['A', 'B', 'C'],
        'B': range(3)}
df = pd.DataFrame(data)
df.A = df.A.astype('category')
print(df)
# New empty(!) column
df['C'] = pd.NA
df.C = df.C.astype(df.A.dtype)
# OK, the categories are there
print(df.C)
# set one value (from the category)
df.C = 'A'
# the category type is gone
print(df.C)
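A likely explanation for the last step: df.C = 'A' does not write into the existing categorical column, it replaces the whole column with a new one built from the scalar, and pandas infers that new column's dtype from 'A' alone. A minimal sketch of the difference, using hypothetical frame names lost and kept, is to build the replacement Series already typed with df.A.dtype:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': range(3)})
df['A'] = df['A'].astype('category')
df['C'] = pd.NA
df['C'] = df['C'].astype(df['A'].dtype)

# Assigning a bare scalar replaces the whole column; the new column's dtype
# is inferred from the scalar 'A' alone, so the categorical dtype is dropped
lost = df.copy()
lost['C'] = 'A'
print(isinstance(lost['C'].dtype, pd.CategoricalDtype))  # False

# Building the replacement already typed keeps the column categorical
kept = df.copy()
kept['C'] = pd.Series('A', index=kept.index, dtype=kept['A'].dtype)
print(kept['C'].dtype)  # category
```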

By the way: in the real data I copy the dtype between two columns of two different DataFrames, but I don't think that matters for this question.
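For the two-DataFrame case the same astype call should carry over unchanged, because the CategoricalDtype object is independent of the frame it came from. The frames df1 and df2 and the column X below are hypothetical. One thing worth checking in real data: values that are not among the copied categories are silently turned into missing values.

```python
import pandas as pd

# Hypothetical frames: df1 owns the categorical column, df2 receives its dtype
df1 = pd.DataFrame({'A': ['A', 'B', 'C']})
df1['A'] = df1['A'].astype('category')

df2 = pd.DataFrame({'X': ['B', 'A', 'D']})

# astype with df1's CategoricalDtype copies the exact categories and ordered flag;
# 'D' is not one of those categories, so it becomes NaN
df2['X'] = df2['X'].astype(df1['A'].dtype)
print(df2['X'].dtype == df1['A'].dtype)
print(df2['X'].isna().tolist())
```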

If you use one of the following options:

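# set C first option (index=df.index keeps the new Series aligned with df)
df.C = pd.Series(['A'] * len(df.C), index=df.index).astype(df.A.dtype)
# set C second option
df.C = df.C.fillna("A")
# set C third option, probably the most intuitive
# (chained assignment: pandas may warn, and under Copy-on-Write it does not modify df)
df.C[:] = "A"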

All of them give the following output for print(df.C):

0    A
1    A
2    A
Name: C, dtype: category
Categories (3, object): ['A', 'B', 'C']
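Rather than comparing the printed output by eye, the dtype objects can be compared directly: two CategoricalDtype objects are equal when their categories and ordered flag match. A small check of the first two options, run on copies with hypothetical names opt1 and opt2; the third option is left out because it relies on chained assignment, which does not modify df under pandas' Copy-on-Write mode:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': range(3)})
df['A'] = df['A'].astype('category')
df['C'] = pd.NA
df['C'] = df['C'].astype(df['A'].dtype)

# Option 1: a new Series cast to A's dtype, aligned on df's index
opt1 = df.copy()
opt1['C'] = pd.Series(['A'] * len(opt1), index=opt1.index).astype(opt1['A'].dtype)

# Option 2: fillna with a value that is already one of the categories
opt2 = df.copy()
opt2['C'] = opt2['C'].fillna('A')

for frame in (opt1, opt2):
    # CategoricalDtype equality compares the categories and the ordered flag
    print(frame['C'].dtype == frame['A'].dtype, frame['C'].tolist())
```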
