Get a pandas categorical column's categories and assign them as the dtype of another column



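I am creating an empty DataFrame with a given shape. What if a column in the new DataFrame contains only a subset of the unique values of the original column?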

I think you can use the Series.cat.remove_unused_categories() method.

Here is a small demo:

In [311]: df
Out[311]:
    channel day month             t1    title  year
631     AAA  06    01  1388967300000  title 1  2014
632     CBR  06    01  1388973300000  title 2  2014
633     CBR  06    01  1388974500000  title 3  2014
In [312]: df.channel
Out[312]:
631    AAA
632    CBR
633    CBR
Name: channel, dtype: category
Categories (2, object): [AAA, CBR]
In [313]: cp = df[df.channel == 'CBR'].copy()
In [314]: cp.channel
Out[314]:
632    CBR
633    CBR
Name: channel, dtype: category
Categories (2, object): [AAA, CBR]
In [315]: cp.channel.cat.categories
Out[315]: Index(['AAA', 'CBR'], dtype='object')
In [316]: cp['channel'] = cp.channel.cat.remove_unused_categories()
In [317]: cp.channel.cat.categories
Out[317]: Index(['CBR'], dtype='object')
In [318]: cp.channel
Out[318]:
632    CBR
633    CBR
Name: channel, dtype: category
Categories (1, object): [CBR]
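
For reference, the demo above can be reproduced as a standalone script. This is only a minimal sketch: it rebuilds just the channel and title columns with sample values, and it assigns the result back to the column instead of passing inplace=True, which recent pandas releases no longer accept.

```python
import pandas as pd

# Small stand-in for the original frame; only two columns are rebuilt here.
df = pd.DataFrame(
    {
        "channel": pd.Categorical(["AAA", "CBR", "CBR"]),
        "title": ["title 1", "title 2", "title 3"],
    },
    index=[631, 632, 633],
)

# Filtering rows keeps the full category list, even for values that are gone.
cp = df[df["channel"] == "CBR"].copy()
before = list(cp["channel"].cat.categories)

# remove_unused_categories() returns a new Series, so assign it back.
cp["channel"] = cp["channel"].cat.remove_unused_categories()
after = list(cp["channel"].cat.categories)

print(before, after)
```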

UPDATE:

In [328]: new = pd.DataFrame({'x':[1,2]})
In [329]: new['ch'] = df.loc[df.channel == 'CBR', 'channel'].values
In [330]: new
Out[330]:
   x   ch
0  1  CBR
1  2  CBR
In [331]: new.dtypes
Out[331]:
x        int64
ch    category
dtype: object
In [332]: new.ch.cat.categories
Out[332]: Index(['AAA', 'CBR'], dtype='object')
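
If the values in the new frame start out as plain strings, one possible way to give that column the same categories as the original is to reuse the original column's dtype with astype. The sketch below uses invented sample data and is not necessarily what the asker's code looks like.

```python
import pandas as pd

# Original frame with a categorical column (sample values).
df = pd.DataFrame({"channel": pd.Categorical(["AAA", "CBR", "CBR"])})

# New frame whose column holds only a subset of the original values.
new = pd.DataFrame({"x": [1, 2], "ch": ["CBR", "CBR"]})

# df["channel"].dtype is a CategoricalDtype carrying the full category list,
# so casting with it keeps 'AAA' as a category even though it is unused here.
new["ch"] = new["ch"].astype(df["channel"].dtype)

cats = list(new["ch"].cat.categories)
print(cats)
```

Values that are not among the original categories would become NaN after this cast, so it fits best when the new column really is a subset of the original.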
