Mapping a dictionary onto interval data



Suppose I have the following DataFrame:

import pandas as pd

d = {'col1': ['a', 'c', 'b', 'q', 'w', 'f', 'h', 'v', 't', 'y'],
     'col2': [1, 3, 33, 7, 5, 8, 2, 3, 5, 0]}
df = pd.DataFrame(d)

I bin the second column like this:

df['col3'] = pd.cut(x=df['col2'], bins=[-0.001,2.25,4,6.5,33])

Now I want to create a new column from col3 in which each interval is represented by a string. The result should look like this:

  col1  col2            col3     col4
0    a     1  (-0.001, 2.25]  range 1
1    c     3     (2.25, 4.0]  range 2
2    b    33     (6.5, 33.0]  range 4
3    q     7     (6.5, 33.0]  range 4
4    w     5      (4.0, 6.5]  range 3
5    f     8     (6.5, 33.0]  range 4
6    h     2  (-0.001, 2.25]  range 1
7    v     3     (2.25, 4.0]  range 2
8    t     5      (4.0, 6.5]  range 3
9    y     0  (-0.001, 2.25]  range 1

My idea was to build a dictionary

di = {(-0.001, 2.25]: "range 1", (2.25, 4.0]: "range 2", (4.0, 6.5]: "range 3", (6.5, 33.0]: "range 4"}

and then apply it with

df['col4'] = df['col3'].map(di) 

But the dictionary above is invalid, because intervals cannot be written as dictionary keys in that form. How can I map over interval data?

Since your col3 is an ordered categorical column, you can use Series.cat.codes:

In [306]: df['col4'] = 'range ' + (df.col3.cat.codes + 1).astype(str)
In [307]: df
Out[307]: 
  col1  col2            col3     col4
0    a     1  (-0.001, 2.25]  range 1
1    c     3     (2.25, 4.0]  range 2
2    b    33     (6.5, 33.0]  range 4
3    q     7     (6.5, 33.0]  range 4
4    w     5      (4.0, 6.5]  range 3
5    f     8     (6.5, 33.0]  range 4
6    h     2  (-0.001, 2.25]  range 1
7    v     3     (2.25, 4.0]  range 2
8    t     5      (4.0, 6.5]  range 3
9    y     0  (-0.001, 2.25]  range 1
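
One caveat with the cat.codes approach: a value that falls outside every bin becomes NaN after pd.cut, and its code is -1, so the expression above would label it "range 0". The sketch below is one possible way to leave such rows as missing instead. The sample values and the names `binned`, `codes` and `labels` are illustrative only and not part of the original answer.

```python
import pandas as pd

# Illustrative data: 50 lies outside the last bin edge (33).
s = pd.Series([1, 3, 50])
binned = pd.cut(s, bins=[-0.001, 2.25, 4, 6.5, 33])

# cat.codes is the 0-based position of each category; -1 marks missing values.
codes = binned.cat.codes

# Build the label, then blank out rows whose value fell outside every bin.
labels = ('range ' + (codes + 1).astype(str)).where(codes >= 0)
print(labels)
```
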

Try passing labels to pd.cut:

df['col3'] = pd.cut(x=df['col2'], bins=[-0.001, 2.25, 4, 6.5, 33], labels=['range ' + str(x) for x in range(1, 5)])
df
Out[168]: 
  col1  col2     col3
0    a     1  range 1
1    c     3  range 2
2    b    33  range 4
3    q     7  range 4
4    w     5  range 3
5    f     8  range 4
6    h     2  range 1
7    v     3  range 2
8    t     5  range 3
9    y     0  range 1
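
If you would rather keep the dictionary idea from the question, the keys need to be real pd.Interval objects instead of interval literals. A minimal sketch, assuming col3 was produced by pd.cut as in the question: it takes the intervals from `cat.categories` as keys and applies the dictionary with `rename_categories`, which accepts a dict-like. The names `names` and `di` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'c', 'b', 'q', 'w', 'f', 'h', 'v', 't', 'y'],
                   'col2': [1, 3, 33, 7, 5, 8, 2, 3, 5, 0]})
df['col3'] = pd.cut(x=df['col2'], bins=[-0.001, 2.25, 4, 6.5, 33])

# The categories of col3 are pd.Interval objects, which are hashable
# and can therefore serve as dictionary keys.
names = ['range 1', 'range 2', 'range 3', 'range 4']
di = dict(zip(df['col3'].cat.categories, names))

# rename_categories swaps each interval category for its label;
# col3 itself is left unchanged.
df['col4'] = df['col3'].cat.rename_categories(di)
print(df)
```

Unlike passing labels to pd.cut directly, this keeps the interval column and the label column side by side. Note that col4 is still a categorical column here, now with string categories.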
