Suppose I have the following dataframe:
import pandas as pd
d = {'col1':['a','c','b','q','w','f','h','v','t','y'],
'col2':[1,3,33,7,5,8,2,3,5,0]}
df = pd.DataFrame(d)
I binned the second column with:
df['col3'] = pd.cut(x=df['col2'], bins=[-0.001,2.25,4,6.5,33])
Now I want to create a new column from col3 in which each interval is represented by a string. The result should look like:
col1 col2 col3 col4
0 a 1 (-0.001, 2.25] range 1
1 c 3 (2.25, 4.0] range 2
2 b 33 (6.5, 33.0] range 4
3 q 7 (6.5, 33.0] range 4
4 w 5 (4.0, 6.5] range 3
5 f 8 (6.5, 33.0] range 4
6 h 2 (-0.001, 2.25] range 1
7 v 3 (2.25, 4.0] range 2
8 t 5 (4.0, 6.5] range 3
9 y 0 (-0.001, 2.25] range 1
I thought of building a dictionary
di = {(-0.001, 2.25]: "range 1", (2.25, 4.0]: "range 2", (4.0, 6.5]: "range 3", (6.5, 33.0]: "range 4"}
and then mapping it with
df['col4'] = df['col3'].map(di)
But the dictionary above is invalid, because intervals cannot be written like that in a dictionary. How can I map over interval data?
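For what it's worth, the dictionary idea itself can work: the categories that pd.cut creates are pd.Interval objects, so the keys just need to be written as pd.Interval(left, right) instead of bracket notation. A minimal sketch, assuming pd.Interval's default closed='right' (which matches pd.cut's default right=True) and the same bin edges as above. The keys must match the categories exactly (same edges, same closed side); any category without a matching key becomes NaN.

```python
import pandas as pd

d = {'col1': ['a', 'c', 'b', 'q', 'w', 'f', 'h', 'v', 't', 'y'],
     'col2': [1, 3, 33, 7, 5, 8, 2, 3, 5, 0]}
df = pd.DataFrame(d)
df['col3'] = pd.cut(x=df['col2'], bins=[-0.001, 2.25, 4, 6.5, 33])

# The categories form an IntervalIndex; each element is a pd.Interval.
print(df['col3'].cat.categories)

# pd.Interval(a, b) is the interval (a, b], i.e. closed on the right.
di = {pd.Interval(-0.001, 2.25): "range 1",
      pd.Interval(2.25, 4.0): "range 2",
      pd.Interval(4.0, 6.5): "range 3",
      pd.Interval(6.5, 33.0): "range 4"}

df['col4'] = df['col3'].map(di)
print(df)
```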
Since your col3 is an ordered categorical column, use Series.cat.codes:
In [306]: df['col4'] = 'range ' + (df.col3.cat.codes + 1).astype(str)
In [307]: df
Out[307]:
col1 col2 col3 col4
0 a 1 (-0.001, 2.25] range 1
1 c 3 (2.25, 4.0] range 2
2 b 33 (6.5, 33.0] range 4
3 q 7 (6.5, 33.0] range 4
4 w 5 (4.0, 6.5] range 3
5 f 8 (6.5, 33.0] range 4
6 h 2 (-0.001, 2.25] range 1
7 v 3 (2.25, 4.0] range 2
8 t 5 (4.0, 6.5] range 3
9 y 0 (-0.001, 2.25] range 1
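One caveat with the codes approach, shown as a small sketch: Series.cat.codes is -1 for rows where pd.cut produced NaN (values outside all bins), so adding 1 silently labels them "range 0". The value 40 below is a hypothetical addition used only to illustrate this; masking with where keeps those rows as NaN.

```python
import pandas as pd

# The question's bins; 40 is a hypothetical value outside every bin.
df = pd.DataFrame({'col2': [1, 3, 33, 40]})
df['col3'] = pd.cut(x=df['col2'], bins=[-0.001, 2.25, 4, 6.5, 33])

codes = df['col3'].cat.codes  # 0-based interval position, -1 for NaN
print(codes.tolist())  # [0, 1, 3, -1]

# Naive version: the out-of-range row becomes "range 0".
naive = 'range ' + (codes + 1).astype(str)

# Masked version: rows that pd.cut left as NaN stay NaN.
df['col4'] = naive.where(codes >= 0)
print(df)
```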
Try passing labels:
df['col3'] = pd.cut(x=df['col2'], bins=[-0.001, 2.25, 4, 6.5, 33], labels=['range ' + str(x) for x in range(1, 5)])
df
Out[168]:
col1 col2 col3
0 a 1 range 1
1 c 3 range 2
2 b 33 range 4
3 q 7 range 4
4 w 5 range 3
5 f 8 range 4
6 h 2 range 1
7 v 3 range 2
8 t 5 range 3
9 y 0 range 1
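Passing labels replaces the intervals in col3. If you want to keep the intervals and add the names as a separate col4, as in the desired output in the question, one option is to build the labels from the bin list (there are len(bins) - 1 intervals) and apply them with Series.cat.rename_categories instead of cutting twice. A sketch, where bins and labels are just illustrative local names:

```python
import pandas as pd

d = {'col1': ['a', 'c', 'b', 'q', 'w', 'f', 'h', 'v', 't', 'y'],
     'col2': [1, 3, 33, 7, 5, 8, 2, 3, 5, 0]}
df = pd.DataFrame(d)

bins = [-0.001, 2.25, 4, 6.5, 33]
# One label per interval, so the count follows the bins automatically.
labels = [f'range {i}' for i in range(1, len(bins))]

df['col3'] = pd.cut(x=df['col2'], bins=bins)          # keep the intervals
df['col4'] = df['col3'].cat.rename_categories(labels)  # same codes, new names
print(df)
```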