Convert a categorical variable containing % signs to a numeric variable in Python pandas

import pandas as pd

dt = {'tensile_strength': ['15%', '15%', '20%', '20%', '25%', '25%', '30%', '30%'],
      'cotton_pct': [7, 7, 12, 17, 14, 18, 19, 25]}
mydt = pd.DataFrame(dt, columns=['tensile_strength', 'cotton_pct'])

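In my dataset above, 'tensile_strength' is stored as strings with a % sign, so pandas treats it as a categorical (object) variable rather than a number. How can I create a new variable that is the numeric representation of 'tensile_strength'?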

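You can reach the string methods for the whole column through .str, and then apply .replace() to every element of that column. Cast the result to 'int' and store it back in the DataFrame: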

mydt['tensile_strength'] = mydt['tensile_strength'].str.replace("%", '').astype('int')
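A minimal, self-contained sketch of the same approach, for anyone who wants to run it and check the resulting dtype. Passing regex=False and casting to 'int64' explicitly are my additions, not part of the answer above; they just make the literal match and the target type unambiguous.

```python
import pandas as pd

mydt = pd.DataFrame({
    'tensile_strength': ['15%', '15%', '20%', '20%', '25%', '25%', '30%', '30%'],
    'cotton_pct': [7, 7, 12, 17, 14, 18, 19, 25],
})

# regex=False treats "%" as a literal character rather than a pattern
cleaned = mydt['tensile_strength'].str.replace('%', '', regex=False)

# Cast the cleaned strings to integers and store them back in the column
mydt['tensile_strength'] = cleaned.astype('int64')

print(mydt.dtypes)
```

Note that astype raises a ValueError if any value is still not a valid integer after the replacement, so this works best when every entry is known to be well formed.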

Alternatively, you can strip the % sign and pass the result to pd.to_numeric:

mydt['new_col'] = pd.to_numeric(mydt['tensile_strength'].str.strip('%'))

NB. This writes to a new column, but you can of course overwrite tensile_strength instead.

Output:

  tensile_strength  cotton_pct  new_col
0              15%           7       15
1              15%           7       15
2              20%          12       20
3              20%          17       20
4              25%          14       25
5              25%          18       25
6              30%          19       30
7              30%          25       30
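If the column might contain missing or malformed entries, pd.to_numeric can be asked to turn them into NaN instead of raising. The sketch below is only an illustration under that assumption: the 'n/a' and None values are hypothetical and do not appear in the question's data.

```python
import pandas as pd

# Hypothetical data: 'n/a' and None are NOT in the original dataset
raw = pd.Series(['15%', '20%', 'n/a', None, '30%'])

# Strip the % sign, then convert; entries that cannot be parsed become NaN
numeric = pd.to_numeric(raw.str.strip('%'), errors='coerce')

# Count how many entries could not be converted
print(numeric.isna().sum())
```

Because NaN is a floating-point value, the converted column will generally hold floats rather than integers whenever something fails to parse.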
