I am trying to replace several categorical values in a DataFrame column with a smaller set of values.
I tried the following code:
data['Gender'] = data['Gender'].replace(to_replace={"male","M","m","female","f","F"}, value={"Male","Male","Male","Female", "Female", "Female"})
I want every m, M, or male to be replaced with Male, and likewise for Female.
I got this error:
ValueError: Replacement lists must match in length. Expecting 6 got 2
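The problem with your code is that you are passing sets as the arguments to replace(). For to_replace, the cardinality happens to be fine, because all six elements are unique. For value, however, the set you defined is actually {"Male", "Female"}, which does not match the cardinality of to_replace. Even if the cardinalities did match, a set does not guarantee order, so it is not the right data structure for the job. If you use lists or tuples instead, it works: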
data['Gender'] = data['Gender'].replace(to_replace=("male","M","m","female","f","F"), value=("Male","Male","Male","Female", "Female", "Female"))
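To make the cardinality point concrete, here is a minimal sketch. The sample DataFrame and the variable names old and new are made up for illustration; only replace() and its to_replace and value parameters come from the code above.

```python
import pandas as pd

# A set literal drops duplicates, so six items collapse to two.
to_replace = {"male", "M", "m", "female", "f", "F"}
value = {"Male", "Male", "Male", "Female", "Female", "Female"}
print(len(to_replace), len(value))  # 6 2

# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({"Gender": ["male", "M", "m", "female", "f", "F"]})

# Lists keep duplicates and order, so each item pairs with its replacement.
old = ["male", "M", "m", "female", "f", "F"]
new = ["Male", "Male", "Male", "Female", "Female", "Female"]
data["Gender"] = data["Gender"].replace(to_replace=old, value=new)
print(data["Gender"].tolist())
```

The two lengths printed first are the same 6 and 2 that appear in the ValueError message.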
Using a dict, though, will probably make the code easier to read, because each value is written next to its replacement:
data["Gender"] = data["Gender"].replace({"m" : "Male", "M" : "Male", "male": "Male", "f": "Female", "F": "Female", "female": "Female"})
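If the list of spellings grows, one option is to write the groups once and derive the dict from them. This is only a sketch: the names variants and mapping and the sample data are hypothetical, not part of the original code.

```python
import pandas as pd

# Hypothetical grouping: canonical label -> spellings that should map to it.
variants = {
    "Male": ["male", "M", "m"],
    "Female": ["female", "f", "F"],
}

# Invert the grouping into the {old: new} dict that replace() accepts.
mapping = {old: new for new, olds in variants.items() for old in olds}

# Hypothetical sample data, including one value with no mapping.
data = pd.DataFrame({"Gender": ["m", "F", "male", "Other"]})
data["Gender"] = data["Gender"].replace(mapping)
print(data["Gender"].tolist())  # ['Male', 'Female', 'Male', 'Other']
```

Values that are not keys in the dict, such as "Other" here, are left unchanged.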
Here is one way to do it.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Gender': ['m', 'M', 'f', 'F', 'm']})
print(df)
  Gender
0      m
1      M
2      f
3      F
4      m
replace_values = {'m' : 'Male', 'M' : 'Male', 'f':'Female','F':'Female'}
df = df.replace({"Gender": replace_values})
df
   Gender
0    Male
1    Male
2  Female
3  Female
4    Male
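The nested form {"Gender": replace_values} limits the replacement to the named column. A small sketch of the difference, assuming a second column (the hypothetical Initial column below) that happens to contain the same raw values:

```python
import pandas as pd

# Hypothetical frame: both columns contain the same raw strings.
df = pd.DataFrame({"Gender": ["m", "F"], "Initial": ["m", "F"]})
replace_values = {"m": "Male", "M": "Male", "f": "Female", "F": "Female"}

# Nested dict: only the "Gender" column is touched.
scoped = df.replace({"Gender": replace_values})

# Flat dict: every column that contains a key is rewritten.
unscoped = df.replace(replace_values)

print(scoped)
print(unscoped)
```

With the flat dict, the Initial column is rewritten as well, which may or may not be what you want.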