Replacing multiple strings in a DataFrame



I am trying to replace several categorical values in a DataFrame column with a standard set of values.

I tried the following code:

data['Gender'] = data['Gender'].replace(to_replace={"male","M","m","female","f","F"}, value={"Male","Male","Male","Female", "Female", "Female"})

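I want every m, M, or male to be replaced with Male, and likewise for the female values.

I got this error:

ValueError: Replacement lists must match in length. Expecting 6 got 2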

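The problem with your code is that you are passing sets as the arguments to replace(). For to_replace the cardinality happens to be fine, because all of its elements are unique. For value, however, the set you defined is actually {"Male", "Female"}, which does not match the cardinality of to_replace; that is the "Expecting 6 got 2" in the error. Even if the cardinalities did match, a set does not guarantee order, so it is the wrong data structure for this job. If you use lists or tuples instead, it works: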

data['Gender'] = data['Gender'].replace(to_replace=("male","M","m","female","f","F"), value=("Male","Male","Male","Female", "Female", "Female"))

Using a dict, though, probably makes the code easier to read, because each value is written right next to its replacement:

data["Gender"] = data["Gender"].replace({"m" : "Male", "M" : "Male", "male": "Male", "f": "Female", "F": "Female", "female": "Female"})
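To make the set problem concrete, here is a minimal sketch. The sample DataFrame is made up for illustration and is not the asker's data. It shows the six-element set literal collapsing to two elements, and that the list form and the dict form produce the same result.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
data = pd.DataFrame({"Gender": ["male", "M", "m", "female", "f", "F"]})

# A set literal drops duplicates, so six "values" collapse to two.
values_as_set = {"Male", "Male", "Male", "Female", "Female", "Female"}
print(len(values_as_set))  # 2

keys = ["male", "M", "m", "female", "f", "F"]
values = ["Male", "Male", "Male", "Female", "Female", "Female"]

# Ordered sequences are paired up position by position.
from_lists = data["Gender"].replace(to_replace=keys, value=values)

# A dict states each old -> new pair explicitly.
from_dict = data["Gender"].replace(dict(zip(keys, values)))

print(from_lists.equals(from_dict))
print(from_dict.tolist())
```

Building the dict with zip is only a convenience for this demonstration; writing the dict out by hand, as in the line above, is equivalent.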

Here is one way to do it.

import pandas as pd

df = pd.DataFrame({'Gender': ['m', 'M', 'f', 'F', 'm']})
print(df)

  Gender
0      m
1      M
2      f
3      F
4      m

# Map each raw value to its standardized label
replace_values = {'m': 'Male', 'M': 'Male', 'f': 'Female', 'F': 'Female'}
# The outer key restricts the replacement to the 'Gender' column
df = df.replace({"Gender": replace_values})
print(df)

  Gender
0    Male
1    Male
2  Female
3  Female
4    Male
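
The nested form {"Gender": replace_values} limits the replacement to that one column. The sketch below compares it with passing the flat dict to the whole DataFrame; the second column, "Code", is a hypothetical addition used only to show the difference.

```python
import pandas as pd

# Hypothetical two-column frame; the "Code" column is made up for illustration.
df = pd.DataFrame({"Gender": ["m", "F"], "Code": ["m", "F"]})
replace_values = {"m": "Male", "M": "Male", "f": "Female", "F": "Female"}

# Nested form: only the "Gender" column is searched.
scoped = df.replace({"Gender": replace_values})

# Flat form: every column is searched.
unscoped = df.replace(replace_values)

print(scoped)
print(unscoped)
```

If other columns might contain the same short strings with a different meaning, the nested form is the safer choice.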
