Need to filter some string elements, but I get TypeError: unsupported operand type(s) for |: 'str' and 'str'

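I'm using pandas to filter a CSV. I need to filter on three different string values in one column, but when I use "or" (|) I get this error. Is there another way to filter on several strings without creating a separate variable for each filter? Here is the code: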

# What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
bdegree = df[(df["education"] == "Bachelors") & (df["salary"] >= "50K")].count()
mdegree = df[(df["education"] == "Masters") & (df["salary"] >= "50K")].count()
phddegree = df[(df["education"] == "Doctorate") & (df["salary"] >= "50K")].count()
all_degrees = bdegree + mdegree + phddegree
print(all_degrees)
percentaje_of_more50 = (all_degrees / df["education" == "Bachelors"|"Masters"|"Doctorate"].count())*100
print("The percentaje of people with bla bla bla is", percentaje_of_more50["education"].round(1))

By the way, I'm still working out the logic errors in this code, so please ignore those :(

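== looks for an exact match, and since nobody's "education" is literally the string "Bachelors"|"Masters"|"Doctorate", such a comparison could only return a Series of all False. In fact Python never gets that far: | binds more tightly than ==, so it first tries to evaluate "Bachelors" | "Masters", and that is what raises the TypeError. Use isin instead: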

msk = df["education"].isin(["Bachelors","Masters","Doctorate"])
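To see both points on a small example, here is a sketch with a made-up DataFrame standing in for the question's CSV (the column values are hypothetical). It reproduces the TypeError caused by operator precedence and shows the mask that isin produces.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV
df = pd.DataFrame({"education": ["Bachelors", "HS-grad", "Masters", "Doctorate", "Some-college"]})

error_message = ""
try:
    # | binds more tightly than ==, so "Bachelors" | "Masters" is evaluated first
    "education" == "Bachelors" | "Masters" | "Doctorate"
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unsupported operand type(s) for |: 'str' and 'str'

# isin checks each value for membership in the list and returns a boolean Series
msk = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
print(msk.tolist())  # [True, False, True, True, False]
```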

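isin returns a boolean Series, so calling .count on it only gives its length (count tallies every non-missing value, True or False alike), which is probably not what you want. Use it to filter the relevant rows instead: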

df[msk].count()
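The difference between counting the mask itself and counting the filtered rows is easy to check on a toy DataFrame. The data and the extra "age" column below are made up for illustration.

```python
import pandas as pd

# Made-up data; "age" is a hypothetical extra column
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Doctorate", "Some-college"],
    "age": [39, 50, 38, 53, 28],
})
msk = df["education"].isin(["Bachelors", "Masters", "Doctorate"])

mask_count = msk.count()         # 5: every non-NA value in the mask, True or False
mask_true = msk.sum()            # 3: True counts as 1, so this is the number of matches
per_column = df[msk].count()     # one non-NA count per column, over the matching rows only
n_rows = len(df[msk])            # 3: number of matching rows as a single integer
print(mask_count, mask_true, n_rows)
print(per_column)
```

Note that df[msk].count() returns one value per column rather than a single number, which matters for the division that follows.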

Then percentage_of_more50 can be written as:

percentage_of_more50 = (all_degrees / df[msk].count())*100
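A minimal end-to-end sketch of this formula, assuming a hypothetical numeric "salary" column in thousands so that the string-comparison issue discussed further down does not get in the way. Because both operands come from DataFrame.count, they are per-column Series and the division aligns them on column labels, which is why the question later selects ["education"].

```python
import pandas as pd

# Hypothetical data; "salary" is assumed numeric (in thousands) here
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Doctorate", "Bachelors", "Masters"],
    "salary": [60, 45, 40, 120, 30, 75],
})
msk = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
all_degrees = df[msk & (df["salary"] >= 50)].count()

# Both operands are per-column Series; division aligns them on column labels
percentage_of_more50 = (all_degrees / df[msk].count()) * 100
print(percentage_of_more50["education"].round(1))
```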

Note that you can also derive all_degrees with isin:

all_degrees = df[df["education"].isin(["Bachelors","Masters","Doctorate"]) & (df['salary']>='50K')].count()
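Since the filter is just a boolean mask, another option is to skip building filtered DataFrames and sum the masks directly, which yields a single number instead of a per-column Series. This is an alternative sketch rather than the approach above, and it again assumes a hypothetical numeric "salary" column (in thousands).

```python
import pandas as pd

# Hypothetical data; "salary" is assumed numeric (in thousands)
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Doctorate", "Bachelors", "Masters"],
    "salary": [60, 45, 40, 120, 30, 75],
})

advanced = df["education"].isin(["Bachelors", "Masters", "Doctorate"])
high_pay = df["salary"] >= 50

# Summing a boolean mask counts its True values, so both counts are scalars
all_degrees = (advanced & high_pay).sum()
percentage_of_more50 = all_degrees / advanced.sum() * 100
print(round(percentage_of_more50, 1))
```

The parentheses around each comparison matter when it is combined inline with &, for the same precedence reason that broke the original | expression.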

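Also, df["salary"] >= "50K" only works as you intend while every salary has the same number of digits (for example, all of them below "99k"); otherwise you'll get wrong results, because strings are compared character by character: "100k" > "50k" evaluates to False even though 100k is the larger amount. One way around this is to left-pad the "salary" column with "0"s using str.zfill until every entry is the same length, for example: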

df['salary'] = df['salary'].str.zfill(5)

Every entry is then 5 characters long. For example,

s = pd.Series(['100k','50k']).str.zfill(5)

gives:

0    0100k
1    0050k
dtype: object

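Then you can compare correctly, provided the value you compare against is padded the same way (e.g. df['salary'] >= '0050k' rather than '50K') and the letter case of the suffix is consistent.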
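A small sketch of the padding approach, plus an alternative that avoids string comparison altogether by stripping the suffix and comparing integers. The sample values are made up, and the alternative assumes every salary is a whole number followed by "k" or "K".

```python
import pandas as pd

# String comparison is lexicographic: "1" < "5", so "100k" sorts before "50k"
print("100k" > "50k")  # False

s = pd.Series(["100k", "50k", "30k"]).str.zfill(5)  # "0100k", "0050k", "0030k"
threshold = "50k".zfill(5)                          # pad the threshold too: "0050k"
padded_result = (s >= threshold).tolist()
print(padded_result)  # [True, True, False]

# Alternative sketch: drop the "k"/"K" suffix and compare numbers instead of strings
raw = pd.Series(["100k", "50k", "30k"])
amounts = raw.str.rstrip("kK").astype(int)
numeric_result = (amounts >= 50).tolist()
print(numeric_result)  # [True, True, False]
```

The numeric version also sidesteps the case issue, since "K" and "k" compare differently as characters.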
