Sum groupby column values returned by value_counts()



I am returning a business's feature ratings, filtered by demographic and rating conditions.

How can I sum the int64 values in the "count" column while keeping the variable names in the output, e.g. Design 8, Food 1?

This question mentions converting to an index and then selecting by index.
This question looks similar, but it is about SQL.

Currently, I can query the result by assigning the melt code block to a variable prods and then writing something like prods[prods.rating == 2].

Sample data:

Customer Type   Age   Satisfaction    Design   Food   Wi-Fi   Service   Distance
Disloyal         28   Not Satisfied        0      1       2         2       13.5
Loyal            30   Satisfied            5      3       5         4       34.2
Disloyal         36   Not Satisfied        2      0       2         4       55.8

# Columns I want to see the ratings for
ranked_cols = [
    "Design",
    "Food",
    "Wi-Fi",
    "Service",
]
# Select the relevant customers
sub = df[
    (df["Customer Type"] == "Disloyal")
    & (df["Satisfaction"] == "Not Satisfied")
    & df["Age"].between(30, 40)
]
# Count how often each rating occurs per feature
(
    sub.melt(value_vars=ranked_cols)
    .groupby("variable")
    .value_counts()
    .to_frame()
    .reset_index()
    # The counts column is named 0 on pandas < 2.0 and "count" on pandas >= 2.0
    .rename(columns={"value": "rating", 0: "count"})
)
[Out]
  variable  rating  count
0   Design       2      5
1     Food       0      1
2  Service       4      1
3    Wi-Fi       2      3
4   Design       1      3
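
The prods approach mentioned earlier could look roughly like the sketch below. It rebuilds only the three sample rows shown in the question, so the counts differ from the [Out] table. As an assumption for illustration, it swaps value_counts() for groupby(...).size() so that the column names do not depend on the pandas version.

```python
import pandas as pd

# The three sample rows from the question (not the full dataset).
df = pd.DataFrame(
    {
        "Customer Type": ["Disloyal", "Loyal", "Disloyal"],
        "Age": [28, 30, 36],
        "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied"],
        "Design": [0, 5, 2],
        "Food": [1, 3, 0],
        "Wi-Fi": [2, 5, 2],
        "Service": [2, 4, 4],
        "Distance": [13.5, 34.2, 55.8],
    }
)

ranked_cols = ["Design", "Food", "Wi-Fi", "Service"]

# Same customer filter as in the question.
sub = df[
    (df["Customer Type"] == "Disloyal")
    & (df["Satisfaction"] == "Not Satisfied")
    & df["Age"].between(30, 40)
]

# Assumption: size() on (variable, value) pairs stands in for value_counts().
prods = (
    sub.melt(value_vars=ranked_cols)
    .groupby(["variable", "value"])
    .size()
    .reset_index(name="count")
    .rename(columns={"value": "rating"})
)

# Rows where the rating equals 2.
twos = prods[prods["rating"] == 2]
print(twos)
```

Because prods is an ordinary DataFrame with the columns variable, rating and count, any further grouping or filtering can be applied to it directly.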

# prods is the melted counts frame shown above (columns: variable, rating, count)
prods.groupby("variable")["count"].sum()

Output:

variable
Design     8
Food       1
Service    1
Wi-Fi      3
Name: count, dtype: int64
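
To check the totals above in isolation, here is a minimal sketch that types in the [Out] table by hand; the name prods and the as_index=False variant are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Counts copied from the [Out] table in the question.
prods = pd.DataFrame(
    {
        "variable": ["Design", "Food", "Service", "Wi-Fi", "Design"],
        "rating": [2, 0, 4, 2, 1],
        "count": [5, 1, 1, 3, 3],
    }
)

# Select "count" before summing so that "rating" is not summed as well.
totals = prods.groupby("variable")["count"].sum()
print(totals)

# Same totals as a two-column DataFrame instead of a Series.
totals_df = prods.groupby("variable", as_index=False)["count"].sum()
print(totals_df)
```

Selecting the column before calling sum() avoids aggregating columns that are not needed, and as_index=False keeps variable as a regular column if a DataFrame is preferred.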
