I am returning feature ratings for a business based on demographic and rating conditions.
How can I sum the int64 values in the "count" column while keeping the variable names in the output, e.g. Design 8, Food 1?
This question mentions converting to an index and then selecting by index.
This question looks SQL-like.
Currently, I can query the result by assigning the melt code block to a variable prods and then writing something like prods[prods.rating == 2].
Sample data:
Customer Type  Age  Satisfaction   Design  Food  Wi-Fi  Service  Distance
Disloyal       28   Not Satisfied  0       1     2      2        13.5
Loyal          30   Satisfied      5       3     5      4        34.2
Disloyal       36   Not Satisfied  2       0     2      4        55.8
# Cols I want to see the ratings for
ranked_cols = [
"Design",
"Food",
"Wi-Fi",
"Service",
]
# Select the relevant customers
sub = df[
(df["Customer Type"] == "Disloyal")
& (df["Satisfaction"] == "Not Satisfied")
& df["Age"].between(30, 40)
]
(
sub.melt(value_vars=ranked_cols)
.groupby("variable")
.value_counts()
.to_frame()
.reset_index()
.rename(columns={"value": "rating", 0: "count"})
)
[Out]
  variable  rating  count
0   Design       2      5
1     Food       0      1
2  Service       4      1
3    Wi-Fi       2      3
4   Design       1      3
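# Here df is the melted, counted frame shown in [Out] (prods in the question), not the raw data.
# Selecting "count" before summing avoids summing the rating column as well.
df.groupby("variable")["count"].sum()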
Output:
variable
Design 8
Food 1
Service 1
Wi-Fi 3
Name: count, dtype: int64
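
For anyone who wants to run the whole thing end to end, here is a minimal sketch under a few assumptions. The first three rows of `df` are the sample data from the question; the last two rows are made up so that the filter keeps more than one customer. It also swaps the `value_counts().to_frame()` chain for `groupby(["variable", "value"]).size().reset_index(name="count")`, because the name of the counts column produced by `value_counts` differs between pandas versions (`0` in older releases, `"count"` in newer ones). Passing `as_index=False` keeps `variable` as a regular column, so the names appear next to the sums.

```python
import pandas as pd

# Rows 1-3 are the question's sample data; rows 4-5 are hypothetical additions.
df = pd.DataFrame(
    {
        "Customer Type": ["Disloyal", "Loyal", "Disloyal", "Disloyal", "Disloyal"],
        "Age": [28, 30, 36, 33, 38],
        "Satisfaction": [
            "Not Satisfied", "Satisfied", "Not Satisfied",
            "Not Satisfied", "Not Satisfied",
        ],
        "Design": [0, 5, 2, 2, 1],
        "Food": [1, 3, 0, 1, 0],
        "Wi-Fi": [2, 5, 2, 2, 5],
        "Service": [2, 4, 4, 3, 4],
        "Distance": [13.5, 34.2, 55.8, 20.0, 41.0],
    }
)
ranked_cols = ["Design", "Food", "Wi-Fi", "Service"]

# Select the relevant customers (same filter as in the question).
sub = df[
    (df["Customer Type"] == "Disloyal")
    & (df["Satisfaction"] == "Not Satisfied")
    & df["Age"].between(30, 40)
]

# Long format, then count how often each rating occurs per feature.
prods = (
    sub.melt(value_vars=ranked_cols)
    .groupby(["variable", "value"])
    .size()
    .reset_index(name="count")
    .rename(columns={"value": "rating"})
)

# Sum the counts per feature, keeping the feature name as a column.
totals = prods.groupby("variable", as_index=False)["count"].sum()

# Same idea after a query, e.g. only the rows where the rating is 2.
twos = (
    prods[prods["rating"] == 2]
    .groupby("variable", as_index=False)["count"]
    .sum()
)

print(totals)
print(twos)
```

Without a filter on `rating`, each feature's total is just the number of selected customers, so the sum is mostly useful after a query such as the `rating == 2` one.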