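In a single-sheet Excel file that I create via SQL, I have 3 columns that hold letter ratings. The rating values may differ between Rating_1, Rating_2, and Rating_3, but they can still be ranked using the same ranking values.
I am trying to create a new column in my Excel file that takes these 3 letter ratings and pulls out the middle rating.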
| ranking (1 lowest) | Rating_1 | Rating_2 | Rating_3 | NEW_COLUMN |
| ------------------ | -------- | -------- | -------- | -------------- |
| 3 | A+ | AA | Aa | middle(rating) |
| 2 | B+ | BB | Bb | middle(rating) |
| 1 | Fa | Fb | Fc | middle(rating) |
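There are three cases I need to handle:
- If all three ratings differ, choose the rating among Rating_1, Rating_2, and Rating_3 that is neither the highest nor the lowest
- If all three ratings are the same, choose the rating in Rating_1
- If two of the ratings are the same and one is different, choose the lowest rating
I created a dataframe: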
import pandas as pd
df = pd.DataFrame(
    {
        "Rating_1": ["A+", "AA", "Aa"],
        "Rating_2": ["B+", "BB", "Bb"],
        "Rating_3": ["Fa", "Fb", "Fc"],
    }
)
df["NEW COLUMN"] = ...  # insert logic here
Or would it be easier to create a new DataFrame that filters the original one?
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
    {
        "Rating_1": ["A+", "Cc", "Aa"],
        "Rating_2": ["AA", "Cc", "Aa"],
        "Rating_3": ["BB", "Cc", "Bb"],
    }
)
print(df)
# Output
  Rating_1 Rating_2 Rating_3
0       A+       AA       BB
1       Cc       Cc       Cc
2       Aa       Aa       Bb
Here is one way to do it, using Python sets to check the conditions:
# Ratings ordered from best (A+: 0) to worst (Cc: 8)
ratings = {
    rating: i
    for i, rating in enumerate(["A+", "AA", "Aa", "B+", "BB", "Bb", "C+", "CC", "Cc"])
}
# First condition: all three ratings differ, so take the middle one by rank
# (sorting with an explicit key avoids relying on plain string order)
df["Middle_rating"] = df.apply(
    lambda x: sorted([x["Rating_1"], x["Rating_2"], x["Rating_3"]], key=ratings.get)[1]
    if len(set([x["Rating_1"], x["Rating_2"], x["Rating_3"]])) == 3
    else "",
    axis=1,
)
# Second condition: all three ratings are equal, so take Rating_1
df["Middle_rating"] = df.apply(
    lambda x: x["Rating_1"]
    if len(set([x["Rating_1"], x["Rating_2"], x["Rating_3"]])) == 1
    else x["Middle_rating"],
    axis=1,
)
# Third condition: two ratings are equal, so take the lowest rating
# (the one with the highest index in the ratings dict)
df["Middle_rating"] = df.apply(
    lambda x: max(x["Rating_1"], x["Rating_2"], x["Rating_3"], key=ratings.get)
    if len(set([x["Rating_1"], x["Rating_2"], x["Rating_3"]])) == 2
    else x["Middle_rating"],
    axis=1,
)
Then:
print(df)
# Output
  Rating_1 Rating_2 Rating_3 Middle_rating
0       A+       AA       BB            AA
1       Cc       Cc       Cc            Cc
2       Aa       Aa       Bb            Bb
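If three separate passes over the dataframe feel heavy, the same three rules can probably be folded into one row-wise function. The following is only a sketch under assumptions: `RATING_ORDER`, `RANK`, and `middle_rating` are hypothetical names introduced here, and the scale is assumed to be the nine ratings listed above, ordered from best to worst. Extend the list if your data contains other ratings (for example the `Fa`, `Fb`, `Fc` values from the question).

```python
import pandas as pd

# Assumed rating scale, ordered from best (index 0) to worst (index 8).
RATING_ORDER = ["A+", "AA", "Aa", "B+", "BB", "Bb", "C+", "CC", "Cc"]
RANK = {rating: i for i, rating in enumerate(RATING_ORDER)}


def middle_rating(row: pd.Series) -> str:
    """Hypothetical helper: apply the three rules from the question to one row."""
    # Sort the three ratings from best to worst using the explicit scale.
    # An unknown rating raises KeyError instead of being silently misranked.
    best, mid, worst = sorted(row.tolist(), key=RANK.__getitem__)
    if len({best, mid, worst}) == 2:
        # Two ratings equal, one different: take the lowest (worst) rating.
        return worst
    # All different: the middle one. All equal: mid is the same as Rating_1.
    return mid


df = pd.DataFrame(
    {
        "Rating_1": ["A+", "Cc", "Aa"],
        "Rating_2": ["AA", "Cc", "Aa"],
        "Rating_3": ["BB", "Cc", "Bb"],
    }
)
rating_cols = ["Rating_1", "Rating_2", "Rating_3"]
df["Middle_rating"] = df[rating_cols].apply(middle_rating, axis=1)
print(df["Middle_rating"].tolist())  # ['AA', 'Cc', 'Bb']
```

On the toy dataframe this gives the same `Middle_rating` values as the three-step version, while keeping the ranking logic in one place.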