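I have some very simple code, but I can't get it to work correctly. It is supposed to join two columns, remove duplicates, look each country up in a dictionary, and sum the scores. In the debugger I can see that the line
row["score"] += localscore
is computed correctly, so the problem seems to be one of visibility (the change never reaches my_df), but I don't know how to fix it. Can anyone help?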
import pandas as pd

my_df = pd.DataFrame({'countries': ["UK,DE", "DE", "DE"],
                      "other_countries": ["DE", "PL", "PL"]})
scores = {
    "UK": 10.0,
    "DE": 20.0,
    "PL": 30.0
}
my_df["joined_countries_without_duplicates"] = my_df["countries"] + "," + my_df["other_countries"]
my_df["joined_countries_without_duplicates"] = my_df["joined_countries_without_duplicates"].str.split(",")
my_df["score"] = 0
for index, row in my_df.iterrows():
    row["joined_countries_without_duplicates"] = list(set(row["joined_countries_without_duplicates"]))
    localscore = 0
    for country in row["joined_countries_without_duplicates"]:
        localscore += scores[country]
    row["score"] += localscore
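Changing row does not change my_df: iterrows() hands you each row as its own Series, which here is a copy rather than a view into the frame, so writes to it are lost. You would need to do something like append row['score'] to a list and then assign that list to my_df once the loop is finished.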
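A minimal sketch of that list-based fix, assuming the same my_df and scores as in the question. The names row_scores and unique_countries are only illustrative and do not appear in the original code:

```python
import pandas as pd

my_df = pd.DataFrame({"countries": ["UK,DE", "DE", "DE"],
                      "other_countries": ["DE", "PL", "PL"]})
scores = {"UK": 10.0, "DE": 20.0, "PL": 30.0}

joined = my_df["countries"] + "," + my_df["other_countries"]
my_df["joined_countries_without_duplicates"] = joined.str.split(",")

row_scores = []  # illustrative name: collects one total per row
for index, row in my_df.iterrows():
    # set() drops duplicate countries before the dictionary lookup
    unique_countries = set(row["joined_countries_without_duplicates"])
    row_scores.append(sum(scores[country] for country in unique_countries))

# Assign the whole column at once instead of writing to `row`
my_df["score"] = row_scores
print(my_df["score"].tolist())
```

This keeps the loop from the question but moves the write out of it, so nothing depends on row being connected to my_df.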
However, that is generally not a good approach.
You can use explode, groupby and map to achieve this instead:
import pandas as pd

my_df = pd.DataFrame({'countries': ["UK,DE", "DE", "DE"],
                      "other_countries": ["DE", "PL", "PL"]})
scores = {
    "UK": 10.0,
    "DE": 20.0,
    "PL": 30.0
}
my_df["joined_countries_without_duplicates"] = my_df["countries"] + "," + my_df["other_countries"]
# split into lists, then turn each list into a set to drop duplicates
my_df["joined_countries_without_duplicates"] = my_df["joined_countries_without_duplicates"].str.split(",").apply(set)
# one row per country, look up its score, then sum per original row index
my_df['score'] = (my_df['joined_countries_without_duplicates'].explode()
                  .map(scores)
                  .groupby(level=0)
                  .sum())
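To see why the chain works, it may help to look at the intermediate steps one at a time. The sketch below builds the deduplicated column directly as a Series of sets; the names unique_countries, exploded, mapped and totals are illustrative only:

```python
import pandas as pd

scores = {"UK": 10.0, "DE": 20.0, "PL": 30.0}
# Stand-in for the deduplicated column from the example data
unique_countries = pd.Series([{"UK", "DE"}, {"DE", "PL"}, {"DE", "PL"}])

# explode() gives every country its own row and repeats the original index label
exploded = unique_countries.explode()

# map() replaces each country with its score from the dictionary
mapped = exploded.map(scores)

# groupby(level=0) groups on the repeated index labels, so sum() yields
# one total per original row
totals = mapped.groupby(level=0).sum()
print(totals.tolist())
```

Because the result is indexed by the original row labels, assigning it to my_df['score'] lines up with the right rows without any loop.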