Why isn't my source pandas DataFrame updated in the following function?



I have the following function (this is a minimal working example that mimics what I actually do in my full code):

import pandas as pd
import numpy as np

# hardcoded data for reproducibility
df = pd.DataFrame(
    [
        ["SiteA", "Long_Key_With_KeyWord", np.nan],
        ["SiteA", "Long_Key_Without", np.nan],
        ["SiteB", "Long_Key_With_KeyWord", np.nan],
    ],
    columns=["site", "tags", "to_fill"],
)
library = {"SiteA": {"KeyWord": "NewKeyWord"}}

# logic
df_part = df.loc[df.to_fill.isna(), :]
groupby_site = df_part.groupby("site")
for site in groupby_site.groups.keys():
    site_data = groupby_site.get_group(site)
    try:
        library_site_data = library[site]
        for idx, row in site_data.iterrows():
            mask = [key in row["tags"] for key in library_site_data.keys()]
            match = [key for key, mask in zip(library_site_data.keys(), mask) if mask]
            if match:
                value = library_site_data[match[0]]
                df_part.loc[idx, "to_fill"] = value
            else:
                print("Too bad")
    except KeyError:
        print(f"no data for site {site} in library")
        continue
print(
    f"Total unfound mapping tags {df.to_fill.isna().sum()}"
)  # why isn't the df being filled in ?

What I don't understand is why df isn't being filled in. I thought df_part was a reference to df, so filling in df_part should also fill in df.

I get this:

print(df)
    site                   tags  to_fill
0  SiteA  Long_Key_With_KeyWord      NaN
1  SiteA       Long_Key_Without      NaN
2  SiteB  Long_Key_With_KeyWord      NaN

And I want this:

    site                   tags     to_fill
0  SiteA  Long_Key_With_KeyWord  NewKeyWord
1  SiteA       Long_Key_Without         NaN
2  SiteB  Long_Key_With_KeyWord         NaN

What am I missing?

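I think the problem is the double indexing (loc), which may create a copy rather than return a view. The first answer to this question explains it further.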
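To illustrate, here is a minimal sketch of one possible fix, not necessarily the only or best one: keep df_part for reading only, and write each value back to df itself using the row label idx, which both frames share. The cast of to_fill to object dtype is an assumption added here so that strings can be stored in a column that starts out as all NaN; the names matches and unfilled are introduced for this example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["SiteA", "Long_Key_With_KeyWord", np.nan],
        ["SiteA", "Long_Key_Without", np.nan],
        ["SiteB", "Long_Key_With_KeyWord", np.nan],
    ],
    columns=["site", "tags", "to_fill"],
)
library = {"SiteA": {"KeyWord": "NewKeyWord"}}

# Assumption: store the column as object so string values fit without upcasting.
df["to_fill"] = df["to_fill"].astype(object)

# Selecting rows with a boolean mask generally yields a copy, so df_part is
# only read from here; every write goes to df directly.
df_part = df.loc[df["to_fill"].isna(), :]

for site, site_data in df_part.groupby("site"):
    library_site_data = library.get(site)
    if library_site_data is None:
        print(f"no data for site {site} in library")
        continue
    for idx, row in site_data.iterrows():
        # Keys of the site's library entry that occur in this row's tag.
        matches = [key for key in library_site_data if key in row["tags"]]
        if matches:
            # idx is a label from df's own index, so it addresses the same row.
            df.loc[idx, "to_fill"] = library_site_data[matches[0]]
        else:
            print("Too bad")

unfilled = int(df["to_fill"].isna().sum())
print(f"Total unfound mapping tags {unfilled}")
```

With the sample data from the question, only row 0 (SiteA, tag containing KeyWord) gets a value; the other two rows stay NaN, which matches the desired output shown in the question.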
