Replace values of a DataFrame with values from a dictionary

I have two DataFrames with thousands of rows each. All columns have dtype string. Excerpts are shown below:

DF1:

             ID   SUCCESSOR
0       0001234     3620031
1       0001235     6640002
2       0002456     8620003
3       0013456     8640004
4       1711999     1283456 <- see DF2
...         ...         ... 
409813  9162467        <NA>
409814  9212466        <NA>
409815  9312466     6975A0C
409816  9452463        <NA>
409817  9591227        <NA>

DF2:

             ID
2       1111682
3       1123704
14      1567828
15      1711999 <- that value should be replaced with '1283456'
16      1711834
...         ...
845775  970879B
845776  975879B
845777  9275A0A
845778  9285A05
845779  9295A05

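Don't be surprised that the second DataFrame is missing some indices: I filtered those rows out earlier because they were irrelevant. The NaN values are irrelevant too, because my algorithm skips them.

I now want to replace each ID in the second DataFrame with the SUCCESSOR from the row of the first DataFrame that has the same ID.

The output should be: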

             ID
2       1111682
3       1123704
14      1567828
15      1283456 <- now replaced
16      1711834
...         ...
845775  970879B
845776  975879B
845777  9275A0A
845778  9285A05
845779  9295A05

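To keep the example readable, I replaced only one value. In reality there are several replacements.

Two approaches:

In my first approach I iterated over DF1 and called replace() for each row, but that takes ages, so it is useless.

In my second approach I first convert DF1 to a dictionary and then apply map(). I followed what JohnE describes here: "Remap values in pandas column with a dict". On a small example it works very well: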

df = pd.DataFrame({'col1': {1: 1, 2: 2, 3: 4, 4: 1}, 'col2': {1: 2, 2: np.nan}})
di = {1: "A", 2: "B"}
col1  col2
1     1   2.0
2     2   NaN
3     4   NaN
4     1   NaN
df['col1'].map(di).fillna(df['col1'])
1    A
2    B
3    4
4    A

My function for mapping DF1 onto DF2 looks like this:

def mapping(df1, df2):
    di = dict(zip(df1.ID, df1.SUCCESSOR))  # create the dict
    changes = 1
    while(changes > 0):

        changes = 0
        df_old = df2
        print(df2)  # check how df2 looks before mapping.
        df2['ID'] = df2['ID'].map(di).fillna(df2['ID'])
        print(df2)  # check how df2 looks after mapping. Unfortunately no changes :( so the error must be in the mapping function one line above here.
        if df_old.equals(df2) == False:
            changes = 1

    return df2

Evidently the error must be in this line:

df2['ID'] = df2['ID'].map(di).fillna(df2['ID'])

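However, I just don't understand why this doesn't work. What is going wrong here, and why?

If anyone can help me, I will be eternally grateful! Best regards, Alfonso

Edit: I found the error, I'm an idiot. My solution does work, but this line: "df_old = df2" prevented the loop from continuing. Anyway, thanks a lot, and sorry if I took up your time!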
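For readers who hit the same problem: the assignment df_old = df2 does not copy anything. It only gives the same DataFrame a second name, so df_old.equals(df2) is always True and the loop stops after one pass. Below is a minimal sketch of the loop with a real snapshot taken via copy(). The max_rounds guard and the tiny sample frames are assumptions added for illustration and are not part of the original question; the guard is only there so that a cyclic ID -> SUCCESSOR chain cannot loop forever.

```python
import pandas as pd

def mapping(df1, df2, max_rounds=100):
    """Follow ID -> SUCCESSOR links until no ID changes any more."""
    # Rows without a successor are left out of the lookup.
    valid = df1.dropna(subset=["SUCCESSOR"])
    di = dict(zip(valid["ID"], valid["SUCCESSOR"]))

    result = df2.copy()  # leave the caller's DataFrame untouched
    for _ in range(max_rounds):  # hypothetical safety limit against cycles
        previous = result["ID"].copy()  # an independent snapshot, not an alias
        result["ID"] = result["ID"].map(di).fillna(result["ID"])
        if result["ID"].equals(previous):
            break  # nothing changed in this pass
    return result

# Made-up sample data: 1711999 -> 1283456 -> 5550000 is a two-step chain.
df1 = pd.DataFrame({"ID": ["1711999", "1283456", "9162467"],
                    "SUCCESSOR": ["1283456", "5550000", None]})
df2 = pd.DataFrame({"ID": ["1111682", "1711999", "9162467"]}, index=[2, 15, 16])
out = mapping(df1, df2)
print(out)
```

Comparing only the ID column keeps the check cheap, and copying df2 up front means the function no longer modifies its argument in place.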

Here is a one-liner that builds the replacement dictionary by filtering the DataFrames:

df2['ID'] = df2['ID'].replace(dict(zip(df2[df2['ID'].isin(df1['ID'])].sort_values(by=['ID']).reset_index()['ID'], df1.loc[df1['ID'].isin(df2['ID'])].sort_values(by=['ID']).reset_index()['SUCCESSOR'])))
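Note that the one-liner above pairs the two filtered columns by sorted position, so as far as I can tell it relies on the matching IDs being unique in both DataFrames. A sketch of an alternative that avoids this is to build the dictionary from DF1 alone and look every DF2 value up in it. The sample data here is made up, and skipping rows without a successor is my assumption about the desired behaviour.

```python
import pandas as pd

# Made-up sample data; note the duplicated ID in df2.
df1 = pd.DataFrame({"ID": ["0001234", "1711999", "9162467"],
                    "SUCCESSOR": ["3620031", "1283456", None]})
df2 = pd.DataFrame({"ID": ["1111682", "1711999", "1711999", "9162467"]},
                   index=[2, 15, 16, 20])

# Build the lookup from df1 only, ignoring rows that have no successor.
valid = df1.dropna(subset=["SUCCESSOR"])
lookup = dict(zip(valid["ID"], valid["SUCCESSOR"]))

# map() yields NaN for IDs that are not in the lookup; fillna() restores them.
df2["ID"] = df2["ID"].map(lookup).fillna(df2["ID"])
print(df2)
```

This performs a single pass, so it does not follow chains of successors the way the loop in the question does.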
