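I have a pandas DataFrame in which I want every value that starts with ID to be rewritten so that it carries only a 'user/ID' prefix, with any leading zeros removed. I also want to create a third column that takes the ID value from the same row (no user prefix, no leading zeros, no IDm/IDs, just ID) and the E value, joins them with an underscore, and then adds the "user/" prefix. Here is an example for reference. Original: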
      item_id_a     item_id_b
0  E00000170630   IDm00010461
1   IDm00010461  E00000170630
2  E00000353915  IDs236274573
3   IDs23627457  E00000353915
Desired:
         item_id_a         item_id_b                       combined
0     E00000170630      user/ID10461      user/E00000170630_ID10461
1     user/ID10461      E00000170630      user/ID10461_E00000170630
2     E00000353915  user/ID236274573  user/E00000353915_ID236274573
3  user/ID23627457      E00000353915   user/ID23627457_E00000353915
This should work:
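# 1) Normalize every ID value: drop the optional letter and any leading zeros.
# 2) Build "combined" before the "user/" prefix is added to the ID cells.
# 3) Prefix the standalone ID values; "combined" already starts with "user/", so ^ID skips it.
(df.replace(r'ID[a-z]?0*', 'ID', regex=True)
   .assign(combined=lambda x: 'user/' + x['item_id_a'] + '_' + x['item_id_b'])
   .replace(r'^ID', 'user/ID', regex=True))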
Output:
         item_id_a         item_id_b                       combined
0     E00000170630      user/ID10461      user/E00000170630_ID10461
1     user/ID10461      E00000170630      user/ID10461_E00000170630
2     E00000353915  user/ID236274573  user/E00000353915_ID236274573
3  user/ID23627457      E00000353915   user/ID23627457_E00000353915
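For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample in the question (plain string columns), and the name `result` is just for illustration; the chain itself is the same as above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be plain strings).
df = pd.DataFrame({
    "item_id_a": ["E00000170630", "IDm00010461", "E00000353915", "IDs23627457"],
    "item_id_b": ["IDm00010461", "E00000170630", "IDs236274573", "E00000353915"],
})

result = (
    # "ID", an optional lowercase letter, then any zeros -> plain "ID"
    df.replace(r"ID[a-z]?0*", "ID", regex=True)
      # combine the cleaned values while the ID cells are still unprefixed
      .assign(combined=lambda x: "user/" + x["item_id_a"] + "_" + x["item_id_b"])
      # prefix only cells that start with "ID"
      .replace(r"^ID", "user/ID", regex=True)
)
print(result)
```

The order of the steps matters: if the "user/" prefix were added to the ID cells first, it would end up in the middle of the combined string as well.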
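# Alternative: assumes the ID values are already normalized (e.g. "ID10461") and that
# they sit in item_id_b on even rows and in item_id_a on odd rows, as in the sample.
df["combined"] = "user/" + df["item_id_a"] + "_" + df["item_id_b"]
df.loc[1::2, "item_id_a"] = "user/" + df.loc[1::2, "item_id_a"]
df.loc[0::2, "item_id_b"] = "user/" + df.loc[0::2, "item_id_b"]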
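The 0::2 and 1::2 slices only work when the ID values alternate between the two columns row by row. If that cannot be guaranteed, one possible variation is to pick the cells with a boolean mask instead of by position. This is a sketch under the assumption that both columns hold plain strings; the sample data and the helper name `id_cols` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be plain strings).
df = pd.DataFrame({
    "item_id_a": ["E00000170630", "IDm00010461", "E00000353915", "IDs23627457"],
    "item_id_b": ["IDm00010461", "E00000170630", "IDs236274573", "E00000353915"],
})
id_cols = ["item_id_a", "item_id_b"]  # hypothetical helper list

# Normalize the ID values first: drop the optional letter and leading zeros.
for col in id_cols:
    df[col] = df[col].str.replace(r"^ID[a-z]?0*", "ID", regex=True)

# Combine while the ID cells are still unprefixed.
df["combined"] = "user/" + df["item_id_a"] + "_" + df["item_id_b"]

# Prefix whichever cells hold an ID, regardless of which row they are in.
for col in id_cols:
    is_id = df[col].str.startswith("ID")
    df.loc[is_id, col] = "user/" + df.loc[is_id, col]

print(df)
```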