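I have a pandas DataFrame df1. I have another pandas DataFrame df2 with a fruits column. I want to replace each element of the lists in df2's fruits column with the value from df1's name column whenever that element appears in the corresponding duplicates list in df1.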
df1
   name         duplicates
0  a.apple      ['b.apple', 'c.apple']
1  t.orange     ['arr.orange', 'pg.orange']
2  ts.grape     ['a.grape', 'test.grape']
3  u.berryCool  ['X.berryCool', 'cool.berryCool']
df2
   people    fruits
0  jack      ['b.apple', 'c.apple', 'pp.tomato', 'ao.banana']
1  mary      ['arr.orange', 'b.apple', 'X.berryCool', 'op.mango']
2  andy      ['cool.berryCool', 'test.grape', 'yu.papaya']
3  lawrence  ['jc.orange', 'c.apple']
Expected output
   people    fruits
0  jack      ['a.apple', 'a.apple', 'pp.tomato', 'ao.banana']
1  mary      ['t.orange', 'a.apple', 'u.berryCool', 'op.mango']
2  andy      ['u.berryCool', 'ts.grape', 'yu.papaya']
3  lawrence  ['t.orange', 'a.apple']
How can I do this efficiently? Any suggestions are welcome.
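First flatten the lists in the duplicates column into a dictionary that maps each duplicate to its name, then look up every value with dict.get, which returns the value unchanged when there is no match: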
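# Build a lookup from each duplicate to its canonical name
d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}
# Replace known duplicates; d.get(y, y) leaves unmatched values unchanged
df2['fruits'] = [[d.get(y, y) for y in x] for x in df2['fruits']]
print(df2)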
people fruits
0 jack [a.apple, a.apple, pp.tomato, ao.banana]
1 mary [t.orange, a.apple, u.berryCool, op.mango]
2 andy [u.berryCool, ts.grape, yu.papaya]
3 lawrence [jc.orange, a.apple]
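For anyone who wants to run the dictionary approach end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question. The function name replace_duplicates and the variable lookup are hypothetical names chosen for this illustration; the logic is the same dict.get lookup shown above.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "name": ["a.apple", "t.orange", "ts.grape", "u.berryCool"],
    "duplicates": [
        ["b.apple", "c.apple"],
        ["arr.orange", "pg.orange"],
        ["a.grape", "test.grape"],
        ["X.berryCool", "cool.berryCool"],
    ],
})
df2 = pd.DataFrame({
    "people": ["jack", "mary", "andy", "lawrence"],
    "fruits": [
        ["b.apple", "c.apple", "pp.tomato", "ao.banana"],
        ["arr.orange", "b.apple", "X.berryCool", "op.mango"],
        ["cool.berryCool", "test.grape", "yu.papaya"],
        ["jc.orange", "c.apple"],
    ],
})


def replace_duplicates(df1, df2):
    """Hypothetical helper: return the fruits lists with duplicates renamed."""
    # Map every duplicate to the name of the row it belongs to
    lookup = {dup: name
              for name, dups in zip(df1["name"], df1["duplicates"])
              for dup in dups}
    # Fall back to the original value when it is not a known duplicate
    return [[lookup.get(fruit, fruit) for fruit in fruits]
            for fruits in df2["fruits"]]


df2["fruits"] = replace_duplicates(df1, df2)
print(df2)
```

Note that 'jc.orange' does not appear in any duplicates list in df1, so it is left as is rather than becoming 't.orange' as in the question's expected output.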
Performance on a 4k-row DataFrame (this depends on the data, so it is best to test with real data):
df2 = pd.concat([df2] * 1000, ignore_index=True)
In [135]: %%timeit
...: MAPPING = df1.explode('duplicates').set_index('duplicates')['name']
...: df2['fruits1'] = (df2.explode('fruits')['fruits'].replace(MAPPING).groupby(level=0).agg(list))
...:
128 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [136]: %%timeit
...: d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}
...:
...: df2['fruits2'] = [[d.get(y,y) for y in x] for x in df2['fruits']]
...:
5.27 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
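The comparison above uses IPython's %%timeit. Outside IPython, roughly the same measurement can be sketched with the standard-library timeit module. The function names with_explode and with_dict and the variable big are hypothetical names for this illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "name": ["a.apple", "t.orange", "ts.grape", "u.berryCool"],
    "duplicates": [
        ["b.apple", "c.apple"],
        ["arr.orange", "pg.orange"],
        ["a.grape", "test.grape"],
        ["X.berryCool", "cool.berryCool"],
    ],
})
df2 = pd.DataFrame({
    "people": ["jack", "mary", "andy", "lawrence"],
    "fruits": [
        ["b.apple", "c.apple", "pp.tomato", "ao.banana"],
        ["arr.orange", "b.apple", "X.berryCool", "op.mango"],
        ["cool.berryCool", "test.grape", "yu.papaya"],
        ["jc.orange", "c.apple"],
    ],
})

# Repeat the 4 sample rows 1000 times to get 4000 rows
big = pd.concat([df2] * 1000, ignore_index=True)


def with_explode(df1, df2):
    # explode -> replace via a mapping Series -> regroup into lists
    mapping = df1.explode("duplicates").set_index("duplicates")["name"]
    return (df2.explode("fruits")["fruits"].replace(mapping)
            .groupby(level=0).agg(list))


def with_dict(df1, df2):
    # plain dictionary lookup inside a nested list comprehension
    d = {x: a for a, b in zip(df1["name"], df1["duplicates"]) for x in b}
    return [[d.get(y, y) for y in x] for x in df2["fruits"]]


# Sanity check: both approaches should produce the same lists
assert with_explode(df1, big).tolist() == with_dict(df1, big)

for fn in (with_explode, with_dict):
    runs = 5
    seconds = timeit.timeit(lambda: fn(df1, big), number=runs) / runs
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per call")
```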
You can create a mapping Series, which works like a dictionary:
MAPPING = df1.explode('duplicates').set_index('duplicates')['name']
df2['fruits'] = (df2.explode('fruits')['fruits'].replace(MAPPING)
                 .groupby(level=0).agg(list))
print(df2)
# Output
people fruits
0 jack [a.apple, a.apple, pp.tomato, ao.banana]
1 mary [t.orange, a.apple, u.berryCool, op.mango]
2 andy [u.berryCool, ts.grape, yu.papaya]
3 lawrence [jc.orange, a.apple]
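To see how the explode-based approach gets there, it can help to look at each intermediate step separately. This is a sketch on the sample data from the question; the variable names mapping, exploded, replaced and result are chosen here for illustration only.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "name": ["a.apple", "t.orange", "ts.grape", "u.berryCool"],
    "duplicates": [
        ["b.apple", "c.apple"],
        ["arr.orange", "pg.orange"],
        ["a.grape", "test.grape"],
        ["X.berryCool", "cool.berryCool"],
    ],
})
df2 = pd.DataFrame({
    "people": ["jack", "mary", "andy", "lawrence"],
    "fruits": [
        ["b.apple", "c.apple", "pp.tomato", "ao.banana"],
        ["arr.orange", "b.apple", "X.berryCool", "op.mango"],
        ["cool.berryCool", "test.grape", "yu.papaya"],
        ["jc.orange", "c.apple"],
    ],
})

# Step 1: one row per duplicate, indexed by the duplicate, valued by the name
mapping = df1.explode("duplicates").set_index("duplicates")["name"]
print(mapping)

# Step 2: one row per fruit; the original row label is repeated in the index
exploded = df2.explode("fruits")["fruits"]
print(exploded)

# Step 3: swap values found in mapping's index; other values are untouched
replaced = exploded.replace(mapping)

# Step 4: regroup by the original row label to rebuild one list per person
result = replaced.groupby(level=0).agg(list)
print(result)
```

One thing worth checking on real data: explode turns an empty list into a single NaN row, so a person with an empty fruits list would come back as [nan] rather than [] with this approach.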