我有两个表A和B,需要通过正则表达式连接:
import pandas as pd
import numpy as np
a_df = pd.DataFrame({'key': ['A','B','C','D','E','F'],
'cola1': ['c1A','c1B', 'c1C', 'c1D', 'c1E', 'c1F'],
'cola2': ['c2A','c2B', 'c2C', 'c2D', 'c2E', 'c2F']})
b_df = pd.DataFrame({'key': ['A|B|C','E|F'],
'colb1': ['cb1ABC','cb1EF'],
'colb2': ['cb2ABC','c2EF']})
生成的数据帧应该如下所示:
a_b_merge = pd.DataFrame({'key_a': ['A','B','C','D','E','F'],
'cola1': ['c1A','c1B', 'c1C', 'c1D', 'c1E', 'c1F'],
'cola2': ['c2A','c2B', 'c2C', 'c2D', 'c2E', 'c2F'],
'key_b': ['A|B|C', 'A|B|C', 'A|B|C', np.nan, 'E|F', 'E|F'],
'colb1': ['cb1ABC','cb1ABC', 'cb1ABC', np.nan, 'c1E', 'c1F'],
'colb2': ['cb2ABC','cb2ABC', 'cb2ABC', np.nan, 'c2EF', 'c2EF']})
我在SQL中的解决方案是这样的:
SELECT * FROM table_a a
LEFT JOIN table_b b
ON b.key LIKE CONCAT('%', a.key, '%')
使用熊猫merge()
有什么方法可以实现这一点吗?如果不是,哪种解决方案最优雅?
您可以创建一个包含b_df.key提取的临时列,合并两个数据帧,然后删除临时列:
a_df.merge(
b_df.assign(temporary=lambda x: x.key.str.split("|")).explode("temporary"),
left_on="key",
right_on="temporary",
how="outer",
suffixes=("_a", "_b"),
).iloc[:, :-1]
key_a cola1 cola2 key_b colb1 colb2
0 A c1A c2A A|B|C cb1ABC cb2ABC
1 B c1B c2B A|B|C cb1ABC cb2ABC
2 C c1C c2C A|B|C cb1ABC cb2ABC
3 D c1D c2D NaN NaN NaN
4 E c1E c2E E|F cb1EF c2EF
5 F c1F c2F E|F cb1EF c2EF