I have two tables:
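import pandas as pd

d = {'ID': ['A', 'B', 'C'], 'Month': [1, 3, 5], 'group': ['x', 'x', 'x']}
df1 = pd.DataFrame(data=d)
# 'group' needs one entry per row (5), otherwise the constructor raises
d2 = {'Month': [1, 2, 3, 4, 5], 'value': [0.8, 0.2, 0.5, 0.3, 0.7],
      'group': ['x', 'x', 'x', 'x', 'x']}
df2 = pd.DataFrame(data=d2)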
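I want to join on group (which is not constant in my real table) and, for each row of df1, keep the rows of df2 whose Month is <= the Month in df1. The SQL equivalent would be a join on df2.Month <= df1.Month. Expected output: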
ID  Month  value
A   1      0.8
B   1      0.8
B   2      0.2
B   3      0.5
C   1      0.8
C   2      0.2
C   3      0.5
C   4      0.3
C   5      0.7
This is not straightforward, but assuming each ID appears in exactly one row of df1, you can run a merge_asof per ID and concat the results:
out = pd.concat([pd.merge_asof(df2, g, by='group', on='Month',
                               direction='forward').dropna()
                 for _, g in df1.groupby('ID')], ignore_index=True)
Output:
   Month  value group ID
0      1    0.8     x  A
1      1    0.8     x  B
2      2    0.2     x  B
3      3    0.5     x  B
4      1    0.8     x  C
5      2    0.2     x  C
6      3    0.5     x  C
7      4    0.3     x  C
8      5    0.7     x  C
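To see why the dropna() is needed, it may help to look at a single ID in isolation. This is only a minimal sketch, assuming the sample frames from the question with group filled in for all five rows of df2; the names g_b and step are illustrative and not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['A', 'B', 'C'], 'Month': [1, 3, 5],
                    'group': ['x', 'x', 'x']})
df2 = pd.DataFrame({'Month': [1, 2, 3, 4, 5],
                    'value': [0.8, 0.2, 0.5, 0.3, 0.7],
                    'group': ['x'] * 5})

# Single-row frame for ID 'B' (Month 3); illustrative name.
g_b = df1[df1['ID'] == 'B']

# direction='forward' pairs each df2 row with the first row of g_b whose
# Month is >= the df2 Month, within the same group.
step = pd.merge_asof(df2, g_b, by='group', on='Month', direction='forward')

# Months 1 to 3 pick up ID 'B'; months 4 and 5 find no match and get a
# missing ID, which is what dropna() removes in the full solution.
print(step)
```

Because each per-ID frame holds a single row, "the first row at or after this Month" is either that row or nothing, which is what turns the as-of match into the <= condition.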
Alternatively, use conditional_join from janitor (the pyjanitor package):
# pip install pyjanitor
import janitor

out = df1.conditional_join(
    df2,
    ('Month', 'Month', '>='),
    ('group', 'group', '=='),
    df_columns=['ID']
)

# or
out = df2.conditional_join(
    df1,
    ('Month', 'Month', '<='),
    ('group', 'group', '=='),
    right_columns=['ID'],
).sort_values(by=['ID', 'Month'],
              ignore_index=True)
Output:
   Month  value group ID
0      1    0.8     x  A
1      1    0.8     x  B
2      2    0.2     x  B
3      3    0.5     x  B
4      1    0.8     x  C
5      2    0.2     x  C
6      3    0.5     x  C
7      4    0.3     x  C
8      5    0.7     x  C
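For small inputs, the result can be cross-checked with a plain merge on group followed by a filter. This is a different technique from the two approaches above and is meant only as a rough verification sketch: it materializes every df1/df2 pair within a group before filtering, so it may use far more memory on large tables. The names pairs and check and the suffixes are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['A', 'B', 'C'], 'Month': [1, 3, 5],
                    'group': ['x', 'x', 'x']})
df2 = pd.DataFrame({'Month': [1, 2, 3, 4, 5],
                    'value': [0.8, 0.2, 0.5, 0.3, 0.7],
                    'group': ['x'] * 5})

# Equi-join on group only; the overlapping Month columns get suffixes.
pairs = df1.merge(df2, on='group', suffixes=('_df1', '_df2'))

# Keep the pairs that satisfy df2.Month <= df1.Month, then restore the
# column layout and row order used in the outputs above.
check = (pairs[pairs['Month_df2'] <= pairs['Month_df1']]
         .rename(columns={'Month_df2': 'Month'})
         [['Month', 'value', 'group', 'ID']]
         .sort_values(['ID', 'Month'], ignore_index=True))
print(check)
```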