I have these DataFrames:
df1
user_id  code  name               code_equivalence  name_equivalence
51       123   bi lovers          542               bi for marketing
51       123   bi lovers          545               i love bi
51       234   datascience        345               data and science
51       234   datascience        555               data lovers
51       255   antiquity history  429               roma
51       255   antiquity history  430               greece
52       123   bi lovers          542               bi for marketing
52       123   bi lovers          545               i love bi
52       256   modern history     500               france
52       256   modern history     501               germany
52       200   arts               400               arts I
52       200   arts               401               arts II
df2
user_id  code  name       status
51       123   bi lovers  ongoing
51       430   greece     ongoing
52       501   germany    ongoing
52       050   numbers    ongoing
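I want to merge them by checking whether the df2 code equals either the df1 code or the df1 code_equivalence, and whether the df2 name equals either the df1 name or the df1 name_equivalence, in order to get the df2 status. Like this:
Merged df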
user_id  code  name               code_equivalence  name_equivalence  status
51       123   bi lovers          542               bi for marketing  ongoing
51       123   bi lovers          545               i love bi         ongoing
51       234   datascience        345               data and science  (null)
51       234   datascience        555               data lovers       (null)
51       255   antiquity history  429               roma              (null)
51       255   antiquity history  430               greece            ongoing
52       123   bi lovers          542               bi for marketing  (null)
52       123   bi lovers          545               i love bi         (null)
52       256   modern history     500               france            (null)
52       256   modern history     501               germany           ongoing
52       200   arts               400               arts I            (null)
52       200   arts               401               arts II           (null)
After that, I want to transform the data into a new df that looks like this:
Final df
user_id  code  name               code_equivalence  name_equivalence                 status
51       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    ongoing
51       234   datascience        [345, 555]        [data and science, data lovers]  (null)
51       255   antiquity history  [429, 430]        [roma, greece]                   ongoing
52       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    (null)
52       256   modern history     [500, 501]        [france, germany]                ongoing
52       200   arts               [400, 401]        [arts I, arts II]                (null)
Can anyone help me?
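I'm not sure I understood the question correctly, but from what I read, you have already done the merge and are now trying to get the final result? If so, and assuming merged is your merged DataFrame, this should do the job.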
>>> merged.groupby(['user_id','code','name']).agg(list).reset_index()
   user_id  code  name               code_equivalence  name_equivalence                 status
0  51       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [ongoing, ongoing]
1  51       234   datascience        [345, 555]        [data and science, data lovers]  [(null), (null)]
2  51       255   antiquity history  [429, 430]        [roma, greece]                   [(null), ongoing]
3  52       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [(null), (null)]
4  52       200   arts               [400, 401]        [arts I, arts II]                [(null), nan]
5  52       256   modern history     [500, 501]        [france, germany]                [(null), ongoing]
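Note that agg(list) turns every non-key column into a list, status included, whereas the desired final df has a single status per group. Below is a minimal sketch of one way to get that. It assumes the missing statuses in merged are real missing values (None/NaN) rather than the literal string "(null)", and it uses a small hypothetical merged frame (three of the groups from the question) purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame; None marks a missing status.
merged = pd.DataFrame({
    "user_id": [51, 51, 51, 51, 51, 51],
    "code": [123, 123, 234, 234, 255, 255],
    "name": ["bi lovers", "bi lovers", "datascience", "datascience",
             "antiquity history", "antiquity history"],
    "code_equivalence": [542, 545, 345, 555, 429, 430],
    "name_equivalence": ["bi for marketing", "i love bi", "data and science",
                         "data lovers", "roma", "greece"],
    "status": ["ongoing", "ongoing", None, None, None, "ongoing"],
})

final = (
    merged
    # sort=False keeps the groups in order of first appearance.
    .groupby(["user_id", "code", "name"], sort=False)
    .agg(
        code_equivalence=("code_equivalence", list),
        name_equivalence=("name_equivalence", list),
        # "first" skips missing values, so a group gets a status as soon as
        # any of its rows has one, and stays missing otherwise.
        status=("status", "first"),
    )
    .reset_index()
)
print(final)
```

If a group could carry several different statuses, "first" would silently pick one of them, so a different reducer may be more appropriate in that case.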
Here is the complete solution if you only have df1 & df2:
>>> (pd
...: .merge(df1,df2, left_on=['user_id','code','name'], right_on=['user_id','code','name'], how='left')
...: .groupby(['user_id','code','name'])
...: .agg(list)
...: .reset_index())
   user_id  code  name               code_equivalence  name_equivalence                 status
0  51       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [ongoing, ongoing]
1  51       234   datascience        [345, 555]        [data and science, data lovers]  [nan, nan]
2  51       255   antiquity history  [429, 430]        [roma, greece]                   [nan, nan]
3  52       123   bi lovers          [542, 545]        [bi for marketing, i love bi]    [nan, nan]
4  52       200   arts               [400, 401]        [arts I, arts II]                [nan, nan]
5  52       256   modern history     [500, 501]        [france, germany]                [nan, nan]
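The merge above only compares df2's code with df1's code, so rows such as greece (430) and germany (501), which match only through code_equivalence, come back as nan. One possible sketch of the "code or code_equivalence" match follows. It assumes each (user_id, code) pair appears at most once in df2 (so both left merges keep df1's rows in the same order), that codes are stored as integers in both frames (so 050 becomes 50), and that matching on codes alone is enough, meaning names are not compared. The helper name attach_status is hypothetical.

```python
import pandas as pd

def attach_status(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Left-join df2's status onto df1 by code, falling back to code_equivalence."""
    keys = df2[["user_id", "code", "status"]]
    # Pass 1: df2.code == df1.code (per user).
    by_code = df1.merge(keys, on=["user_id", "code"], how="left")
    # Pass 2: df2.code == df1.code_equivalence (per user).
    by_equiv = df1.merge(
        keys.rename(columns={"code": "code_equivalence"}),
        on=["user_id", "code_equivalence"],
        how="left",
    )
    # Keep pass-1 matches and fill the gaps from pass 2 (rows align by position).
    return by_code.assign(status=by_code["status"].fillna(by_equiv["status"]))

df1 = pd.DataFrame({
    "user_id": [51] * 6 + [52] * 6,
    "code": [123, 123, 234, 234, 255, 255, 123, 123, 256, 256, 200, 200],
    "name": ["bi lovers", "bi lovers", "datascience", "datascience",
             "antiquity history", "antiquity history", "bi lovers", "bi lovers",
             "modern history", "modern history", "arts", "arts"],
    "code_equivalence": [542, 545, 345, 555, 429, 430, 542, 545, 500, 501, 400, 401],
    "name_equivalence": ["bi for marketing", "i love bi", "data and science",
                         "data lovers", "roma", "greece", "bi for marketing",
                         "i love bi", "france", "germany", "arts I", "arts II"],
})
df2 = pd.DataFrame({
    "user_id": [51, 51, 52, 52],
    "code": [123, 430, 501, 50],
    "name": ["bi lovers", "greece", "germany", "numbers"],
    "status": ["ongoing"] * 4,
})

merged = attach_status(df1, df2)
print(merged)
```

Because user_id is part of both join keys, user 52's "bi lovers" rows stay unmatched even though user 51 has an ongoing status for the same code.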
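This is how I arrive at the merge_df DataFrame, in three steps:
- Merge on the first condition.
- Merge on the second condition.
- Fill the missing matches from step 1 with the matches from step 2.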
merge_df = pd.merge(df1, df2[["code", "status"]], left_on=["code"], right_on=["code"], how="left")
merge_df2 = pd.merge(df1, df2[["code", "status"]], left_on=["code_equivalence"], right_on=["code"], how="left")
merge_df["status"] = merge_df["status"].fillna(merge_df2["status"])
But I would like to know whether there is a one-liner that can do this (there probably is).
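One way to get close to a single expression is to replace the two merges with a dictionary lookup. The sketch below is only an illustration under a few assumptions: the (user_id, code) pairs in df2 are unique (duplicates would silently keep the last status), the lookup is scoped per user, which the three-step version above is not, and the frames shown are a small hypothetical subset of the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "user_id": [51, 51, 51, 52],
    "code": [123, 255, 255, 123],
    "code_equivalence": [542, 429, 430, 542],
})
df2 = pd.DataFrame({
    "user_id": [51, 51],
    "code": [123, 430],
    "status": ["ongoing", "ongoing"],
})

# Map (user_id, code) -> status.
lookup = df2.set_index(["user_id", "code"])["status"].to_dict()

# Try the code first, then fall back to code_equivalence; None if neither matches.
merge_df = df1.assign(
    status=[
        lookup.get((user, code), lookup.get((user, equiv)))
        for user, code, equiv in zip(df1["user_id"], df1["code"], df1["code_equivalence"])
    ]
)
print(merge_df)
```

Strictly speaking this is two statements rather than one line, but it avoids the intermediate merge_df2 frame and the separate fill step.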