Python: using conditional logic / where when merging DataFrames



I have these DataFrames:

df1

user_id     code     name     code_equivalence             name_equivalence
51          123    bi lovers            542                bi for marketing
51          123    bi lovers            545                i love bi
51          234    datascience          345                data and science
51          234    datascience          555                data lovers
51          255    antiquity history    429                roma
51          255    antiquity history    430                greece
52          123    bi lovers            542                bi for marketing
52          123    bi lovers            545                i love bi
52          256    modern history       500                france
52          256    modern history       501                germany
52          200    arts                 400                arts I
52          200    arts                 401                arts II

df2

user_id     code     name       status
51          123    bi lovers    ongoing
51          430    greece       ongoing
52          501    germany      ongoing
52          050    numbers      ongoing

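I want to merge them by checking whether the df2 code equals the df1 code or the df1 code_equivalence, and whether the df2 name equals the df1 name or the df1 name_equivalence, so that I get the df2 status. Like this: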

merged df

user_id     code     name               code_equivalence    name_equivalence        status
51          123    bi lovers            542                 bi for marketing        ongoing
51          123    bi lovers            545                 i love bi               ongoing
51          234    datascience          345                 data and science        (null)
51          234    datascience          555                 data lovers             (null)
51          255    antiquity history    429                 roma                    (null)
51          255    antiquity history    430                 greece                  ongoing
52          123    bi lovers            542                 bi for marketing        (null)
52          123    bi lovers            545                 i love bi               (null)
52          256    modern history       500                 france                  (null)
52          256    modern history       501                 germany                 ongoing
52          200    arts                 400                 arts I                  (null)
52          200    arts                 401                 arts II                 (null)
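A minimal sketch of that matching rule, assuming it means: within the same user_id, a df2 row matches a df1 row if df2's (code, name) equals either df1's (code, name) or its (code_equivalence, name_equivalence), with the direct pair preferred. It also assumes codes are read as integers (so df2's 050 becomes 50) and that each (user_id, code, name) occurs at most once in df2. Instead of merging, it looks the status up in a Series indexed by those three keys; status_for is a hypothetical helper name.

```python
import pandas as pd

# The question's df1 and df2, with codes read as integers (df2's "050" becomes 50)
df1 = pd.DataFrame(
    [(51, 123, "bi lovers", 542, "bi for marketing"),
     (51, 123, "bi lovers", 545, "i love bi"),
     (51, 234, "datascience", 345, "data and science"),
     (51, 234, "datascience", 555, "data lovers"),
     (51, 255, "antiquity history", 429, "roma"),
     (51, 255, "antiquity history", 430, "greece"),
     (52, 123, "bi lovers", 542, "bi for marketing"),
     (52, 123, "bi lovers", 545, "i love bi"),
     (52, 256, "modern history", 500, "france"),
     (52, 256, "modern history", 501, "germany"),
     (52, 200, "arts", 400, "arts I"),
     (52, 200, "arts", 401, "arts II")],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)
df2 = pd.DataFrame(
    [(51, 123, "bi lovers", "ongoing"), (51, 430, "greece", "ongoing"),
     (52, 501, "germany", "ongoing"), (52, 50, "numbers", "ongoing")],
    columns=["user_id", "code", "name", "status"],
)

KEY = ["user_id", "code", "name"]
# df2.status indexed by (user_id, code, name); reindex requires these keys to be unique
lookup = df2.set_index(KEY)["status"]


def status_for(cols):
    """Hypothetical helper: df2 status for the df1 key columns `cols` (NaN if no match)."""
    idx = pd.MultiIndex.from_frame(df1[cols].set_axis(KEY, axis=1))
    return pd.Series(lookup.reindex(idx).to_numpy(), index=df1.index)


# Prefer the direct (code, name) match, fall back to (code_equivalence, name_equivalence)
direct = status_for(KEY)
via_equivalence = status_for(["user_id", "code_equivalence", "name_equivalence"])
merged = df1.assign(status=direct.fillna(via_equivalence))
print(merged)
```

Keying on user_id as well is what keeps user 52's "bi lovers" rows at null, as in the merged df above.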

After that, I want to turn the data into a new df that looks like this:

final df

user_id     code     name               code_equivalence    name_equivalence                    status
51          123    bi lovers            [542, 545]          [bi for marketing, i love bi]       ongoing
51          234    datascience          [345, 555]          [data and science, data lovers]     (null)
51          255    antiquity history    [429, 430]          [roma, greece]                      ongoing
52          123    bi lovers            [542, 545]          [bi for marketing, i love bi]       (null)
52          256    modern history       [500, 501]          [france, germany]                   ongoing
52          200    arts                 [400, 401]          [arts I, arts II]                   (null)

Can anyone help me?

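I'm not sure I understood the question correctly, but from what I read you have already done the merge and are now looking for the final result? If so, and assuming merged is your merged DataFrame, this should do the job: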

>>> merged.groupby(['user_id','code','name']).agg(list).reset_index()
   user_id  code               name code_equivalence                 name_equivalence              status
0       51   123          bi lovers       [542, 545]    [bi for marketing, i love bi]  [ongoing, ongoing]
1       51   234        datascience       [345, 555]  [data and science, data lovers]    [(null), (null)]
2       51   255  antiquity history       [429, 430]                   [roma, greece]   [(null), ongoing]
3       52   123          bi lovers       [542, 545]    [bi for marketing, i love bi]    [(null), (null)]
4       52   200               arts       [400, 401]                [arts I, arts II]       [(null), nan]
5       52   256     modern history       [500, 501]                [france, germany]   [(null), ongoing]
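With agg(list), status also becomes a list per group, while the final df in the question has a single status per group. A sketch using named aggregation keeps the list columns and takes the first non-null status of each group instead; it rebuilds merged from the question's merged df, with None standing in for (null), and uses sort=False so the groups keep their order of first appearance.

```python
import pandas as pd

# The question's merged df; None stands in for (null)
merged = pd.DataFrame(
    [(51, 123, "bi lovers", 542, "bi for marketing", "ongoing"),
     (51, 123, "bi lovers", 545, "i love bi", "ongoing"),
     (51, 234, "datascience", 345, "data and science", None),
     (51, 234, "datascience", 555, "data lovers", None),
     (51, 255, "antiquity history", 429, "roma", None),
     (51, 255, "antiquity history", 430, "greece", "ongoing"),
     (52, 123, "bi lovers", 542, "bi for marketing", None),
     (52, 123, "bi lovers", 545, "i love bi", None),
     (52, 256, "modern history", 500, "france", None),
     (52, 256, "modern history", 501, "germany", "ongoing"),
     (52, 200, "arts", 400, "arts I", None),
     (52, 200, "arts", 401, "arts II", None)],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence", "status"],
)

final = (
    merged.groupby(["user_id", "code", "name"], sort=False, as_index=False)
    .agg(
        code_equivalence=("code_equivalence", list),  # collect each group's values into a list
        name_equivalence=("name_equivalence", list),
        status=("status", "first"),  # "first" skips missing values: one status per group
    )
)
print(final)
```

Groups where every status is missing come out as NaN, which plays the role of (null) in the final df.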

Here is the complete solution if you only have df1 and df2:

>>> (pd
...     .merge(df1, df2, on=['user_id', 'code', 'name'], how='left')
...     .groupby(['user_id', 'code', 'name'])
...     .agg(list)
...     .reset_index())
   user_id  code               name code_equivalence                 name_equivalence              status
0       51   123          bi lovers       [542, 545]    [bi for marketing, i love bi]  [ongoing, ongoing]
1       51   234        datascience       [345, 555]  [data and science, data lovers]          [nan, nan]
2       51   255  antiquity history       [429, 430]                   [roma, greece]          [nan, nan]
3       52   123          bi lovers       [542, 545]    [bi for marketing, i love bi]          [nan, nan]
4       52   200               arts       [400, 401]                [arts I, arts II]          [nan, nan]
5       52   256     modern history       [500, 501]                [france, germany]          [nan, nan]
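This merge joins only on user_id, code and name, so a df1 row finds a status only when its own (code, name) pair is in df2; that is why greece and germany come back as nan here. A small diagnostic sketch: pd.merge accepts indicator=True, which adds a _merge column saying whether each row matched ("both") or not ("left_only"). It assumes the question's data with codes read as integers.

```python
import pandas as pd

# df1/df2 as in the question (codes read as integers, so "050" becomes 50)
df1 = pd.DataFrame(
    [(51, 123, "bi lovers", 542, "bi for marketing"),
     (51, 123, "bi lovers", 545, "i love bi"),
     (51, 234, "datascience", 345, "data and science"),
     (51, 234, "datascience", 555, "data lovers"),
     (51, 255, "antiquity history", 429, "roma"),
     (51, 255, "antiquity history", 430, "greece"),
     (52, 123, "bi lovers", 542, "bi for marketing"),
     (52, 123, "bi lovers", 545, "i love bi"),
     (52, 256, "modern history", 500, "france"),
     (52, 256, "modern history", 501, "germany"),
     (52, 200, "arts", 400, "arts I"),
     (52, 200, "arts", 401, "arts II")],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)
df2 = pd.DataFrame(
    [(51, 123, "bi lovers", "ongoing"), (51, 430, "greece", "ongoing"),
     (52, 501, "germany", "ongoing"), (52, 50, "numbers", "ongoing")],
    columns=["user_id", "code", "name", "status"],
)

# Same left merge as above, plus indicator=True to record which rows found a match
checked = pd.merge(df1, df2, on=["user_id", "code", "name"], how="left", indicator=True)
print(checked[["user_id", "code", "name", "code_equivalence", "status", "_merge"]])
# Only user 51's two "bi lovers" rows are "both"; the greece and germany rows are
# "left_only" because 430 and 501 appear in df1 only as code_equivalence.
```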

This is how I got to the merge_df DataFrame, in three steps:

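  1. Merge on the first condition (df2 code equals df1 code).

  2. Merge on the second condition (df2 code equals df1 code_equivalence).

  3. Fill the missing matches from step 1 with the matches from step 2.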

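    import pandas as pd
    # Steps 1 and 2: match df2.code against df1.code, then against df1.code_equivalence (within the same user_id)
    merge_df = pd.merge(df1, df2[["user_id", "code", "status"]], on=["user_id", "code"], how="left")
    merge_df2 = pd.merge(df1, df2[["user_id", "code", "status"]], left_on=["user_id", "code_equivalence"], right_on=["user_id", "code"], how="left")
    # Step 3: fill step 1's gaps with step 2's matches (assign back; chained inplace=True stops working under copy-on-write)
    merge_df["status"] = merge_df["status"].fillna(merge_df2["status"])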
    

But I'd like to know whether this can be done in a single line of code (it probably can).
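One way to get by with a single merge instead of two, sketched under the same assumptions as the question's data (codes read as integers, one status per key in df2, matching on user_id, code and name together): stack each df1 row's (code, name) and (code_equivalence, name_equivalence) pairs into one long key table that remembers the original row label, merge that once against df2, and map the hits back onto df1. It is not literally one line, and keys and hits are hypothetical names.

```python
import pandas as pd

# df1/df2 as in the question (codes read as integers, so "050" becomes 50)
df1 = pd.DataFrame(
    [(51, 123, "bi lovers", 542, "bi for marketing"),
     (51, 123, "bi lovers", 545, "i love bi"),
     (51, 234, "datascience", 345, "data and science"),
     (51, 234, "datascience", 555, "data lovers"),
     (51, 255, "antiquity history", 429, "roma"),
     (51, 255, "antiquity history", 430, "greece"),
     (52, 123, "bi lovers", 542, "bi for marketing"),
     (52, 123, "bi lovers", 545, "i love bi"),
     (52, 256, "modern history", 500, "france"),
     (52, 256, "modern history", 501, "germany"),
     (52, 200, "arts", 400, "arts I"),
     (52, 200, "arts", 401, "arts II")],
    columns=["user_id", "code", "name", "code_equivalence", "name_equivalence"],
)
df2 = pd.DataFrame(
    [(51, 123, "bi lovers", "ongoing"), (51, 430, "greece", "ongoing"),
     (52, 501, "germany", "ongoing"), (52, 50, "numbers", "ongoing")],
    columns=["user_id", "code", "name", "status"],
)

KEY = ["user_id", "code", "name"]
# Long key table: each df1 row appears twice, once with (code, name) and once with
# (code_equivalence, name_equivalence); the "row" column keeps the df1 row label
keys = pd.concat(
    [df1[KEY], df1[["user_id", "code_equivalence", "name_equivalence"]].set_axis(KEY, axis=1)]
).rename_axis("row").reset_index()

hits = (
    keys.merge(df2, on=KEY)   # inner merge, documented to preserve the left key order
    .drop_duplicates("row")   # at most one status per df1 row, the direct pair winning
    .set_index("row")["status"]
)
merged = df1.assign(status=hits)  # aligns on df1's index; rows without a hit get NaN
print(merged)
```

The grouping into the final df then works on this merged frame exactly as before.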
