在数据框架中添加一个新列,其中新列是另一组与条件匹配的组的最小日期值



让我用一个示例解释这个问题,

def random_dates(start, end, n=10):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
np.random.seed(42)
d = pd.DataFrame({'AAA': np.random.choice(['a','b','c'],size=10),
                       'BBB': np.random.choice([0,3,1],size=10),
                       'CCC': random_dates(pd.to_datetime('2015-01-01'),pd.to_datetime('2018-01-01'),10)})

d看起来像这样

  AAA  BBB   CCC
0   c   1   2017-06-08 23:18:17
1   a   1   2015-12-18 06:26:04
2   c   0   2016-05-11 01:34:40
3   c   1   2017-06-18 21:35:54
4   a   3   2016-12-21 11:51:05
5   a   0   2017-02-05 07:36:31
6   c   3   2016-06-25 13:49:15
7   b   3   2015-09-06 19:49:15
8   c   3   2016-11-28 00:06:38
9   c   3   2017-07-12 02:51:41

现在,我想添加一个新列,即min_date_value_per_aaaa,其值应该像这样

j= d[d.BBB >0].groupby('AAA',as_index=False).CCC.min().rename(columns={"CCC":"min_CCC"})
    AAA min_CCC
0   a   2015-12-18 06:26:04
1   b   2015-09-06 19:49:15
2   c   2016-06-25 13:49:15

请注意,我不想将每组的最小日期价值拿走,这与

不同
d.groupby('AAA').CCC.min()
AAA
a   2015-12-18 06:26:04
b   2015-09-06 19:49:15
c   2016-05-11 01:34:40

现在,我可以与J一起加入D,并通过

将min_ccc添加到J
 pd.merge(d,j,on="AAA",how="inner")
AAA BBB CCC min_CCC
0   c   1   2017-06-08 23:18:17 2016-06-25 13:49:15
1   c   0   2016-05-11 01:34:40 2016-06-25 13:49:15
2   c   1   2017-06-18 21:35:54 2016-06-25 13:49:15
3   c   3   2016-06-25 13:49:15 2016-06-25 13:49:15
4   c   3   2016-11-28 00:06:38 2016-06-25 13:49:15
5   c   3   2017-07-12 02:51:41 2016-06-25 13:49:15
6   a   1   2015-12-18 06:26:04 2015-12-18 06:26:04
7   a   3   2016-12-21 11:51:05 2015-12-18 06:26:04
8   a   0   2017-02-05 07:36:31 2015-12-18 06:26:04
9   b   3   2015-09-06 19:49:15 2015-09-06 19:49:15

我的问题是有一种更好的方法来实现它。

我认为您需要transform

d.CCC.where(d.BBB>0).groupby(d.AAA).transform('min')
Out[58]: 
0   2016-06-25 13:49:15
1   2015-12-18 06:26:04
2   2016-06-25 13:49:15
3   2016-06-25 13:49:15
4   2015-12-18 06:26:04
5   2015-12-18 06:26:04
6   2016-06-25 13:49:15
7   2015-09-06 19:49:15
8   2016-06-25 13:49:15
9   2016-06-25 13:49:15
Name: CCC, dtype: datetime64[ns]
d['new']=d.CCC.where(d.BBB>0).groupby(d.AAA).transform('min')

最新更新