Count nunique from another DataFrame



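My goal is to add a column that counts the number of unique instances of the primary key in dfB, using the keys from dfA. The primary key appears only once in dfA, but it can appear multiple times in dfB.

Example: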

dfA

Agreement    Date_1
146108493    1/31/2019
142527722    1/9/2019

You can group the dataframe dfB on Agreement, aggregate with count, and then map the result onto dfA through its primary key Agreement:

dfA['Count'] = dfA['Agreement'].map(dfB.groupby('Agreement')['Date_2'].count())

Or, as @Pygirl suggested in the comments, use value_counts with map:

dfA['Count'] = dfA['Agreement'].map(dfB['Agreement'].value_counts())

   Agreement     Date_1  Count
0  146108493  1/31/2019      3
1  142527722   1/9/2019      2
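
A self-contained sketch of the map approach. The dfB rows and the third dfA key (999999999) are made up for illustration, since the original dfB is not shown here. The fillna(0) step and the nunique variant are additions for illustration, not part of the answer above.

```python
import pandas as pd

# The first two dfA rows come from the example; the third key is hypothetical
# and has no match in dfB.
dfA = pd.DataFrame({
    "Agreement": [146108493, 142527722, 999999999],
    "Date_1": ["1/31/2019", "1/9/2019", "2/1/2019"],
})

# Hypothetical dfB: each key repeats, and one date is duplicated.
dfB = pd.DataFrame({
    "Agreement": [146108493, 146108493, 146108493, 142527722, 142527722],
    "Date_2": ["2/1/2019", "2/1/2019", "2/15/2019", "3/1/2019", "3/5/2019"],
})

# Rows per key in dfB, and distinct Date_2 values per key.
row_counts = dfB.groupby("Agreement")["Date_2"].count()
unique_dates = dfB.groupby("Agreement")["Date_2"].nunique()

# map looks up each dfA key in the Series index; unmatched keys become NaN,
# which is replaced with 0 here.
dfA["Count"] = dfA["Agreement"].map(row_counts).fillna(0).astype(int)
dfA["Unique_dates"] = dfA["Agreement"].map(unique_dates).fillna(0).astype(int)

print(dfA)
```

count gives the number of rows per key, while nunique gives the number of distinct Date_2 values per key, so pick whichever matches what "unique instances" should mean for your data.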

Using df.merge:

In [1161]: x = dfA.merge(dfB, indicator='Count').query('Count == "both"').groupby('Agreement').size().reset_index(name='Count')
In [1165]: dfA = dfA.merge(x)
In [1166]: dfA
Out[1166]: 
   Agreement     Date_1  Count
0  146108493  1/31/2019      3
1  142527722   1/9/2019      2
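
Note that merge defaults to an inner join, so any dfA key with no match in dfB drops out of the result. A minimal sketch of a variant that keeps such rows with how='left', assuming hypothetical dfB rows and a made-up unmatched key (999999999):

```python
import pandas as pd

# The first two dfA rows come from the example; the third key is hypothetical.
dfA = pd.DataFrame({
    "Agreement": [146108493, 142527722, 999999999],
    "Date_1": ["1/31/2019", "1/9/2019", "2/1/2019"],
})

# Hypothetical dfB rows, chosen only for illustration.
dfB = pd.DataFrame({
    "Agreement": [146108493, 146108493, 146108493, 142527722, 142527722],
    "Date_2": ["2/1/2019", "2/10/2019", "2/20/2019", "3/1/2019", "3/5/2019"],
})

# Count the rows per key in dfB as a two-column frame.
counts = dfB.groupby("Agreement").size().reset_index(name="Count")

# A left merge keeps every dfA row; keys absent from dfB get NaN, set to 0 here.
result = dfA.merge(counts, on="Agreement", how="left")
result["Count"] = result["Count"].fillna(0).astype(int)

print(result)
```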

Try:

x = dfa.merge(dfb, on='Agreement', how='right')
y = x.groupby('Agreement').size().reset_index(name='Count')
dfa = dfa.merge(y, on='Agreement')

As a one-liner:

dfa.join(dfb['Agreement'].groupby(dfb['Agreement']).count().rename('Count'), on='Agreement')

dfa:

    Agreement   Date_1      Count
0   146108493   1/31/2019   3
1   142527722   1/9/2019    2

Edit:

Try filtering them with this function.

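import numpy as np
import pandas as pd

def grpdates(grp):
    # Look up Date_1 in dfa for this group's Agreement.
    stRow = grp.iloc[0]
    stRow1 = dfa[dfa['Agreement']==stRow.Agreement].Date_1
    # Keep only rows whose Date_2 is no later than 30 days after Date_1.
    grp = grp[np.array(grp.Date_2)<=np.array(stRow1 + pd.DateOffset(days=30))]
    return grp

# Parse both date columns so they can be compared.
dfb['Date_2'] = pd.to_datetime(dfb.Date_2, format='%m/%d/%Y')
dfa['Date_1'] = pd.to_datetime(dfa.Date_1, format='%m/%d/%Y')
# Filter each Agreement group, then count the remaining rows per key.
df = dfb.groupby('Agreement', group_keys=False).apply(grpdates)
dfa['Count'] = dfa['Agreement'].map(df['Agreement'].value_counts())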

dfa:

    Agreement   Date_1      Count
0   146108493   2019-01-31  3.0
1   142527722   2019-01-09  NaN
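
The same 30-day filter can probably also be written without groupby.apply, by merging Date_1 onto dfb and comparing the columns directly. This is a sketch under assumptions: the dfb rows below are hypothetical, chosen so that the second key has no Date_2 inside the window.

```python
import pandas as pd

# dfa rows come from the example, with Date_1 already parsed.
dfa = pd.DataFrame({
    "Agreement": [146108493, 142527722],
    "Date_1": pd.to_datetime(["1/31/2019", "1/9/2019"], format="%m/%d/%Y"),
})

# Hypothetical dfb rows, chosen only for illustration.
dfb = pd.DataFrame({
    "Agreement": [146108493, 146108493, 146108493, 142527722, 142527722],
    "Date_2": pd.to_datetime(
        ["2/1/2019", "2/10/2019", "2/20/2019", "3/1/2019", "3/5/2019"],
        format="%m/%d/%Y",
    ),
})

# Attach each row's Date_1 so both dates sit side by side.
merged = dfb.merge(dfa, on="Agreement", how="inner")

# Keep rows whose Date_2 is no later than 30 days after Date_1.
within = merged[merged["Date_2"] <= merged["Date_1"] + pd.Timedelta(days=30)]

# Count surviving rows per key; keys with no surviving rows map to NaN.
dfa["Count"] = dfa["Agreement"].map(within["Agreement"].value_counts())

print(dfa)
```

Appending .fillna(0).astype(int) to the last assignment would turn the NaN into 0 if an integer column is preferred.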
