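My goal is to add a column to dfA that counts how many times each of its primary keys appears in dfB. Each primary key appears only once in dfA but can appear multiple times in dfB.
Example: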
dfA
Agreement  Date_1
146108493  1/31/2019
142527722  1/9/2019
You can group the dataframe dfB on Agreement, count the rows in each group, and use map to attach those counts to dfA by its primary key Agreement:
dfA['Count'] = dfA['Agreement'].map(dfB.groupby('Agreement')['Date_2'].count())
Or use value_counts and map, as @Pygirl suggested in the comments:
dfA['Count'] = dfA['Agreement'].map(dfB['Agreement'].value_counts())
   Agreement     Date_1  Count
0  146108493  1/31/2019      3
1  142527722   1/9/2019      2
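One difference between the two variants is worth checking on your own data: count() on Date_2 skips missing dates, while value_counts() on Agreement counts every row. Below is a minimal sketch of that difference. dfA is taken from the question, but the dfB rows (including the deliberately missing date) are made up for illustration, since the original dfB is not shown here.

```python
import pandas as pd

# dfA as in the question.
dfA = pd.DataFrame({
    "Agreement": [146108493, 142527722],
    "Date_1": ["1/31/2019", "1/9/2019"],
})

# Hypothetical dfB: the Date_2 values are assumptions, one is missing on purpose.
dfB = pd.DataFrame({
    "Agreement": [146108493, 146108493, 146108493, 142527722, 142527722],
    "Date_2": ["2/1/2019", None, "2/28/2019", "1/10/2019", "1/20/2019"],
})

# Counts non-null Date_2 values per Agreement.
by_count = dfA["Agreement"].map(dfB.groupby("Agreement")["Date_2"].count())

# Counts rows per Agreement, whether or not Date_2 is present.
by_rows = dfA["Agreement"].map(dfB["Agreement"].value_counts())

print(by_count.tolist())  # non-null dates per key
print(by_rows.tolist())   # rows per key
```

If Date_2 never has missing values, both give the same numbers, so the choice is mostly a matter of taste.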
Using df.merge:
In [1161]: x = dfA.merge(dfB, indicator='Count').query('Count == "both"').groupby('Agreement').size().reset_index(name='Count')
In [1165]: dfA = dfA.merge(x)
In [1166]: dfA
Out[1166]:
   Agreement     Date_1  Count
0  146108493  1/31/2019      3
1  142527722   1/9/2019      2
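Note that both merges above are inner joins, so an Agreement in dfA that has no rows in dfB drops out of the result entirely. If you would rather keep it with a count of 0, a left merge with the default indicator column (_merge) is one way to do it. This is only a sketch: the key 555 and all Date_2 values are hypothetical and added purely to show the unmatched case.

```python
import pandas as pd

# dfA from the question plus one hypothetical key (555) with no match in dfB.
dfA = pd.DataFrame({
    "Agreement": [146108493, 142527722, 555],
    "Date_1": ["1/31/2019", "1/9/2019", "2/2/2019"],
})

# Hypothetical dfB rows; the dates are assumptions.
dfB = pd.DataFrame({
    "Agreement": [146108493, 146108493, 146108493, 142527722, 142527722],
    "Date_2": ["2/1/2019", "2/15/2019", "2/28/2019", "1/10/2019", "1/20/2019"],
})

# A left merge keeps every dfA row; _merge marks rows that found a partner in dfB.
merged = dfA.merge(dfB, on="Agreement", how="left", indicator=True)
matched = merged["_merge"] == "both"

# Summing the boolean flag per Agreement gives the number of matching dfB rows.
counts = matched.groupby(merged["Agreement"]).sum().rename("Count")

result = dfA.join(counts, on="Agreement")
print(result)
```

Here the unmatched key ends up with Count 0 instead of disappearing.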
Try:
x = dfa.merge(dfb, on='Agreement', how='right')
y = x.groupby('Agreement').size().reset_index(name='Count')
dfa = dfa.merge(y, on='Agreement')
As a one-liner:
dfa.join(dfb['Agreement'].groupby(dfb['Agreement']).count().rename('Count'), on='Agreement')
dfa:
   Agreement     Date_1  Count
0  146108493  1/31/2019      3
1  142527722   1/9/2019      2
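Edit: to count only the dfb rows whose Date_2 is no later than 30 days after the matching Date_1 in dfa, try filtering them with this function first.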
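import numpy as np
import pandas as pd

def grpdates(grp):
    # Every row in grp shares one Agreement; look up its Date_1 in dfa.
    stRow = grp.iloc[0]
    stRow1 = dfa[dfa['Agreement'] == stRow.Agreement].Date_1
    # Keep only rows whose Date_2 is no later than 30 days after Date_1.
    grp = grp[np.array(grp.Date_2) <= np.array(stRow1 + pd.DateOffset(days=30))]
    return grp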
dfb['Date_2'] = pd.to_datetime(dfb.Date_2, format='%m/%d/%Y')
dfa['Date_1'] = pd.to_datetime(dfa.Date_1, format='%m/%d/%Y')
df = dfb.groupby('Agreement', group_keys=False).apply(grpdates)
dfa['Count'] = dfa['Agreement'].map(df['Agreement'].value_counts())
dfa:
   Agreement      Date_1  Count
0  146108493  2019-01-31    3.0
1  142527722  2019-01-09    NaN
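If the per-group apply turns out to be slow on a large dfb, the same 30-day filter can probably be written without a custom function: merge Date_1 onto dfb, compare the two date columns directly, then count. This is a sketch of an alternative, not the method above, and the Date_2 values are hypothetical, chosen so that the second agreement has no rows inside the window.

```python
import pandas as pd

dfa = pd.DataFrame({
    "Agreement": [146108493, 142527722],
    "Date_1": ["1/31/2019", "1/9/2019"],
})

# Hypothetical dfb rows; the dates are assumptions for illustration.
dfb = pd.DataFrame({
    "Agreement": [146108493, 146108493, 146108493, 142527722, 142527722],
    "Date_2": ["2/1/2019", "2/15/2019", "2/28/2019", "3/1/2019", "3/15/2019"],
})

dfa["Date_1"] = pd.to_datetime(dfa["Date_1"], format="%m/%d/%Y")
dfb["Date_2"] = pd.to_datetime(dfb["Date_2"], format="%m/%d/%Y")

# Bring Date_1 next to every dfb row, then keep rows inside the 30-day window.
merged = dfb.merge(dfa, on="Agreement", how="inner")
within = merged[merged["Date_2"] <= merged["Date_1"] + pd.Timedelta(days=30)]

# Agreements with no rows in the window get NaN, as in the output above.
dfa["Count"] = dfa["Agreement"].map(within["Agreement"].value_counts())
print(dfa)
```

Append .fillna(0).astype(int) to the last assignment if you would rather see 0 than NaN for agreements with nothing in the window.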