在一定条件下如何将by /Tag分组到同一组



我有一个如下所示的数据框架。此数据框包含名称和地址信息。

l_name   f_name   m_name    city_state     zip 
smith     sam                auburn,wa    98910
smith     sam       c        auburn,wa    98910
smith     sam                durham,nc    27659
smith     sam     eddie      auburn,wa    98910
smith     sam                auburn,wa    98910
:
:
aaron     joe     chris      ocala,fl     58910
aaron     joe                ocala,fl     58910
aaron     joe                durham,nc    27659
aaron     joe                ocala,fl     58910
aaron     joe       a        ocala,fl     58910

我试图创建一个代码,可以组/标记他们到同一组,如果他们的l_name, f_name, city_state,和zip列都匹配。

这是我希望的输出数据帧。

l_name   f_name   m_name    city_state     zip       tag
smith     sam                auburn,wa    98910       1
smith     sam       c        auburn,wa    98910       1
smith     sam                durham,nc    27659       0
smith     sam     eddie      auburn,wa    98910       1
smith     sam                auburn,wa    98910       1
clair     may                seattle,wa   98092       2
clair     may       a        seattle,wa   98092       2
clair     may                seattle,wa   98092       2
:
:
aaron     joe     chris      ocala,fl     58910       n
aaron     joe                ocala,fl     58910       n
aaron     joe                durham,nc    27659       0
aaron     joe                ocala,fl     58910       n
aaron     joe       a        ocala,fl     58910       n

如果该人不属于任何组,它将给其分配一个0。Sam Smith符合条件,所以他们都在同一组,即1。因此,标签/组2将是may clair, joe aaron将是n组,这取决于数据框中存储了多少个名字。

我想知道是否有任何好的方法,方法或建议我创建一个代码,可以输出我想要的结果上面?

谢谢!

你可以用>1计数并获得组号(和+1),然后对于计数为1的组,设置tag = 0。然后连接。

df1 = df[df.groupby(['l_name', 'f_name', 'city_state', 'zip'])['m_name'].transform('count')>1].copy()
df1['tag'] = df1.groupby(['l_name', 'f_name', 'city_state', 'zip']).ngroup() + 1
df2 = df[df.groupby(['l_name', 'f_name', 'city_state', 'zip'])['m_name'].transform('count')==1].copy()
df2['tag'] = 0
df_final = pd.concat([df1,df2])
df_final
l_name f_name m_name city_state    zip  tag
0  smith    sam      x  auburn,wa  98910    2
1  smith    sam      c  auburn,wa  98910    2
3  smith    sam  eddie  auburn,wa  98910    2
4  smith    sam      x  auburn,wa  98910    2
5  aaron    joe  chris   ocala,fl  58910    1
6  aaron    joe      x   ocala,fl  58910    1
8  aaron    joe      x   ocala,fl  58910    1
9  aaron    joe      a   ocala,fl  58910    1
2  smith    sam      x  durham,nc  27659    0
7  aaron    joe      x  durham,nc  27659    0

您可以使用ngroup的组合,然后计算其值:

# create a ID variable with a unique identifier for each group
groupvar = df.groupby(['l_name', 'f_name', 'city_state', 'zip']).ngroup(ascending=False) + 1
# assign it to the data 
df = df.assign(tag=groupvar)
df
l_name  f_name  m_name  city_state  zip     tag
0   smith   sam             auburn,wa   98910   2
1   smith   sam     c       auburn,wa   98910   2
2   smith   sam             durham,nc   27659   1
3   smith   sam     eddie   auburn,wa   98910   2
4   smith   sam             auburn,wa   98910   2
5   clair   may             seattle,wa  98092   3
6   clair   may     a       seattle,wa  98092   3
7   clair   may             seattle,wa  98092   3
8   aaron   joe     chris   ocala,fl    58910   4
9   aaron   joe             ocala,fl    58910   4
10  aaron   joe             durham,nc   27659   5
11  aaron   joe             ocala,fl    58910   4
12  aaron   joe     a       ocala,fl    58910   4
# determine which IDs occur only once and assign 0 to them
df.loc[df['tag'].value_counts()[df['tag']].values == 1, 'tag'] = 0
df
l_name  f_name  m_name  city_state  zip     tag
0   smith   sam             auburn,wa   98910   2
1   smith   sam     c       auburn,wa   98910   2
2   smith   sam             durham,nc   27659   0
3   smith   sam     eddie   auburn,wa   98910   2
4   smith   sam             auburn,wa   98910   2
5   clair   may             seattle,wa  98092   3
6   clair   may     a       seattle,wa  98092   3
7   clair   may             seattle,wa  98092   3
8   aaron   joe     chris   ocala,fl    58910   4
9   aaron   joe             ocala,fl    58910   4
10  aaron   joe             durham,nc   27659   0
11  aaron   joe             ocala,fl    58910   4
12  aaron   joe     a       ocala,fl    58910   4

相关内容

  • 没有找到相关文章

最新更新