Labeling duplicate ids in a DataFrame with an A-B relationship



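I am trying to create a relationship between duplicate ids in a DataFrame. Take 91 as an example: it is repeated 4 times. For the first 91 row, the first column should be set to A and the second to B. For the next 91 row, first should be B and second C; for the row after that, first is C and second is D, and so on. Every duplicated id should follow the same pattern. For ids that are not duplicated, first is simply marked A.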
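To make the rule concrete, here is a minimal pure-Python sketch of the labeling described above. The function name `label_pairs` and the use of `None` for unique ids are assumptions made for illustration only; they are not part of the question.

```python
from collections import Counter, defaultdict
from string import ascii_uppercase


def label_pairs(ids):
    """Return (id, first, second) tuples following the A-B, B-C, C-D rule.

    Hypothetical helper: assumes each id repeats fewer than 26 times,
    because the letters come from ascii_uppercase.
    """
    totals = Counter(ids)        # how often each id occurs overall
    seen = defaultdict(int)      # how often each id has occurred so far
    rows = []
    for i in ids:
        k = seen[i]
        seen[i] += 1
        first = ascii_uppercase[k]
        # only duplicated ids get a second letter
        second = ascii_uppercase[k + 1] if totals[i] > 1 else None
        rows.append((i, first, second))
    return rows


pairs = label_pairs([11, 91, 91, 91, 91])
```

With this input, the unique id 11 gets only A, while the four 91 rows get A-B, B-C, C-D and D-E.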


You can use the per-group cumcount as the source for a mapping:

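from string import ascii_uppercase
# mapping dictionary: position within the group -> letter
# this is an example, you can use any mapping
d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C'...}
g = df.groupby('id')
# 0-based position of each row within its id group
c = g.cumcount()
# True for rows whose id occurs more than once
m = g['id'].transform('size').gt(1)
df['first'] = c.map(d)
# only duplicated ids get the next letter; other rows are left untouched
df.loc[m, 'other'] = c[m].add(1).map(d)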

Output:

    id first other
0   11     A     0
1    9     A     0
2   91     A     B
3   91     B     C
4   91     C     D
5   91     D     E
6   15     A     B
7   15     B     C
8   12     A     0
9    1     A     B
10   1     B     C
11   1     C     D
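For readers who want to run this end to end, the following is a self-contained sketch of the same cumcount-plus-mapping idea. The sample ids are taken from the output above. Filling unique ids with the string `"0"` through `where` is an assumption of this sketch (it keeps the column a single type); the answer itself leaves those rows untouched.

```python
from string import ascii_uppercase

import pandas as pd

# sample ids copied from the output table above
df = pd.DataFrame({"id": [11, 9, 91, 91, 91, 91, 15, 15, 12, 1, 1, 1]})

letters = dict(enumerate(ascii_uppercase))  # {0: 'A', 1: 'B', ...}
g = df.groupby("id")
pos = g.cumcount()                          # 0-based position within each id
dup = g["id"].transform("size").gt(1)       # True where the id is repeated

df["first"] = pos.map(letters)
# next letter for duplicated ids, "0" (assumed placeholder) otherwise
df["other"] = pos.add(1).map(letters).where(dup, "0")
```

Because the dictionary only covers 26 letters, an id repeated more than 25 times would map to a missing value in `other`; a larger mapping would be needed in that case.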

Given:

    id
0   11
1    9
2   91
3   91
4   91
5   91
6   15
7   15
8   12
9    1
10   1
11   1

Doing:

# 0-based position of each row within its id group
df['first'] = df.groupby('id').cumcount()
# index of the rows whose id occurs more than once
m = df.groupby('id').filter(lambda x: len(x)>1).index
# convert positions to letters: chr(65) is 'A', chr(66) is 'B'
df.loc[m, 'other'] = df['first'].add(66).apply(chr)
df['first'] = df['first'].add(65).apply(chr)
# fill in missing with 0
df['other'] = df['other'].fillna(0)

Output:

    id first other
0   11     A     0
1    9     A     0
2   91     A     B
3   91     B     C
4   91     C     D
5   91     D     E
6   15     A     B
7   15     B     C
8   12     A     0
9    1     A     B
10   1     B     C
11   1     C     D
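The offsets 65 and 66 used above are the code points of 'A' and 'B'. A small sketch of that conversion in plain Python, where the list `counts` is a hypothetical stand-in for the cumcount values of one id that occurs four times:

```python
# hypothetical cumcount values for an id that occurs four times
counts = [0, 1, 2, 3]

# ord("A") is 65, so adding it turns 0, 1, 2, ... into 'A', 'B', 'C', ...
first = [chr(n + ord("A")) for n in counts]
# one position further along gives the partner letter
other = [chr(n + ord("A") + 1) for n in counts]
```

Writing `ord("A")` instead of the literal 65 makes the intent explicit. Counts above 25 would run past 'Z' into other characters.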
