Create a unique Id by enumerating distinct row values when reading multiple files



I want to create a unique Id for each name, per game and year. The column I mainly care about is Name.

I have multiple files:

Name   Year    Level   
Pikachu   2007    30
Pikachu   2007    20
Raichu    2007    20
Mew       2007    35

Name   Year    Level   
Pikachu   2008    50
Pikachu   2008    40
Raichu    2008    55
Mew       2008    55


Pokemon   Year    Level   
Squirtle  2008    50
Pidgey    2008    40
Pidgey    2008    55
Ekans     2008    55

This is the result I want:

Game     Name      Year  Level  Id
Pokemon  Pikachu   2007  30     1
Pokemon  Pikachu   2007  20     1
Pokemon  Raichu    2007  20     2
Pokemon  Mew       2007  35     3
Pokemon  Pikachu   2008  50     1
Pokemon  Pikachu   2008  40     1
Pokemon  Raichu    2008  55     2
Pokemon  Mewtwo    2008  55     3
Pokemon  Squirtle  2008  60     1
Pokemon  Pidgey    2008  45     2
Pokemon  Pidgey    2008  52     2
Pokemon  Ekans     2008  51     3

This is what I have tried:

for file in files:
    df = pd.read_csv(file,header=0)
    df['Game'] = 'Pokemon'
    for i, p in enumerate(df['Pokemon'].unique(), 1):
        df.loc[i-1,'id'] = i
        df.loc[i-1, 'Pokemon'] = p
    df['Id'] = df['Id'].astype('int')

I think you want to call factorize on each DataFrame, append each result to a list, and finally join the list into one big DataFrame with concat:

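import pandas as pd

out = []
for file in files:
    df = pd.read_csv(file, header=0)
    # One sample file labels the name column 'Pokemon'; normalize it to 'Name'
    df = df.rename(columns={'Pokemon': 'Name'})
    df['Game'] = 'Pokemon'
    # factorize numbers names by first appearance starting at 0, so add 1
    df['id'] = pd.factorize(df['Name'])[0] + 1
    out.append(df)

df = pd.concat(out, ignore_index=True)
print(df)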
        Name  Year  Level     Game  id
0    Pikachu  2007     30  Pokemon   1
1    Pikachu  2007     20  Pokemon   1
2     Raichu  2007     20  Pokemon   2
3        Mew  2007     35  Pokemon   3
4    Pikachu  2008     50  Pokemon   1
5    Pikachu  2008     40  Pokemon   1
6     Raichu  2008     55  Pokemon   2
7        Mew  2008     55  Pokemon   3
8   Squirtle  2008     50  Pokemon   1
9     Pidgey  2008     40  Pokemon   2
10    Pidgey  2008     55  Pokemon   2
11     Ekans  2008     55  Pokemon   3
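
To try the per-file numbering without any files on disk, here is a minimal sketch. The two in-memory CSV strings and the helper name number_names are made up for illustration, and it assumes the only header difference between files is 'Pokemon' versus 'Name'.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for two of the CSV files in the question.
file_a = io.StringIO(
    "Name,Year,Level\nPikachu,2007,30\nPikachu,2007,20\nRaichu,2007,20\nMew,2007,35\n"
)
file_b = io.StringIO(
    "Pokemon,Year,Level\nSquirtle,2008,50\nPidgey,2008,40\nPidgey,2008,55\nEkans,2008,55\n"
)


def number_names(frame):
    """Return a new frame with an 'Id' column numbering names by first appearance."""
    # Assumption: a name column called 'Pokemon' means the same thing as 'Name'.
    frame = frame.rename(columns={"Pokemon": "Name"})
    codes, _ = pd.factorize(frame["Name"])  # codes start at 0
    frame["Id"] = codes + 1
    return frame


# Numbering restarts for every file because factorize runs once per frame.
frames = [number_names(pd.read_csv(f)) for f in (file_a, file_b)]
result = pd.concat(frames, ignore_index=True)
print(result["Id"].tolist())  # [1, 1, 2, 3, 1, 2, 2, 3]
```

Because factorize is called once per file, the same Id can refer to different names in different files (1 is Pikachu in the first file and Squirtle in the second), which matches the desired output in the question.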
