I want to create a unique ID based on the game name and year. The main focus is the Name column.
I have multiple files:

Name      Year  Level
Pikachu   2007  30
Pikachu   2007  20
Raichu    2007  20
Mew       2007  35

Name      Year  Level
Pikachu   2008  50
Pikachu   2008  40
Raichu    2008  55
Mew       2008  55

Pokemon   Year  Level
Squirtle  2008  50
Pidgey    2008  40
Pidgey    2008  55
Ekans     2008  55

This is the result I want:

Game     Name      Year  Level  Id
Pokemon  Pikachu   2007  30     1
Pokemon  Pikachu   2007  20     1
Pokemon  Raichu    2007  20     2
Pokemon  Mew       2007  35     3
Pokemon  Pikachu   2008  50     1
Pokemon  Pikachu   2008  40     1
Pokemon  Raichu    2008  55     2
Pokemon  Mewtwo    2008  55     3
Pokemon  Squirtle  2008  60     1
Pokemon  Pidgey    2008  45     2
Pokemon  Pidgey    2008  52     2
Pokemon  Ekans     2008  51     3

I have tried this:

for file in files:
    df = pd.read_csv(file,header=0)
    df['Game'] = 'Pokemon'
    for i, p in enumerate(df['Pokemon'].unique(), 1):
        df.loc[i-1,'id'] = i
        df.loc[i-1, 'Pokemon'] = p
    df['Id'] = df['Id'].astype('int')

I think you need factorize for each DataFrame separately. To build the final large DataFrame, collect the DataFrames in a list and join them with concat at the end:
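
import pandas as pd

out = []
for file in files:
    df = pd.read_csv(file, header=0)
    df['Game'] = 'Pokemon'
    # factorize numbers names 0, 1, 2, ... in order of first appearance; +1 starts ids at 1
    df['id'] = pd.factorize(df['Name'])[0] + 1
    out.append(df)
# join all per-file DataFrames into one, renumbering the index from 0
df = pd.concat(out, ignore_index=True)
print(df)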

        Name  Year  Level     Game  id
0    Pikachu  2007     30  Pokemon   1
1    Pikachu  2007     20  Pokemon   1
2     Raichu  2007     20  Pokemon   2
3        Mew  2007     35  Pokemon   3
4    Pikachu  2008     50  Pokemon   1
5    Pikachu  2008     40  Pokemon   1
6     Raichu  2008     55  Pokemon   2
7        Mew  2008     55  Pokemon   3
8   Squirtle  2008     50  Pokemon   1
9     Pidgey  2008     40  Pokemon   2
10    Pidgey  2008     55  Pokemon   2
11     Ekans  2008     55  Pokemon   3
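
A minimal self-contained sketch of the same per-file factorize approach, for trying it without the real files. It assumes the files are whitespace-separated text as shown in the question, replaces them with hypothetical in-memory strings read through io.StringIO, and assumes the third file's header is Name rather than Pokemon. It also uses DataFrame.insert to put Game first and names the column Id, to match the desired layout:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the three files (data copied from the question).
# Assumption: the third file's first column is named Name, like the other two.
csv_texts = [
    "Name Year Level\nPikachu 2007 30\nPikachu 2007 20\nRaichu 2007 20\nMew 2007 35\n",
    "Name Year Level\nPikachu 2008 50\nPikachu 2008 40\nRaichu 2008 55\nMew 2008 55\n",
    "Name Year Level\nSquirtle 2008 50\nPidgey 2008 40\nPidgey 2008 55\nEkans 2008 55\n",
]

out = []
for text in csv_texts:
    # sep=r"\s+" splits on runs of whitespace
    df = pd.read_csv(io.StringIO(text), sep=r"\s+")
    # put the constant Game column first, as in the desired output
    df.insert(0, "Game", "Pokemon")
    # number names 1, 2, 3, ... in order of first appearance, within this file only
    df["Id"] = pd.factorize(df["Name"])[0] + 1
    out.append(df)

result = pd.concat(out, ignore_index=True)
print(result)
```

Because factorize runs on each file separately, the numbering restarts at 1 for every file, so Squirtle gets 1 even though the second and third files share the year 2008. Grouping the concatenated frame by Year alone would not reproduce that.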