I have several different CSV files in the same folder, structured as follows:
CSV 1:
       Name  Passes  Shots
1  Player 1      20      5
2  Player 2      30      6
3  Player 3      10      3
CSV 2:
       Name  Goals  Duels
1  Player 3      2      3
2  Player 1      0      2
3  Player 2      1      7
CSV 3:
       Name  Country  Age
1  Player 2      SPA   25
2  Player 3      SPA   26
3  Player 1      USA   23
I want to combine these CSVs into a single DataFrame with pandas. The result I want is:
       Name  Passes  Shots  Goals  Duels  Country  Age
1  Player 1      20      5      0      2      USA   23
2  Player 2      30      6      1      7      SPA   25
3  Player 3      10      3      2      3      SPA   26
I tried to combine them with the code below, but I get a 9-row DataFrame (Player 1, Player 2 and Player 3 each appear three times):
import glob
import pandas as pd

file_extension = ".csv"
all_filenames = glob.glob(f"*{file_extension}")
df = pd.concat([pd.read_csv(file) for file in all_filenames])
The result I get is:
       Name  Passes  Shots  Goals  Duels  Country  Age
1  Player 1      20      5    NaN    NaN      NaN  NaN
2  Player 2      30      6    NaN    NaN      NaN  NaN
3  Player 3      10      3    NaN    NaN      NaN  NaN
1  Player 1     NaN    NaN      0      2      NaN  NaN
2  Player 2     NaN    NaN      1      7      NaN  NaN
3  Player 3     NaN    NaN      2      3      NaN  NaN
1  Player 1     NaN    NaN    NaN    NaN      USA   23
2  Player 2     NaN    NaN    NaN    NaN      SPA   25
3  Player 3     NaN    NaN    NaN    NaN      SPA   26
Do you know how I can combine them the way I want, using Name as the reference? Thanks in advance!
Use pandas merge, with Name as the key on both sides:
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3'],
    'Passes': [20, 30, 10],
    'Shots': [5, 6, 3]
})
df2 = pd.DataFrame({
    'Name': ['Player 3', 'Player 1', 'Player 2'],
    'Goals': [2, 0, 1],
    'Duels': [3, 2, 7]
})
df3 = pd.DataFrame({
    'Name': ['Player 2', 'Player 3', 'Player 1'],
    'Country': ['SPA', 'SPA', 'USA'],
    'Age': [25, 26, 23]
})
# The key has the same name in every frame, so on="Name" is
# equivalent to left_on="Name", right_on="Name".
df1.merge(df2, on="Name").merge(df3, on="Name")
Output:
       Name  Passes  Shots  Goals  Duels  Country  Age
0  Player 1      20      5      0      2      USA   23
1  Player 2      30      6      1      7      SPA   25
2  Player 3      10      3      2      3      SPA   26
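One thing to keep in mind: merge defaults to an inner join, so a player who is missing from any one file drops out of the result. Here is a minimal sketch of the difference. The `Player 4` row is made up for illustration and is not in the question's data.

```python
import pandas as pd

# Hypothetical data: "Player 4" exists only in the first frame.
passes = pd.DataFrame({
    "Name": ["Player 1", "Player 2", "Player 4"],
    "Passes": [20, 30, 15],
})
goals = pd.DataFrame({
    "Name": ["Player 2", "Player 1"],
    "Goals": [1, 0],
})

# Default how="inner": only names present in both frames survive.
inner = passes.merge(goals, on="Name")

# how="outer": every name is kept; missing values become NaN.
outer = passes.merge(goals, on="Name", how="outer")

print(inner)
print(outer)
```

If every file is known to list the same players, as in the question, the default inner join is enough.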
Edit 1:
If you have many such files and they all have Name as the key, you can use:
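df = None
for f in list_of_files:  # list_of_files: paths of the CSV files to merge
    frame = pd.read_csv(f)
    # The first file starts the result; each later file is merged in on Name.
    df = frame if df is None else df.merge(frame, on="Name")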
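The same loop can also be written with `functools.reduce`. This is only a sketch: to keep it self-contained, it reads the question's data from in-memory strings (`csv_sources` is a stand-in I introduced), where real code would pass file paths such as those returned by `glob.glob("*.csv")`.

```python
import io
from functools import reduce

import pandas as pd

# In-memory stand-ins for the CSV files; in practice these would be
# file paths passed straight to pd.read_csv.
csv_sources = [
    "Name,Passes,Shots\nPlayer 1,20,5\nPlayer 2,30,6\nPlayer 3,10,3\n",
    "Name,Goals,Duels\nPlayer 3,2,3\nPlayer 1,0,2\nPlayer 2,1,7\n",
    "Name,Country,Age\nPlayer 2,SPA,25\nPlayer 3,SPA,26\nPlayer 1,USA,23\n",
]

frames = [pd.read_csv(io.StringIO(text)) for text in csv_sources]

# Merge the frames pairwise, left to right, on the shared "Name" column.
merged = reduce(lambda left, right: left.merge(right, on="Name"), frames)

print(merged)
```

Both versions do the same pairwise merge, so the choice between them is a matter of taste.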
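One way to avoid needing drop_duplicates is to merge on multiple column headers. That way, if a player has played for two different teams, the player name is the same but the team differs, and the merge will keep those rows apart.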
You can also use pandas to read the CSVs into DataFrames (since you have many to read, this could be one big for loop).
# PATH_TO_CSV is a placeholder for the folder that holds the CSV files
data1 = pd.read_csv(PATH_TO_CSV + '/your_csv.csv')
data2 = pd.read_csv(PATH_TO_CSV + '/your_second_csv.csv')
merged_df = pd.merge(data1, data2, how='outer', on=['player', 'team'])
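To see why the extra key matters, here is a small sketch with made-up data (the teams, goals and minutes are all hypothetical) in which the same player appears for two teams:

```python
import pandas as pd

# Hypothetical data: "Player 1" appears for two different teams.
stats = pd.DataFrame({
    "player": ["Player 1", "Player 1", "Player 2"],
    "team": ["Team A", "Team B", "Team A"],
    "goals": [3, 1, 2],
})
minutes = pd.DataFrame({
    "player": ["Player 1", "Player 1", "Player 2"],
    "team": ["Team A", "Team B", "Team A"],
    "minutes": [900, 450, 1200],
})

# Merging on the name alone pairs every "Player 1" row in one frame with
# every "Player 1" row in the other, which yields 2 x 2 + 1 = 5 rows.
by_name = stats.merge(minutes, on="player", how="outer")

# Merging on both columns matches each (player, team) pair exactly once.
by_name_and_team = stats.merge(minutes, on=["player", "team"], how="outer")

print(len(by_name), len(by_name_and_team))
```

With only the name as the key you would have to clean up the extra rows afterwards. With both columns as the key they never appear.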
Another thing to note about DataFrame operations is that you can concatenate them along one axis to combine them. It could look like this...
df_list = []
for csv_path in csv_paths:  # csv_paths: placeholder list of CSV file paths
    data = pd.read_csv(csv_path)
    # check the type of data to make sure it is a DataFrame
    df_list.append(data)
    # and on to the next CSV
# after the for loop you can concatenate the DataFrames together
concat_df = pd.concat(df_list, axis=0, sort=False)
# the first argument to pd.concat must be an iterable of DataFrames
# (such as df_list), or else you will get an error
# then you can create a new CSV with all of your data
concat_df.to_csv(OUTPUT_PATH + '/your_new_csv.csv', index=False)
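Note that concatenating along axis=0 stacks rows, which is what produced the 9-row frame in the question. If you want concat to line rows up by player instead, one option is to make Name the index and concatenate along axis=1. A sketch using the question's data, assuming each name appears only once per file:

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Player 1", "Player 2", "Player 3"],
                    "Passes": [20, 30, 10], "Shots": [5, 6, 3]})
df2 = pd.DataFrame({"Name": ["Player 3", "Player 1", "Player 2"],
                    "Goals": [2, 0, 1], "Duels": [3, 2, 7]})
df3 = pd.DataFrame({"Name": ["Player 2", "Player 3", "Player 1"],
                    "Country": ["SPA", "SPA", "USA"], "Age": [25, 26, 23]})

# With "Name" as the index, axis=1 places the frames side by side and
# aligns rows by index label rather than by position.
wide = pd.concat([df.set_index("Name") for df in (df1, df2, df3)], axis=1)

# Name the index explicitly, then turn it back into an ordinary column.
wide = wide.rename_axis("Name").reset_index()

print(wide)
```

This relies on the names being unique within each file. If they are not, merging on the key columns is the safer route.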