Pandas: combine different CSVs into one df, merging by name



I have several different CSV files in the same folder, with the following structure:

CSV 1:

Name      Passes     Shots
1   Player 1     20        5
2   Player 2     30        6
3   Player 3     10        3

CSV 2:

Name      Goals     Duels
1   Player 3     2        3
2   Player 1     0        2
3   Player 2     1        7

CSV 3:

Name      Country     Age
1   Player 2     SPA        25
2   Player 3     SPA        26
3   Player 1     USA        23

I want to combine these CSVs into a single dataframe with pandas. The result I want is:

Name    Passes     Shots    Goals    Duels    Country    Age
1   Player 1     20        5       0        2        USA       23
2   Player 2     30        6       1        7        SPA       25
3   Player 3     10        3       2        3        SPA       26

I tried to combine them with this code, but I get a 9-row dataframe (Player 1 three times, Player 2 three times, and Player 3 three times):

import glob
import pandas as pd

file_extension = ".csv"
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
df = pd.concat([pd.read_csv(file) for file in all_filenames])

The result I get is:

Name    Passes     Shots    Goals    Duels    Country    Age
1   Player 1     20        5      NaN      NaN       NaN      NaN
2   Player 2     30        6      NaN      NaN       NaN      NaN
3   Player 3     10        3      NaN      NaN       NaN      NaN
1   Player 1    NaN      NaN       0        2        NaN      NaN
2   Player 2    NaN      NaN       1        7        NaN      NaN
3   Player 3    NaN      NaN       2        3        NaN      NaN
1   Player 1    NaN      NaN       NaN      NaN      USA       23
2   Player 2    NaN      NaN       NaN      NaN      SPA       25
3   Player 3    NaN      NaN       NaN      NaN      SPA       26

Do you know how I can combine them the way I want, using Name as the reference? Thanks in advance!

Use pandas merge and specify the left and right keys:

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Player 1', 'Player 2', 'Player 3'],
    'Passes': [20, 30, 10],
    'Shots': [5, 6, 3]
})
df2 = pd.DataFrame({
    'Name': ['Player 3', 'Player 1', 'Player 2'],
    'Goals': [2, 0, 1],
    'Duels': [3, 2, 7]
})
df3 = pd.DataFrame({
    'Name': ['Player 2', 'Player 3', 'Player 1'],
    'Country': ['SPA', 'SPA', 'USA'],
    'Age': [25, 26, 23]
})
# Chain two merges; rows are matched on the Name column, not on position
df1.merge(df2, left_on="Name", right_on="Name").merge(
    df3, left_on="Name", right_on="Name")

Output:

Name    Passes  Shots   Goals   Duels   Country Age
0   Player 1        20     5    0          2    USA     23
1   Player 2        30     6    1          7    SPA     25
2   Player 3        10     3    2          3    SPA     26
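
When the key column has the same name in both frames, `on="Name"` is shorthand for passing `left_on` and `right_on` separately. As a hedged sketch, the `validate="one_to_one"` argument can also guard against a name appearing twice in one file; the frames `left`, `right` and `dup` below are made-up illustrations, not the asker's data:

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Player 1", "Player 2"], "Passes": [20, 30]})
right = pd.DataFrame({"Name": ["Player 2", "Player 1"], "Goals": [1, 0]})

# on="Name" is equivalent to left_on="Name", right_on="Name".
# validate="one_to_one" makes pandas check that Name is unique in both frames.
result = left.merge(right, on="Name", validate="one_to_one")
print(result)

# Illustrative failure case: duplicate the first row of `right`.
dup = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    left.merge(dup, on="Name", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    # Raised because "Player 2" now appears twice on the right side.
    duplicate_detected = True
print(duplicate_detected)
```

Without `validate`, a duplicated name would silently produce extra rows in the merged frame.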

Edit 1:

If you have many such files and they all have Name as the key, you can use:

df = None
for f in list_of_files:
    df = pd.read_csv(f) if df is None else df.merge(
        pd.read_csv(f), left_on="Name", right_on="Name")
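
The same loop can also be written as a fold with `functools.reduce`. This is a minimal sketch, assuming every file has a `Name` column; the in-memory CSV strings stand in for real files, and `csv_texts`, `frames` and `merged` are illustrative names:

```python
import io
from functools import reduce

import pandas as pd

# In-memory stand-ins for the three CSV files from the question.
csv_texts = [
    "Name,Passes,Shots\nPlayer 1,20,5\nPlayer 2,30,6\nPlayer 3,10,3\n",
    "Name,Goals,Duels\nPlayer 3,2,3\nPlayer 1,0,2\nPlayer 2,1,7\n",
    "Name,Country,Age\nPlayer 2,SPA,25\nPlayer 3,SPA,26\nPlayer 1,USA,23\n",
]

# With real files this would be: [pd.read_csv(f) for f in list_of_files]
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Fold the list into one dataframe, merging pairwise on the shared key.
merged = reduce(lambda left, right: left.merge(right, on="Name"), frames)
print(merged)
```

The default inner merge keeps only names present in every file; passing `how="outer"` to `merge` would keep players that are missing from some files.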

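One way to avoid needing drop_duplicates is to merge on multiple column headers. That way, if a player has played for two different teams, the player name will be the same but the team will differ, and the merge will treat them as separate rows.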

You can also use pandas to read the CSVs into dataframes (since you have many files to read, this could be one big for loop).

# PATH_TO_CSV is a placeholder for the directory that holds the files
data1 = pd.read_csv(PATH_TO_CSV + '/your_csv.csv')
data2 = pd.read_csv(PATH_TO_CSV + '/your_second_csv.csv')
merged_df = pd.merge(data1, data2, how='outer', on=['player', 'team'])
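
To make the multi-key merge concrete, here is a small sketch with made-up data in which "Player 1" appears for two teams. The column names `player`, `team`, `passes` and `goals` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical data: "Player 1" appears for two different teams.
data1 = pd.DataFrame({
    "player": ["Player 1", "Player 1", "Player 2"],
    "team": ["Team A", "Team B", "Team A"],
    "passes": [20, 5, 30],
})
data2 = pd.DataFrame({
    "player": ["Player 1", "Player 1", "Player 3"],
    "team": ["Team A", "Team B", "Team B"],
    "goals": [0, 1, 2],
})

# Rows match only when both player and team agree; how="outer" keeps
# rows that appear in just one frame and fills the gaps with NaN.
merged_df = pd.merge(data1, data2, how="outer", on=["player", "team"])
print(merged_df)
```

Because the key is the (player, team) pair, the two "Player 1" rows stay separate instead of being cross-matched into four rows.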

Another thing to note about dataframe operations is that you can concatenate dataframes along one axis to combine them. It could look something like this:

df_list = []
# csv_files is a placeholder for your list of CSV file paths
for csv_file in csv_files:
    data = pd.read_csv(csv_file)
    # check the type of data to make sure it's a dataframe
    df_list.append(data)
    # and on to the next csv
# after the for loop you can concatenate the dataframes together
concat_df = pd.concat(df_list, axis=0, sort=False)
# the first argument (df_list) must be an iterable of dataframes, or you will get an error
# then you can create a new csv with all of your data
concat_df.to_csv(OUTPUT_PATH + '/your_new_csv.csv', index=False)
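
Note that concatenating along `axis=0` stacks rows, which is what produced the nine-row frame in the question. If each name appears exactly once per file, one alternative is to index every frame by `Name` and concatenate along `axis=1`, so rows are aligned on that index. A sketch under that assumption, using in-memory frames in place of files (`frames` and `wide` are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Player 1", "Player 2", "Player 3"],
                    "Passes": [20, 30, 10], "Shots": [5, 6, 3]})
df2 = pd.DataFrame({"Name": ["Player 3", "Player 1", "Player 2"],
                    "Goals": [2, 0, 1], "Duels": [3, 2, 7]})
df3 = pd.DataFrame({"Name": ["Player 2", "Player 3", "Player 1"],
                    "Country": ["SPA", "SPA", "USA"], "Age": [25, 26, 23]})
frames = [df1, df2, df3]

# Index each frame by Name, then concatenate column-wise; pandas aligns
# the rows on the index, so the differing row order in each file is harmless.
wide = (
    pd.concat([f.set_index("Name") for f in frames], axis=1)
    .rename_axis("Name")   # make sure the index keeps the name "Name"
    .reset_index()         # turn Name back into an ordinary column
)
print(wide)
```

This relies on `Name` being unique within each frame; with duplicated names the alignment fails, and a merge is the safer choice.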
