Combine multiple CSV files column-wise, with the file name as the header



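I couldn't find the exact code for this on the platform, which is why I'm posting to ask for suggestions.

I have multiple CSV files (around 100) with the same data format and header names.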

,Mean,SD
1,96.432,13.899
2,96.432,13.899
3,96.432,13.899
4,96.432,13.899
5,96.432,13.899

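I want to append all the files column-wise so that they end up in a single file. In addition, the header for each file's data should be the file name, so that I can track which data belongs to which file. For example, an extra row above Mean, SD that holds the file name.

Please guide me, as I am new to Python.

Thank you and regards, Khan.

The question is vague about formatting, so this may differ from the desired output.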

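import pandas as pd

filenames = [...]
dfs = []
for f in filenames:
    # use the first (unnamed) column as the row index
    newdf = pd.read_csv(f, index_col=0)
    # rename returns a new DataFrame, so the result must be assigned back
    newdf = newdf.rename(columns={'Mean': 'Mean ' + f, 'SD': 'SD ' + f})
    dfs.append(newdf)
# axis=1 places the files side by side instead of stacking their rows
df = pd.concat(dfs, axis=1)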
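
To try the loop above without any files on disk, here is a minimal sketch that reads CSV text from memory. The names `sample_files`, `a.csv` and `b.csv` and the values for `b.csv` are made up for illustration, and `add_suffix` is used in place of `rename` as one possible way to label every column with its source.

```python
import io
import pandas as pd

# Hypothetical stand-ins for files on disk: file name -> CSV text
sample_files = {
    "a.csv": ",Mean,SD\n1,96.432,13.899\n2,96.432,13.899\n",
    "b.csv": ",Mean,SD\n1,90.1,10.5\n2,91.2,11.5\n",
}

dfs = []
for name, text in sample_files.items():
    part = pd.read_csv(io.StringIO(text), index_col=0)
    # add_suffix appends the label to every column, e.g. "Mean a.csv"
    dfs.append(part.add_suffix(" " + name))

# axis=1 aligns the frames on their row index and puts them side by side
combined = pd.concat(dfs, axis=1)
print(combined.columns.tolist())
```

With real files, `io.StringIO(text)` would be replaced by the file path.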

You can use pandas to read and concatenate the files, together with glob and a dictionary comprehension:

from glob import glob
import pandas as pd
files = glob('/tmp/*.csv') # change the location/pattern accordingly
# if you have a list of files, use: files=['file1.csv', 'file2.csv'...]
df = pd.concat({fname.rsplit('/')[-1]: pd.read_csv(fname, index_col=0)
                for fname in files}, axis=1)

Output:

>>> print(df)
  file1.csv         file2.csv        
       Mean      SD      Mean      SD
1    96.432  13.899    96.432  13.899
2    96.432  13.899    96.432  13.899
3    96.432  13.899    96.432  13.899
4    96.432  13.899    96.432  13.899
5    96.432  13.899    96.432  13.899
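
The two-level header makes it easy to pull out one file's columns, or to collapse both levels into a single header row if that is preferred. The following is only a sketch: it builds a one-row stand-in for the frame shown above, and the `file1.csv_Mean` naming scheme is an assumption, not something the question specifies.

```python
import pandas as pd

# One-row stand-in for the concatenated frame shown above
parts = {
    "file1.csv": pd.DataFrame({"Mean": [96.432], "SD": [13.899]}, index=[1]),
    "file2.csv": pd.DataFrame({"Mean": [96.432], "SD": [13.899]}, index=[1]),
}
df = pd.concat(parts, axis=1)

# Selecting by the top level returns only the columns from that file
only_first = df["file1.csv"]

# Collapse the two header levels into one, e.g. "file1.csv_Mean"
flat = df.copy()
flat.columns = [f"{fname}_{col}" for fname, col in df.columns]
print(flat.columns.tolist())
```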

Save to a new file:

df.to_csv('concatenated_file.csv')

Output:

,file1.csv,file1.csv,file2.csv,file2.csv
,Mean,SD,Mean,SD
,,,,
1,96.432,13.899,96.432,13.899
2,96.432,13.899,96.432,13.899
3,96.432,13.899,96.432,13.899
4,96.432,13.899,96.432,13.899
5,96.432,13.899,96.432,13.899
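
To load such a file again with the file names intact, `read_csv` can be told that the first two rows are header levels. This is a minimal sketch that writes to an in-memory buffer rather than to `concatenated_file.csv`; the small `parts` dictionary is a stand-in for the real data.

```python
import io
import pandas as pd

# Stand-in for the concatenated frame
parts = {
    "file1.csv": pd.DataFrame({"Mean": [96.432, 96.432], "SD": [13.899, 13.899]}, index=[1, 2]),
    "file2.csv": pd.DataFrame({"Mean": [96.432, 96.432], "SD": [13.899, 13.899]}, index=[1, 2]),
}
df = pd.concat(parts, axis=1)

# Write to a buffer instead of a file so the example needs no disk access
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] reads the first two rows as a two-level column header
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)
print(restored["file2.csv"])
```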

You can use pandas to handle this:

In [3]: import pandas
In [4]: import pandas as pd
In [13]: ls                                                                                                                                           
abc1.csv  abc.csv
In [14]: df = pd.read_csv('abc.csv')                                                                                                                  
In [15]: df1 = pd.read_csv('abc1.csv')                                                                                                                
In [16]: df                                                                                                                                           
Out[16]: 
Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899

In [17]: df1                                                                                                                                          
Out[17]: 
Mean      SD
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899
In [18]: df.append(df1)                                                                                                                               
Out[18]: 
Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899
In [19]: ds = df.append(df1)                                                                                                                          
In [20]: ds                                                                                                                                           
Out[20]: 
Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899

In [21]: ds.to_csv('file1.csv')  

In [23]: ls                                                                                                                                           
abc1.csv  abc.csv  file1.csv

Handling multiple files:

In [82]: import  pandas as pd                                                                                                                         
In [83]: import os, glob                                                                                                                              
In [84]: s = glob.glob(os.path.join(os.getcwd(),'*.csv'))                                                                                             
In [85]: s                                                                                                                                            
Out[85]: 
['/home/thinkpad/Desktop/stackoverflow/abc1.csv',
'/home/thinkpad/Desktop/stackoverflow/abc.csv']

In [90]: df = pd.DataFrame(columns = ['in','Mean','SD']) 
...: for i in s: 
...:     df1 = pd.read_csv(i) 
...:     print(df1.head()) 
...:     df = df.append(df1) 
In [91]: df                                                                                                                                           
Out[91]: 
  in    Mean      SD
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899
0  1  96.432  13.899
1  2  96.432  13.899
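
Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the sessions above only run on older versions. A minimal sketch of the same row-wise stacking with `pd.concat` follows. The in-memory CSV text and the `in` column header are assumptions made for illustration, since the contents of `abc.csv` and `abc1.csv` are not shown in full.

```python
import io
import pandas as pd

# Hypothetical file contents; the "in" header is an assumption
sources = {
    "abc.csv": "in,Mean,SD\n1,96.432,13.899\n2,96.432,13.899\n",
    "abc1.csv": "in,Mean,SD\n3,96.432,13.899\n4,96.432,13.899\n5,96.432,13.899\n",
}
frames = {name: pd.read_csv(io.StringIO(text)) for name, text in sources.items()}

# Stack the rows; the dict keys become an outer index level naming the source file
stacked = pd.concat(frames, names=["file", "row"])
print(stacked)
```

Passing a dictionary keeps track of which rows came from which file, which the plain `append` loop does not.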
