I have roughly 1000+ CSV files that I need to merge horizontally. Here is my code:
import os
import glob
import pandas as pd

dirname = r'path'
os.listdir(dirname)
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
dflist = []
for file in all_filenames:
    # Read each line as a single field, then split it on commas
    df = pd.read_csv(dirname+file, header=None, sep='\n')
    print(df)
    df = df[0].str.split(',', expand=True)
    dflist.append(df)
result = pd.concat(dflist, axis=1)
file_name = r'newfilenamepath'
result.to_csv(file_name)
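The problem is that the data includes entries such as "Bob's Company, ltd". Because I split on commas, that entry ends up in two columns: "Bob's Company" and " ltd". Using any delimiter other than a comma results in some very weird formatting. The CSVs involved do not share the same headers, number of columns, or number of rows. I just want to put them side by side.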
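To illustrate the splitting problem, here is a minimal sketch. The sample row is made up, and it assumes the field containing the comma is quoted in the file. Under that assumption, a plain split on commas cuts the quoted field apart, while Python's standard `csv` module keeps it whole.

```python
import csv
import io

# Hypothetical sample row; assumes the field containing a comma is quoted.
line = '"Bob\'s Company, ltd",42\n'

# Naive approach: split on every comma, including the one inside the quotes.
naive = line.rstrip('\n').split(',')

# csv.reader honors the quotes and returns the field intact.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

If the fields are not quoted in the source files, no parser can tell that comma apart from a real delimiter.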
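In case it is relevant, I managed to write code that merges them vertically. There may be a simple edit that would make it merge them horizontally instead: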
Dir = r'path'
files = os.listdir(Dir)
file_name = 'mergedcsvfilename'
with open(file_name + '.csv','w') as wf:
    for file in files:
        if '.DS_Store' not in file:
            with open(Dir + file) as rf:
                for line in rf:
                    if line.strip(): # if line is not empty
                        if not line.endswith("\n"):
                            line+="\n"
                        wf.write(line)
To merge them horizontally without using pandas:
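import os

Dir = r'path'
file_name = 'mergedcsvfilename'
# Open every regular file in Dir (f.path includes the directory, f.name does not)
files = [ open(f.path,"r") for f in os.scandir(Dir) if f.is_file() and '.DS_Store' not in f.name ]
with open(file_name + '.csv','w') as wf:
    while True:
        # Read one line from each file and join them side by side
        r = ','.join([f.readline().rstrip('\n') for f in files])
        if not r.rstrip(','): break  # every file returned an empty line
        wf.write(r + '\n')
# map() is lazy in Python 3, so close the files with an explicit loop
for f in files:
    f.close()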
This assumes that all files have the same number of rows and the same number of columns.
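If the files differ in row or column count, as described in the question, a sketch along the following lines may work. It relies on the standard `csv` module and `itertools.zip_longest`. The function name `merge_horizontally` and its parameters are hypothetical. It also assumes that each file fits in memory and that fields containing commas are quoted.

```python
import csv
import os
from itertools import zip_longest


def merge_horizontally(src_dir, out_path):
    """Write the CSV files in src_dir side by side into out_path."""
    tables = []
    for entry in sorted(os.scandir(src_dir), key=lambda e: e.name):
        if not entry.is_file() or not entry.name.endswith('.csv'):
            continue
        # Do not read the output file if it lives in the same directory.
        if os.path.abspath(entry.path) == os.path.abspath(out_path):
            continue
        with open(entry.path, newline='') as rf:
            rows = [row for row in csv.reader(rf) if row]  # skip blank lines
        # Widest row of this file; shorter rows are padded to this width.
        width = max((len(row) for row in rows), default=0)
        tables.append((rows, width))

    with open(out_path, 'w', newline='') as wf:
        writer = csv.writer(wf)
        # zip_longest yields None for files that have run out of rows.
        for parts in zip_longest(*(rows for rows, _ in tables)):
            merged = []
            for part, (_, width) in zip(parts, tables):
                part = part or []
                merged.extend(part + [''] * (width - len(part)))
            writer.writerow(merged)
```

Padding each file to its own widest row keeps every file's columns aligned in the output, even after a shorter file has run out of rows. Reading the files one at a time also avoids holding 1000+ file handles open at once.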