I have 5 datasets as CSV files, each containing the event logs from a computer for one day, Monday through Friday:
Monday.csv
Tuesday.csv
Wednesday.csv
Thursday.csv
Friday.csv
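I'd like to know how to merge all of these into one big file. Every dataset has the same format (80 columns), and when looking at the combined dataset containing all 5 days, I want to be able to tell which day of the week each row came from.
So all 5 CSVs would become one bigger one, like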
Week1.csv
Is this possible with pandas, or do I need another library?
Related: Import multiple CSV files into pandas and concatenate into one DataFrame
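My CSV files have a header in the first row, so when I merge them the same header appears 5 times throughout the merged file. Is there a way to remove the first row from each file before merging them?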
How about this?
import glob
import os

import pandas as pd

path = r'C:\your_path_here'  # use your path
all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

# create a list to append each DataFrame to
li = []

# loop through the file names in the variable named 'all_files'
for filename in all_files:
    # header=0 reads the first line as column names, so it is not repeated as data
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
Note: pd.read_csv also has a skiprows parameter (e.g. skiprows=1) for skipping lines at the top of a file.
See this link:
https://www.listendata.com/2019/06/pandas-read-csv.html
Check the header parameter of pandas.read_csv: the row number(s) to use as the column names, and the start of the data.
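The glob-based loop above removes the repeated headers but does not record which day each row came from, which the question also asks for. A minimal sketch, assuming the weekday can be taken from each file's base name (e.g. Monday.csv gives "Monday"); the function name combine_week is hypothetical:

```python
import glob
import os

import pandas as pd


def combine_week(folder):
    """Concatenate every CSV in `folder`, tagging each row with its file's base name."""
    frames = []
    # sorted() makes the order deterministic; glob.glob returns files in no particular order
    for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # header=0 turns the first line into column names, so it never becomes a data row
        df = pd.read_csv(filename, header=0)
        # e.g. ".../Monday.csv" -> "Monday"
        day = os.path.splitext(os.path.basename(filename))[0]
        frames.append(df.assign(Weekday=day))
    return pd.concat(frames, axis=0, ignore_index=True)
```

Usage would be something like combine_week(r'C:\your_path_here').to_csv('Week1.csv', index=False). Note that sorted() gives alphabetical order (Friday, Monday, ...), not calendar order, and that writing Week1.csv into the same folder means it will be picked up by the glob on the next run.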
If you are already using pandas, you can use this:
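import pandas as pd

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
dfs = (pd.read_csv(day + '.csv').assign(Weekday=day) for day in days)
# index=False keeps the row index from being written out as an extra column
pd.concat(dfs, ignore_index=True).to_csv('Week.csv', index=False)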
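An alternative to .assign() is to let pd.concat label the pieces itself through its keys= and names= arguments, which builds a "Weekday" index level that can then be turned into the first column. A sketch, assuming the files are named after the days as in the question; combine_with_keys and its folder parameter are hypothetical:

```python
import os

import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def combine_with_keys(days=DAYS, folder="."):
    frames = [pd.read_csv(os.path.join(folder, day + ".csv")) for day in days]
    # keys= labels each frame; names= names the new outer index level "Weekday"
    combined = pd.concat(frames, keys=days, names=["Weekday", None])
    # move "Weekday" into a (first) column, then drop the per-file row numbers
    return combined.reset_index(level="Weekday").reset_index(drop=True)
```

This keeps the days in the order given, and the Weekday column ends up first, matching the layout of the plain-text approach.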
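But it can also be done without pandas: CSV files are plain text, so you only need to add the column (and keep just one header). Assuming the delimiter is a comma (,):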
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# extract one header (text mode 'r', so next(fd) returns str rather than bytes)
with open('Monday.csv', 'r') as fd:
    header = 'Weekday,' + next(fd)

with open('Week.csv', 'w') as fdout:
    fdout.write(header)  # write the new header
    for day in days:  # loop over the days
        with open(day + '.csv') as fdin:
            _ = next(fdin)  # skip header
            for line in fdin:  # and copy the other lines
                fdout.write(day + ',' + line)
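The line-by-line copy treats every physical line as one record, which breaks if a quoted field contains a newline. If that can happen in the logs (an assumption, the question does not say), the standard csv module can do the same job record by record. A sketch; merge_days and its folder parameter are hypothetical:

```python
import csv
import os


def merge_days(days, out_path, folder="."):
    # newline="" is what the csv module docs recommend for files passed to reader/writer
    with open(out_path, "w", newline="") as fdout:
        writer = csv.writer(fdout)
        header_written = False
        for day in days:
            with open(os.path.join(folder, day + ".csv"), newline="") as fdin:
                reader = csv.reader(fdin)
                header = next(reader)  # each file starts with the same header row
                if not header_written:
                    writer.writerow(["Weekday"] + header)
                    header_written = True
                for row in reader:  # one parsed record, even if it spans several lines
                    writer.writerow([day] + row)
```

For example, merge_days(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'], 'Week.csv'). Note that csv.writer ends rows with '\r\n' by default.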
You don't need pandas, or anything that parses CSV files. Just use fileinput.input:
import fileinput

files = ('Monday.csv', 'Tuesday.csv', 'Wednesday.csv', 'Thursday.csv', 'Friday.csv')
with fileinput.input(files=files) as infile, open('Week1.csv', 'w') as outfile:
    for line in infile:
        if fileinput.isfirstline() and fileinput.filename() != files[0]:
            continue  # skip the CSV header line of all files except the first
        print(line, end='', file=outfile)
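The fileinput approach drops the first line of every file except the first without looking at it, so it silently relies on all five headers being identical. A small guard one could run beforehand, assuming headers fit on a single line; read_header and check_headers are hypothetical helpers:

```python
def read_header(path):
    # newline="" keeps '\r\n' intact so rstrip removes both kinds of line ending
    with open(path, newline="") as f:
        return f.readline().rstrip("\r\n")


def check_headers(files):
    """Return the shared header line, or raise if any file's header differs."""
    first = read_header(files[0])
    mismatched = [p for p in files[1:] if read_header(p) != first]
    if mismatched:
        raise ValueError(f"header differs from {files[0]}: {mismatched}")
    return first
```

Calling check_headers(files) with the same tuple before the merge loop would stop the merge instead of producing a file with misaligned columns.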