Split a CSV file into two files, keeping the header row in both

I'm trying to split a large CSV file into two files. I'm using the code below:

import pandas as pd

# csv file name to be read in
in_csv = 'Master_file.csv'
# get the number of lines of the csv file to be read
number_lines = sum(1 for row in (open(in_csv)))
# size of rows of data to write to the csv,
# you can change the row size according to your need
rowsize = 600000
# start looping through data writing it to a new file for each set
for i in range(0, number_lines, rowsize):
    df = pd.read_csv(in_csv,
                     nrows=rowsize,  # number of rows to read at each loop
                     skiprows=i)     # skip rows that have been read
    # csv to write data to a new file with indexed name: File_Number0.csv, File_Number600000.csv, etc.
    out_csv = 'File_Number' + str(i) + '.csv'
    df.to_csv(out_csv,
              index=False,
              header=True,
              mode='a',            # append data to csv file
              chunksize=rowsize)   # size of data to append for each loop

The file gets split, but the header row is missing from the second file. How can I fix this?

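When used with chunksize, .read_csv() returns an iterator and keeps track of the header for you: every chunk it yields is a DataFrame with the original column names. Example below. This should also be much faster. The original code above reads the entire file just to count the lines, then re-reads all of the preceding rows on every iteration, whereas the code below reads the file only once: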

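import pandas as pd

# With chunksize, read_csv returns a TextFileReader that yields DataFrames;
# using it in a `with` block (pandas >= 1.2) closes the file when done.
with pd.read_csv('Master_file.csv', chunksize=60000) as reader:
    for i, chunk in enumerate(reader):
        # Each chunk keeps the column names, so header=True writes them to every file.
        chunk.to_csv(f'File_Number{i}.csv', index=False, header=True)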
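If you would rather keep the structure of the loop in the question, the header goes missing because skiprows=i also skips line 0, so a data row ends up being parsed as the header. A minimal sketch of a fix (not part of the original answer) is to pass skiprows a range of line numbers starting at 1, which leaves the header line in place. The function name split_csv_keep_header and its prefix parameter are hypothetical, and the output files are numbered by part (0, 1, ...) rather than by row offset. It assumes no quoted field contains an embedded newline, since the data rows are counted as raw lines. It still re-scans the file from the start on every iteration, so the chunksize version above remains the faster choice.

```python
import pandas as pd


def split_csv_keep_header(in_csv, rowsize, prefix='File_Number'):
    """Split in_csv into files of at most `rowsize` data rows, each with the header.

    Hypothetical helper: a variant of the question's loop in which skiprows is
    a range of line numbers starting at 1, so line 0 (the header) is never skipped.
    Assumes no quoted field spans multiple lines.
    """
    # Count data rows (every line except the header); `with` closes the file.
    with open(in_csv) as f:
        n_data = sum(1 for _ in f) - 1

    out_files = []
    for part, start in enumerate(range(0, n_data, rowsize)):
        # Skip physical lines 1..start (rows already written) but keep line 0.
        df = pd.read_csv(in_csv, skiprows=range(1, start + 1), nrows=rowsize)
        out_csv = f'{prefix}{part}.csv'
        # Default mode 'w' overwrites any existing file instead of appending to it.
        df.to_csv(out_csv, index=False, header=True)
        out_files.append(out_csv)
    return out_files
```

For example, split_csv_keep_header('Master_file.csv', 600000) would write File_Number0.csv, File_Number1.csv, and so on, each starting with the same header line.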
