How to keep the CSV header on the chunk files after splitting with a script



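I need help modifying this script so that the output chunk files include the header. The script takes some input to determine how many rows go into each file when the input file is split. The output files do not contain the header from the original file. I am looking for advice on how to implement this.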

import csv
import os
import sys

os_path = os.path
csv_writer = csv.writer
sys_exit = sys.exit

if __name__ == '__main__':
    try:
        chunk_size = int(input('Input number of rows of one chunk file: '))
    except ValueError:
        print('Number of rows must be integer. Close.')
        sys_exit()
    file_path = input('Input path to .tsv file for splitting on chunks: ')
    if (
        not os_path.isfile(file_path) or
        not file_path.endswith('.tsv')
    ):
        print('You must input path to .tsv file for splitting.')
        sys_exit()
    file_name = os_path.splitext(file_path)[0]
    with open(file_path, 'r', newline='', encoding='utf-8') as tsv_file:
        chunk_file = None
        writer = None
        counter = 1
        reader = csv.reader(tsv_file, delimiter='\t', quotechar='\'')
        for index, chunk in enumerate(reader):
            if index % chunk_size == 0:
                if chunk_file is not None:
                    chunk_file.close()
                chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
                chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
                counter += 1
                writer = csv_writer(chunk_file, delimiter='\t', quotechar='\'')
                print('File "{}" complete.'.format(chunk_name))
            writer.writerow(chunk)

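You can read the header row manually right after opening the input file, then write it at the start of each output file. See the ADDED comments in the code below: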

...
    with open(file_path, 'r', newline='', encoding='utf-8') as tsv_file:
        chunk_file = None
        writer = None
        counter = 1
        reader = csv.reader(tsv_file, delimiter='\t', quotechar="'")
        header = next(reader)  # Read and save header row.  (ADDED)
        for index, chunk in enumerate(reader):
            if index % chunk_size == 0:
                if chunk_file is not None:
                    chunk_file.close()
                chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
                chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
                writer = csv_writer(chunk_file, delimiter='\t', quotechar="'")
                writer.writerow(header)  # ADDED.
                print('File "{}" complete.'.format(chunk_name))
                counter += 1
            writer.writerow(chunk)
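
For comparison, here is a minimal sketch of the same idea packaged as a function. It is one possible restructuring, not the original poster's code: the name `split_with_header` and its parameters are hypothetical, and it assumes a chunk of `chunk_size` rows fits comfortably in memory. It reads the header once, pulls rows in batches with `itertools.islice`, and uses `with` so that every chunk file, including the last one, is closed.

```python
import csv
import os
from itertools import islice


def split_with_header(file_path, chunk_size, delimiter='\t', quotechar="'"):
    """Split file_path into numbered chunk files that each start with the header row.

    Hypothetical helper for illustration; returns the list of chunk file names.
    """
    if chunk_size < 1:
        raise ValueError('chunk_size must be a positive integer')
    base_name = os.path.splitext(file_path)[0]
    chunk_names = []
    with open(file_path, 'r', newline='', encoding='utf-8') as tsv_file:
        reader = csv.reader(tsv_file, delimiter=delimiter, quotechar=quotechar)
        header = next(reader, None)  # First row is assumed to be the header.
        if header is None:  # Empty input file: nothing to split.
            return chunk_names
        while True:
            # Take up to chunk_size data rows; an empty list means the input is exhausted.
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            chunk_name = '{0}_{1}.tsv'.format(base_name, len(chunk_names) + 1)
            with open(chunk_name, 'w', newline='', encoding='utf-8') as chunk_file:
                writer = csv.writer(chunk_file, delimiter=delimiter, quotechar=quotechar)
                writer.writerow(header)  # Repeat the header in every chunk.
                writer.writerows(rows)
            chunk_names.append(chunk_name)
    return chunk_names
```

With a file holding a header and five data rows, `split_with_header(path, 2)` would produce three chunk files, the last one containing the header plus a single data row.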

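Note that using the single-quote character for quoting means the output files do not conform to the CSV format described in RFC 4180, which uses double quotes to enclose fields.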
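
To make the quoting difference concrete, the short sketch below writes the same row twice, once with the single-quote `quotechar` used in the script and once with the double quote. The helper `render_row` and the sample row are made up for illustration. With the `csv` module's default minimal quoting, only the field that contains the delimiter gets wrapped in the quote character.

```python
import csv
import io


def render_row(row, quotechar):
    """Return the single line csv.writer emits for row with the given quote character."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', quotechar=quotechar, lineterminator='\n')
    writer.writerow(row)
    return buffer.getvalue()


# The first field contains a tab, so it has to be quoted on output.
row = ['tab\tinside', 'plain']
single_quoted = render_row(row, "'")
double_quoted = render_row(row, '"')
print(repr(single_quoted))
print(repr(double_quoted))
```

A reader that expects double quotes would not treat the single-quoted field as one value, so whatever consumes the chunk files needs to be configured with the same `quotechar`.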
