I wonder if anyone can point me in the right direction.
Here is a sample of my data:
TRANS,"GUS000017787609","","","INSTL","","","","","","",,"","",20211025,
MTPNT,"",45654,"","","","","",,,
ASSET,"","INSTL","METER","","CR","G4SZV-2","FLN",2020,"XXXTYU422000","32","","LI"
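I need to use Python to convert this kind of data into CSV somehow. I have thousands of lines of data, and each TRANS, MTPNT and ASSET set should be treated as a single "row".
Does anyone know the best technique for performing ETL on this kind of data?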
You could use the grouper recipe to read 3 CSV rows at a time and combine them. For example:
import csv
from itertools import zip_longest, chain

def grouper(iterable, n, fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    # Take the TRANS, MTPNT and ASSET rows three at a time; a short final
    # group is padded with '' which adds no fields when flattened
    for triple_row in grouper(csv_input, 3, ''):
        # Flatten the three rows into a single output row
        row = list(chain.from_iterable(triple_row))
        #row[2] = 'test' # modify 3rd value before writing
        csv_output.writerow(row)
This gives you:
TRANS,GUS000017787609,,,INSTL,,,,,,,,,,20211025,,MTPNT,,45654,,,,,,,,,ASSET,,INSTL,METER,,CR,G4SZV-2,FLN,2020,XXXTYU422000,32,,LI
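The grouper approach assumes every record is exactly three lines and that the file has no blank lines. If some transactions could be missing an MTPNT or ASSET line (an assumption; the sample doesn't show this), fixed groups of three would drift out of alignment. A minimal sketch that instead starts a new output row at every line whose first field is TRANS is shown below; `group_by_trans` is a hypothetical helper name, and the sample is read from a string with `io.StringIO` only to keep it self-contained.

```python
import csv
import io


def group_by_trans(rows):
    """Yield one flattened row per record group.

    Assumption: every record begins with a TRANS line. All rows that
    follow it (MTPNT, ASSET, however many) are appended to that record.
    Any rows before the first TRANS form their own group. Blank lines
    are skipped.
    """
    current = []
    for row in rows:
        if not row:  # csv.reader yields [] for a blank line
            continue
        if row[0] == 'TRANS' and current:
            yield current  # a new TRANS closes the previous record
            current = []
        current.extend(row)
    if current:
        yield current  # emit the last record


sample = '''TRANS,"GUS000017787609","","","INSTL","","","","","","",,"","",20211025,
MTPNT,"",45654,"","","","","",,,
ASSET,"","INSTL","METER","","CR","G4SZV-2","FLN",2020,"XXXTYU422000","32","","LI"
'''

out = io.StringIO()
writer = csv.writer(out)
for record in group_by_trans(csv.reader(io.StringIO(sample))):
    writer.writerow(record)
print(out.getvalue())
```

For a real file, pass `csv.reader(f_input)` instead of the `StringIO` reader. The trade-off is that this relies on the TRANS marker rather than a fixed line count, so it tolerates missing or extra lines but would merge records if a TRANS line were ever absent.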