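Read two rows of a CSV at a time, combine values, and write to a new CSV

How can I read two rows of an input CSV at a time, combine some of their values into a single new row, and then write that row to a new CSV?

In the input below, I want to read two rows, take price1 from the second row and make it price2 in the new combined row, then repeat for the next two rows:

Input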

date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03

New CSV

date, name, qt, price1, price2
9/12/22, AB, 2, 5.00, 5.08
9/12/22, BC, 1, 2.00, 2.03

import csv

data = []
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data.append({'date': row[0],  'qt': row[3], 'name': row[5], 'price':  row[10]})
        #data.append(myClass(row[0], row[2], row[3],  row[5],  row[10]))

for x in data:
    print(x)

Adrian's answer was missing a few things:

  • import csv
  • the newline argument requires Python 3
  • the CSV must be UTF-8, and the headers must match

cat price_old.csv
date,name,qt,price1
9/12/22,AB,2,5.00
9/12/22,AB,2,5.08
9/12/22,BC,1,2.00
9/12/22,BC,1,2.03

import csv

with open("price_old.csv") as old_csv:
    new_list = []
    dt, nm = None, None
    c_reader = csv.DictReader(old_csv)
    for row in c_reader:
        # A new group starts whenever the date or the name changes
        if row['date'] != dt or row['name'] != nm:
            dt, nm = row['date'], row['name']
            price_list = [row['price1']]
        else:
            # Second row of the group: merge both prices into one row
            price_list.append(row['price1'])
            row.update({'price1': price_list[0], 'price2': price_list[1]})
            new_list.append(row)
            price_list = []
            dt, nm = None, None

with open('price_new.csv', 'w', newline='') as new_csv:
    c_writer = csv.DictWriter(new_csv, fieldnames=['date', 'name', 'qt', 'price1', 'price2'])
    c_writer.writeheader()
    c_writer.writerows(new_list)

cat price_new.csv
date,name,qt,price1,price2
9/12/22,AB,2,5.00,5.08
9/12/22,BC,1,2.00,2.03

Using pandas will make your life easier.

import pandas as pd

# skipinitialspace=True strips the blank that follows each comma in the input
df = pd.read_csv('test.csv', skipinitialspace=True)
# Group by the first 3 columns, and use the rows to form the columns (unstack)
df = df.groupby(['date', 'name', 'qt'])['price1'].apply(
    lambda s: s.reset_index(drop=True)).unstack().reset_index()
# Rename the columns
df.columns = [*df.columns[:-2], 'price1', 'price2']
df.to_csv('output.csv', sep=',', index=False)

output.csv is:

date,name,qt,price1,price2
9/12/22,AB,2,5.0,5.08
9/12/22,BC,1,2.0,2.03

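Python's csv readers make it easy to read any number of rows at a time.

The readers (csv.reader and csv.DictReader) are iterators, which means we can use the next() function to fetch a row manually, just as we can use elem = next(my_iterator) to get the next element of any iterator: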

import csv
import pprint

new_rows = []
with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    new_rows.append(header + [" price 2"])

    # First group: keep row1 and append only the price (last field) of row2
    row1 = next(reader)
    row2 = next(reader)
    new_rows.append(row1 + row2[-1:])

    # Second group
    row1 = next(reader)
    row2 = next(reader)
    new_rows.append(row1 + row2[-1:])

pprint.pprint(new_rows)

which gives us:

[
['date',    ' name', ' qt', ' price1', ' price 2'],
['9/12/22',   ' AB',  ' 2',   ' 5.00',    ' 5.08'],
['9/12/22',   ' BC',  ' 1',   ' 2.00',    ' 2.03']
]

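This approach can be adapted to use a for loop to drive the iteration. The for loop will always give us the "first row" of the current group of rows we want. Inside the loop, we call next() once for each remaining row in the group: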

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    new_rows.append(header + [" price 2"])
    for row1 in reader:
        row2 = next(reader)
        new_rows.append(row1 + row2[-1:])
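
Because the reader is an iterator, the same pairing can also be written with zip(reader, reader), which pulls two rows per step from one shared iterator. The following is only a minimal sketch of that alternative, not part of the original answer: the pair_rows name and the in-memory SAMPLE data are made up for illustration, and skipinitialspace=True is assumed so the blank after each comma is dropped.

```python
import csv
import io

# Sample data matching the question's input (held in memory for this sketch).
SAMPLE = """date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03
"""


def pair_rows(lines):
    """Combine consecutive rows two at a time, appending the second price."""
    # skipinitialspace=True drops the blank after each comma in the input.
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    combined = [header + ["price2"]]
    # zip() draws from the same iterator twice per step, so each step
    # yields (row1, row2); a trailing unpaired row is silently dropped.
    for row1, row2 in zip(reader, reader):
        combined.append(row1 + row2[-1:])
    return combined


new_rows = pair_rows(io.StringIO(SAMPLE))
```

Note that zip() stops quietly at an unpaired last row instead of raising, which may or may not be what you want.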

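This approach assumes your input has a header, and that the number of data rows is an exact multiple of the group size. If your input doesn't meet that condition, you'll get a StopIteration exception that looks like this (Python 3.11):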

  File "/Users/zyoung/develop/StackOverflow/./main.py", line 31, in <module>
    row2 = next(reader)
           ^^^^^^^^^^^^
StopIteration

If that is a possibility, you can add exception handling:

for row1 in reader:
    try:
        row2 = next(reader)
    except StopIteration:
        print(f"encountered an odd row on line {reader.line_num}; stopping reading, moving on to writing")
        break
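
The snippets above only collect rows in memory, so here is a rough end-to-end sketch that also writes the new CSV. It is one possible arrangement, not the answer's own code: the combine_pairs function and its path parameters are hypothetical names, skipinitialspace=True is assumed to strip the blank after each comma, and next(reader, None) is used in place of the try/except shown above.

```python
import csv


def combine_pairs(in_path, out_path):
    """Merge every two data rows of in_path into one row of out_path.

    Returns the number of combined rows written.
    """
    written = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, skipinitialspace=True)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty file: nothing to write
            return 0
        writer.writerow(header + ["price2"])
        for row1 in reader:
            # A default of None replaces the try/except around next().
            row2 = next(reader, None)
            if row2 is None:
                break  # unpaired last row: skip it
            writer.writerow(row1 + row2[-1:])
            written += 1
    return written
```

Calling combine_pairs("input.csv", "output.csv") on the question's input should then produce the two combined rows under the extended header.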

Finally, we can extend this idea to any number of rows per logical group, for example 3 rows per group:

date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, AB, 2, 5.12
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03
9/12/22, BC, 1, 2.06

header = next(reader)
new_rows.append(header + [" price 2", " price 3"])
for row1 in reader:
    row2 = next(reader)
    row3 = next(reader)
    new_rows.append(row1 + row2[-1:] + row3[-1:])
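
Rather than writing one next() call per extra row, the group size can be made a parameter. The sketch below uses itertools.islice to pull a fixed number of rows at a time. The group_rows and combine helpers are hypothetical names introduced here, and the sketch assumes the value to carry over is always the last field of each row and that an incomplete final group should be discarded.

```python
import csv
import io
from itertools import islice


def group_rows(reader, size):
    """Yield lists of `size` consecutive rows; stop at an incomplete group."""
    while True:
        group = list(islice(reader, size))
        if len(group) < size:
            return  # incomplete (or empty) final group is discarded
        yield group


def combine(lines, size):
    """Return the header plus one combined row per group of `size` rows."""
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    # price2 .. priceN column names for the extra rows in each group
    extra = [f"price{i}" for i in range(2, size + 1)]
    out = [header + extra]
    for first, *rest in group_rows(reader, size):
        # Keep the first row whole and append the last field of the others.
        out.append(first + [row[-1] for row in rest])
    return out
```

For example, combine(open("input.csv", newline=""), 3) would handle the three-row groups shown above, and passing 2 reproduces the original two-row case.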
