How to merge two CSV files based on the first column (no headers, no pandas)



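I have two CSV files that I need to merge based on the first column (column 0). I can't give them headers, and I can't use pandas. Here are the two files: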

StudentsMajorsList.csv

305671,Jones,Bob,Electrical Engineering,
987621,Wong,Chen,Computer Science,
323232,Rubio,Marco,Computer Information Systems,
564321,Awful,Student,Computer Science,Y
769889,Boy,Sili,Computer Information Systems,Y
156421,McGill,Tom,Electrical Engineering,
999999,Genius,Real,Physics,

GPAList.csv

156421,3.4
305671,3.1
323232,3.8
564321,2.2
769889,3.9
987621,3.85
999999,4

I want the resulting CSV file to look like this:

FullRoster.csv

305671,Jones,Bob,Electrical Engineering,3.1
987621,Wong,Chen,Computer Science,3.85
323232,Rubio,Marco,Computer Information Systems,3.8
564321,Awful,Student,Computer Science,Y,2.2
769889,Boy,Sili,Computer Information Systems,Y,3.9
156421,McGill,Tom,Electrical Engineering,3.4
999999,Genius,Real,Physics,4

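What code can I use to achieve this? Keep in mind that pandas is not allowed, and I can't add headers to the files for convenience. I have to use them as they are. Thank you!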

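Edit: I'm sorry I didn't include my code. I don't use this site often and wasn't familiar with the rules before posting. My apologies! Here is the code I have so far, but it doesn't work: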

with open('StudentsMajorsList.csv','r') as f2:
    reader = csv.reader(f2)
    dict2 = {row[0]: row[1:] for row in reader}
with open('GPAList.csv','r') as f1:
    reader = csv.reader(f1)
    dict1 = OrderedDict((row[0], row[1:]) for row in reader)
result = OrderedDict()
for d in (dict1, dict2):
    for key, value in dict.items():
        result.setdefault(key, []).extend(value)
with open('FullRoster.csv', 'w') as f:
    w = csv.writer(f)
    for key, value in result.items():
        w.writerow([key] + value)

# usage: merge_csv.py <file1> <file2> <output>
# example: merge_csv.py file1.csv file2.csv file3.csv
import csv
import sys

def merge_csv(file1, file2, output):
    # Index file2 by its first column so rows can be matched in any order
    # (pairing rows with zip() only works if both files share the same row order).
    with open(file2, 'r', newline='') as f2:
        lookup = {row[0]: row[1:] for row in csv.reader(f2) if row}
    with open(file1, 'r', newline='') as f1, open(output, 'w', newline='') as f3:
        writer = csv.writer(f3)
        for row in csv.reader(f1):
            if not row:
                continue  # skip blank lines
            # Append the matching columns from file2; unmatched rows are written unchanged.
            writer.writerow(row + lookup.get(row[0], []))

if __name__ == '__main__':
    merge_csv(sys.argv[1], sys.argv[2], sys.argv[3])
    print('done')

This is how I did it:

import csv

with open('StudentsMajorsList.csv', newline='') as file:
    reader = csv.reader(file)
    data1 = list(reader)

with open('GPAList.csv', newline='') as file:
    reader = csv.reader(file)
    data2 = list(reader)

merge3 = []
# csv.reader already splits each line into fields, so no manual split(',') is needed.
for student in data1:
    for gpa_row in data2:
        if student[0] == gpa_row[0]:
            # Drop empty fields (the blank trailing column), then append the GPA.
            merged = [field for field in student if field != '']
            merged.append(gpa_row[1])
            merge3.append(merged)

for item in merge3:
    print(item)

with open('FullRoster.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerows(merge3)

Output:

['305671', 'Jones', 'Bob', 'Electrical Engineering', '3.1']
['987621', 'Wong', 'Chen', 'Computer Science', '3.85']
['323232', 'Rubio', 'Marco', 'Computer Information Systems', '3.8']
['564321', 'Awful', 'Student', 'Computer Science', 'Y', '2.2']
['769889', 'Boy', 'Sili', 'Computer Information Systems', 'Y', '3.9']
['156421', 'McGill', 'Tom', 'Electrical Engineering', '3.4']
['999999', 'Genius', 'Real', 'Physics', '4']

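You need to learn how to use Python's built-in csv library, which can read a line of CSV values and convert it into a list.

One way to solve this is to first read the GPAList values into a dictionary. This makes it easy to look up the GPA for any ID.

Then, for each row in the students CSV, look up the required value in the dictionary, append it to the row you just read, and write the result to the output CSV file.

For example: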

import csv

# Read GPAList.csv into a dictionary: {student_id: gpa}
with open('GPAList.csv', newline='') as f_gpa:
    csv_gpa = csv.reader(f_gpa)
    gpa = dict(csv_gpa)

with open('StudentsMajorsList.csv', newline='') as f_students, open('FullRoster.csv', 'w', newline='') as f_roster:
    csv_students = csv.reader(f_students)
    csv_roster = csv.writer(f_roster)

    # Append each student's GPA to the row and write it out
    for row in csv_students:
        csv_roster.writerow([*row, gpa[row[0]]])
I suggest adding some print statements to better understand how it works, e.g. print(gpa).
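One thing to watch with the dictionary lookup: gpa[row[0]] raises a KeyError if a student ID has no entry in GPAList.csv. The sketch below is one possible way to handle that, using dict.get with a placeholder. The function name merge_with_default, the "N/A" placeholder, and the extra student 111111 are all made up for illustration, and the sample data is inlined through io.StringIO so the snippet runs without the real files.

```python
import csv
import io

# Inlined sample data (an assumption for illustration); 111111 is a
# hypothetical student with no GPA entry.
STUDENTS = "305671,Jones,Bob,Electrical Engineering,\n111111,Doe,Jane,Physics,\n"
GPAS = "305671,3.1\n"

def merge_with_default(students_file, gpa_file, missing="N/A"):
    """Return student rows with the GPA appended, or `missing` when no GPA exists."""
    gpa = dict(csv.reader(gpa_file))  # {student_id: gpa}
    return [[*row, gpa.get(row[0], missing)] for row in csv.reader(students_file)]

merged = merge_with_default(io.StringIO(STUDENTS), io.StringIO(GPAS))
for row in merged:
    print(row)
```

With real files, the same function should work if you pass it the file objects returned by open(..., newline='') instead of the StringIO objects.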

I would solve this in four steps:

  1. Read StudentsMajorsList.csv -> data ({row[0]: row})
  2. Remove the empty last column from each row
  3. Read GPAList.csv and update data
  4. Write FullRoster.csv

import csv

# Step 1: Read StudentsMajorsList.csv into data
with open("StudentsMajorsList.csv", newline="") as stream:
    reader = csv.reader(stream)
    data = {row[0]: row for row in reader}

# Step 2: Remove those empty last columns
for row in data.values():
    if row[-1] == "":
        del row[-1]

# Step 3: Read GPAList.csv and update data
with open("GPAList.csv", newline="") as stream:
    reader = csv.reader(stream)
    for student_id, gpa in reader:
        if student_id in data:
            data[student_id].append(gpa)

# Step 4: Write to FullRoster.csv
with open("FullRoster.csv", "w", newline="") as stream:
    writer = csv.writer(stream)
    writer.writerows(data.values())

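Note: Step 2 is there to make the output match the expected output, but it makes the data inconsistent: some rows will have 5 columns while others have 6. If you need the data to stay consistent, remove step 2.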
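To illustrate the trade-off in the note, here is a minimal sketch of the same merge with step 2 left out, so every row keeps its (possibly empty) fifth column and ends up with six fields. The function name merge_consistent is hypothetical, and two rows from the question are inlined through io.StringIO so the snippet runs without the real files.

```python
import csv
import io

# Two sample rows taken from the question, inlined so the sketch is self-contained.
STUDENTS = (
    "305671,Jones,Bob,Electrical Engineering,\n"
    "564321,Awful,Student,Computer Science,Y\n"
)
GPAS = "305671,3.1\n564321,2.2\n"

def merge_consistent(students_file, gpa_file):
    """Merge on column 0, keeping the empty fifth column so all rows have equal width."""
    data = {row[0]: row for row in csv.reader(students_file)}
    for student_id, gpa in csv.reader(gpa_file):
        if student_id in data:
            data[student_id].append(gpa)
    return list(data.values())

rows = merge_consistent(io.StringIO(STUDENTS), io.StringIO(GPAS))

# Write to an in-memory buffer instead of FullRoster.csv to inspect the result.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

The first row is written as 305671,Jones,Bob,Electrical Engineering,,3.1, which differs from the expected output in the question but keeps the GPA in the same column position for every student.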
