Join two arrays on two columns and drop the unneeded parts (Python)



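I'd like to ask for your help. First, let me describe my problem. I have two files containing arrays; each file is like one array, with the words in each row separated by spaces.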

First: [9 columns], of which the first 3 are important
2001   5276    data3   data4   data5   data6   data7   data8   data9
2001   23243   data3   data4   data5   data6   data7   data8   data9
....   
2001   434343  data3   data4   data5   data6   data7   data8   data9
2002   233     data3   data4   data5   data6   data7   data8   data9
....   
2002   23232   data3   data4   data5   data6   data7   data8   data9
Second: [5 columns]
2001   23243   data3'   data4'   data5'
2001   5276    data3'   data4'   data5'   
....   
2001   434343  data3'   data4'   data5'   
2002   23232   data3'   data4'   data5'   
....      
2002   233     data3'   data4'   data5' 
I would like to create one file from the two above, containing an array like this:
2001   5276    data3   data3'   data4'   data5'
2001   23243   data3   data3'   data4'   data5'
....

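I have to check whether the data in the first two columns is equal in both files and then join the rows :) So far I've found this program, but I don't know how to change it the right way: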

file1 = open('file1', 'r')
file2 = open('file2', 'r')
matrix1 = [line.rstrip().split(' ') for line in file1.readlines()]
matrix2 = [line.rstrip().split(' ') for line in file2.readlines()]
file1.close()
file2.close()
#combine
t_matrix1 = [[r[col] for r in matrix1] for col in range(len(matrix1[0]))]
t_matrix2 = [[r[col] for r in matrix2] for col in range(len(matrix2[0]))]
final_t_matrix = []
for i in (t_matrix1 + t_matrix2):
    if i not in final_t_matrix:
        final_t_matrix.append(i)
final_matrix = [[r[col] for r in final_t_matrix] for col in range(len(final_t_matrix[0]))]
#output
outfile = open('out.txt', 'w')
for i in final_matrix:
    for j in i[:-1]:
        outfile.write(j+', ')
    outfile.write(i[-1]+'\n')
outfile.close()

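What you want here is a dictionary that maps the first two columns of each row in First to the whole row. That way, as you go through Second, you can look up its first two columns and append to the row you find there.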

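There are a few questions to answer that will determine exactly which kind of dictionary:

  • Do the rows have to stay in the same order as they were in First?
  • What happens if a row in First has no matching row in Second?
  • Or vice versa?
  • What if either file has multiple rows with the same first two columns?

Let's assume the answers are "no, can't happen, can't happen, can't happen". Then you can use a simple dict: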

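with open('file1') as file1:
    lines = (line.rstrip().split() for line in file1)
    # key: the first two columns; value: the three columns kept from First
    rows = {tuple(line[:2]): line[:3] for line in lines}
with open('file2') as file2:
    for line in file2:
        row = line.rstrip().split()
        # extend adds the columns one by one; append would nest a whole list
        rows[tuple(row[:2])].extend(row[2:])
with open('out.txt', 'w') as outfile:
    # iterate over the values; iterating over the dict itself yields only the keys
    for row in rows.values():
        outfile.write(', '.join(row) + '\n')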

The first part may be easier for a novice to follow if I spell it out more explicitly, so let me do that:

rows = {}
with open('file1') as file1:
    for line in file1:
        row = line.rstrip().split()
        # slice the split row, not the raw line (slicing the line gives characters)
        first_two_columns = tuple(row[:2])
        first_three_columns = row[:3]
        rows[first_two_columns] = first_three_columns
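
If those assumptions turn out not to hold, the same dictionary idea still works with a little more bookkeeping. The following is only a sketch: the function name join_rows is made up for illustration, and the choices to skip rows without a partner and to keep the first of any duplicate keys in Second are my assumptions, not something the question specifies.

```python
def join_rows(first_lines, second_lines):
    """Join rows on their first two columns, keeping the order of First.

    Assumptions (not stated in the question): rows without a partner are
    skipped, and when a key repeats in Second the first occurrence wins.
    """
    # Map (col1, col2) -> the remaining columns of Second.
    second = {}
    for line in second_lines:
        row = line.split()
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        second.setdefault(tuple(row[:2]), row[2:])

    joined = []
    for line in first_lines:
        row = line.split()
        if len(row) < 3:
            continue  # First needs at least three columns
        extra = second.get(tuple(row[:2]))
        if extra is None:
            continue  # no matching row in Second
        joined.append(row[:3] + extra)
    return joined
```

Both arguments can be open file objects, since a file is already an iterable of lines.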

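I made a few other simplifications:

  • Use a with statement so you don't have to call close.
  • Don't use readlines; a file is already an iterable of lines, and all readlines does is make Python read the whole file into memory and split it into lines, using even more memory, before you can start processing them.
  • split() splits on any run of whitespace, which is probably what you want here, rather than split(' '), which splits only on the space character.
  • ', '.join(i) gives you all the members of i with ', ' between each pair, just as your inner loop did.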
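
To make the split and join points concrete, here is a small illustration. The sample line is made up (it contains a run of spaces and a tab on purpose); it is not taken from the question's files.

```python
line = "2001   5276\tdata3  \n"

# split(' ') cuts at every single space, so a run of spaces produces empty
# strings, and the tab stays inside a field.
by_space = line.rstrip().split(' ')
# by_space == ['2001', '', '', '5276\tdata3']

# split() with no argument cuts on any run of whitespace and drops the empties.
by_whitespace = line.split()
# by_whitespace == ['2001', '5276', 'data3']

# join puts the separator between each pair of items, never after the last one.
joined = ', '.join(by_whitespace)
# joined == '2001, 5276, data3'
```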
>>> f = open('FileA').readlines()
>>> f1 = open('FileB').readlines()
>>> for i in range(len(f)):
...     x = f[i].strip().split()
...     for j in range(len(f1)):
...         y = f1[j].strip().split()
...         if x[0] == y[0] and x[1] == y[1]:
...             print(x[0], x[1], x[2], " ".join(y[2:]))
...
2001 5276 data3 data3' data4' data5'
2001 23243 data3 data3' data4' data5'
2001 434343 data3 data3' data4' data5'
2002 233 data3 data3' data4' data5'
2002 23232 data3 data3' data4' data5'

I've printed the result; you can write it to a file instead.
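
For instance, the same nested loop could write to a file-like object instead of printing. This is a rough sketch; the function name write_joined is hypothetical, and an in-memory io.StringIO buffer stands in for a real output file so the example runs on its own.

```python
import io


def write_joined(lines_a, lines_b, out):
    """Write every pair of rows that share their first two columns to `out`."""
    # Split the second input once, skipping blank lines.
    rows_b = [line.split() for line in lines_b if line.strip()]
    for line in lines_a:
        x = line.split()
        if len(x) < 3:
            continue  # skip blank or short lines
        for y in rows_b:
            if x[:2] == y[:2]:
                out.write(" ".join(x[:3] + y[2:]) + "\n")


# Demo with in-memory data; with real files, pass open file objects instead.
buffer = io.StringIO()
write_joined(
    ["2001 5276 data3 data4\n"],
    ["2001 5276 data3' data4' data5'\n"],
    buffer,
)
```

With real files, the out argument would be something like the object returned by open('out.txt', 'w'). The nested loop compares every row with every other row, which is fine for small files; the dictionary approach scales better for large ones.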

file1 = open('file1', 'r')
file2 = open('file2', 'r')
rows = 0
finalfile = None
for lineno, line in enumerate(file1):
    row1 = line.rstrip().split()
    first_column1 = row1[0]
    second_column1 = row1[1]
    #print(str(first_two_columns1)+ " "+ str(first_three_columns1)+ "\n")
    for lineno, line in enumerate(file2):
        row2 = line.rstrip().split()
        first_column2 = row2[0]
        second_column2 = row2[1]
        #print(str(first_two_columns1)+ " "+ str(first_two_columns2)+ "\n")
        if (float(first_column1) == float(first_column2)) and (second_column1 == second_column2):
            new_line = row1[0] + " " + row1[1] + " " + row1[2] + " " + row2[2] + " " + row2[3] + "\n"
            rows = new_line
        final_filename = 'final_file_{}.txt'.format(row1[0])
        finalfile = open(final_filename, "w")
    finalfile.write(line)
if finalfile:
    finalfile.close()
file1.close()
file2.close()

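Abarnet, thanks for your suggestions; with their help I developed my script :) I have one problem: my program creates the array, but it is always the same line :) How can I fix it?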
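
A few things in that script probably explain the repeated line. A file object can only be looped over once, so the inner loop over file2 finds nothing after the first pass. rows = new_line overwrites the previous match instead of collecting it. The output file is reopened with "w", which truncates it each time. And finalfile.write(line) writes the raw input line rather than new_line. Below is one possible rework, offered as a sketch: the function names join_by_year and write_year_files are made up, and I am assuming you want one output file per value of the first column, as the final_file_{} name suggests.

```python
from collections import defaultdict


def join_by_year(first_lines, second_lines):
    """Return {year: [joined line, ...]} for rows matching on columns 1 and 2."""
    # Read Second once into a dict, so it never has to be re-read.
    second = {}
    for line in second_lines:
        row = line.split()
        if len(row) >= 2:
            second[tuple(row[:2])] = row[2:]

    by_year = defaultdict(list)
    for line in first_lines:
        row = line.split()
        if len(row) < 3:
            continue  # skip blank or short lines
        extra = second.get(tuple(row[:2]))
        if extra is not None:
            # Append instead of overwriting, so every match is kept.
            by_year[row[0]].append(" ".join(row[:3] + extra))
    return dict(by_year)


def write_year_files(by_year, template="final_file_{}.txt"):
    """Write one output file per year, opening each file exactly once."""
    for year, lines in by_year.items():
        with open(template.format(year), "w") as out:
            out.write("\n".join(lines) + "\n")
```

Usage would look like by_year = join_by_year(file1, file2) inside with blocks that open both files, followed by write_year_files(by_year).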
