How can I enhance my dataset output using Python?



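I have two input files, input.txt and datainput.txt. I check whether the second column of input.txt matches the first column of datainput.txt (the number right after '>' on each header line). If they match, I append the corresponding orthodb_id, which is the first column of input.txt, to the end of the matching line in the output file.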

input.txt:

5 21 218
6 11 1931
7 26 173

datainput.txt:

>21|95|28|5
Computer
>11|28|5|5
Cate 

code.py:

import csv
# Map the second column of input.txt to its first column (the ID to append).
with open('input.txt', 'rb') as file1:
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())
with open('datainput.txt', 'rb') as file2, open('output.txt', 'wb') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        if line[:1] == '>':  # header line
            row = line.strip().split('|')
            key = row[0][1:]  # first field without the leading '>'
            if key in file1_data:
                output.writerow(row + [file1_data[key]])

This is the output my code produces:

>21|95|28|5|5
>11|28|5|5|6
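
The appended values 5 and 6 come from the lookup dictionary built by the comprehension in the code above. As a small sketch of what split(None, 2)[1::-1] does with the sample rows (the names lines, pairs and lookup are illustrative only and not part of the original code):

```python
# Sample rows from input.txt in the question.
lines = ["5 21 218\n", "6 11 1931\n", "7 26 173\n"]

# split(None, 2) splits on whitespace into at most three fields;
# [1::-1] takes fields 1 and 0, in that order, giving a [key, value] pair.
pairs = [line.split(None, 2)[1::-1] for line in lines if line.strip()]
lookup = dict(pairs)

print(pairs)   # [['21', '5'], ['11', '6'], ['26', '7']]
print(lookup)  # {'21': '5', '11': '6', '26': '7'}
```

So the key 21 from the header >21|95|28|5 maps to 5, and 11 maps to 6. The row with key 26 has no matching header in the sample data, so it is not used.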

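You only need to add an else block to your code to get the desired output. It writes every line that does not start with '>' to the output file unchanged: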

import csv
# Map the second column of input.txt to its first column (the ID to append).
with open('input.txt', 'rb') as file1:
    file1_data = dict(line.split(None, 2)[1::-1] for line in file1 if line.strip())
with open('datainput.txt', 'rb') as file2, open('output.txt', 'wb') as outputfile:
    output = csv.writer(outputfile, delimiter='|')
    for line in file2:
        if line[:1] == '>':  # header line
            row = line.strip().split('|')
            key = row[0][1:]  # first field without the leading '>'
            if key in file1_data:
                output.writerow(row + [file1_data[key]])
        else:
            # Any other line (e.g. a name) is copied through as-is.
            outputfile.write(line)
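
The code above opens the files in binary mode ('rb', 'wb'), which is the Python 2 idiom for the csv module. On Python 3 the same idea could look like the sketch below. The function name annotate, the text-mode file handling with newline='' and the lineterminator='\n' setting are my assumptions, not part of the original answer.

```python
import csv


def annotate(ids_path, data_path, out_path):
    """Append the matching ID to each '>' header line; copy other lines unchanged."""
    # Map column 2 of the ID file to column 1, e.g. '21' -> '5'.
    with open(ids_path) as ids_file:
        lookup = dict(
            line.split(None, 2)[1::-1] for line in ids_file if line.strip()
        )

    # newline='' lets the csv writer control line endings itself.
    with open(data_path) as data_file, open(out_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file, delimiter='|', lineterminator='\n')
        for line in data_file:
            if line.startswith('>'):
                row = line.strip().split('|')
                key = row[0][1:]  # drop the leading '>'
                if key in lookup:
                    writer.writerow(row + [lookup[key]])
            else:
                # Non-header lines are written through as-is.
                out_file.write(line)
```

With the sample files from the question, annotate('input.txt', 'datainput.txt', 'output.txt') should write each header with its ID appended, followed by its name line. As in the original code, a header whose key has no match in input.txt is skipped rather than copied.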
