Sort ListA of TSV strings given another list, ListB, that contains partial strings from ListA



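I have 2 lists: ListA contains tab-delimited strings, and ListB contains strings that partially match the strings in ListA. I want to sort the strings in ListA according to the order of their partially matching strings in ListB.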

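What I tried was looping over ListA, splitting each line on \t, splitting the fifth column on _, and appending the resulting string to a temporary ListC. I then sorted ListC, but I still don't know how to sort ListA given ListC.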

ListA = ['rs141130360\tchr1:16495\tC\t653635\tNC_024540.1\tTranscript\tintron_variant,non_coding_transcript_variant\t-\t-\t-\t-\t-\trs3210724\tG\tMODIFIER\t-\t-1\t-\tSNV\tWASH7P\tEntrezGene\tHGNC:38034\ttranscribed_pseudogene\t-\t-\t-\t-\t-\t-\t-\t-\t-\tRefSeq\tG\tG\tOK\t-\t-\t-\t-\t8/10\t-\t-\tNR_024540.1:n.1080+112C>G\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\n',
         'rs141130360\tchr1:16495\tC\t100287102\tNR_046018.2\tTranscript\tdownstream_gene_variant\t-\t-\t-\t-\t-\trs3210724\tG\tMODIFIER\t2086\t1\t-\tSNV\tDDX11L1\tEntrezGene\tHGNC:37102\ttranscribed_pseudogene\t-\t-\t-\t-\t-\t-\t-\t-\t-\tRefSeq\tG\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\n',
         'rs141130360\tchr1:16495\tC\t102466751\tNG_106918.1\tTranscript\tdownstream_gene_variant\t-\t-\t-\t-\t-\trs3210724\tG\tMODIFIER\t874\t-1\t-\tSNV\tMIR6859-1\tEntrezGene\tHGNC:50039\tmiRNA\t-\t-\t-\t-\t-\t-\t-\t-\t-\tRefSeq\tG\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\n']
ListB = ["NC", "NG", "NM", "NP", "NR", "XM", "XP", "XR", "WP"]
ListC = []

for i in ListA:
    # fifth tab-separated column, e.g. "NC_024540.1" -> "NC"
    i_split = i.split("\t")[4].split("_")[0]
    ListC.append(i_split)
ListC = sorted(ListC, key=lambda x: ListB.index(x))
print(ListC)

This prints:

['NC', 'NG', 'NR']

My expected result is:

['rs141130360\tchr1:16495\tC\t653635\tNC_024540.1\tTranscript\tintron_variant,non_coding_transcript_variant\t-\t-\t-\t-\t-\trs3210724\tG\tMODIFIER\t-\t-1\t-\tSNV\tWASH7P\tEntrezGene\tHGNC:38034\ttranscribed_pseudogene\t-\t-\t-\t-\t-\t-\t-\t-\t-\tRefSeq\tG\tG\tOK\t-\t-\t-\t-\t8/10\t-\t-\tNR_024540.1:n.1080+112C>G\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\n',
'rs141130360\tchr1:16495\tC\t102466751\tNG_106918.1\tTranscript\tdownstream_gene_variant\t-\t-\t-\t-\t-\trs3210724\tG\tMODIFIER\t874\t-1\t-\tSNV\tMIR6859-1\tEntrezGene\tHGNC:50039\tmiRNA\t-\t-\t-\t-\t-\t-\t-\t-\t-\tRefSeq\tG\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\n',
'rs141130360\tchr1:16495\tC\t100287102\tNR_046018.2\tTranscript\tdownstream_gene_variant\t-\t-\t-\t-\t-\trs3210724\tG\tMODIFIER\t2086\t1\t-\tSNV\tDDX11L1\tEntrezGene\tHGNC:37102\ttranscribed_pseudogene\t-\t-\t-\t-\t-\t-\t-\t-\t-\tRefSeq\tG\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\n']

Instead, I would convert ListB into a value-to-index dictionary, then write a function that extracts the prefix from each string and looks it up in the dict. That function is the key we pass to sorted.

# map each prefix to its position in ListB
d = {x: i for i, x in enumerate(ListB)}
def get_index(s):
    by_tabs = s.split('\t')
    # fifth column, e.g. "NC_024540.1" -> ["NC", "024540.1"]
    by_underscore = by_tabs[4].split('_')
    return d[by_underscore[0]]
listC = sorted(ListA, key=get_index)
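
For anyone who wants to try this without the full rows, here is a minimal self-contained sketch of the same key-function idea. The rows are shortened, hypothetical stand-ins that keep only the first six columns, and the names rows, priority, rank and prefix_rank are mine, not from the question. It also assumes you would rather push unknown prefixes to the end than get a KeyError, which the question does not actually ask for.

```python
# Hypothetical, shortened rows: only the first six tab-separated columns are kept.
rows = [
    "rs1\tchr1:16495\tC\t100287102\tNR_046018.2\tTranscript",
    "rs1\tchr1:16495\tC\t653635\tNC_024540.1\tTranscript",
    "rs1\tchr1:16495\tC\t102466751\tNG_106918.1\tTranscript",
    "rs1\tchr1:16495\tC\t999\tZZ_000001.1\tTranscript",  # prefix not in the priority list
]
priority = ["NC", "NG", "NM", "NP", "NR", "XM", "XP", "XR", "WP"]

# Build the prefix -> position lookup once, so each key lookup is O(1).
rank = {prefix: i for i, prefix in enumerate(priority)}


def prefix_rank(row):
    """Return the sort rank of a row based on the prefix of its fifth column."""
    prefix = row.split("\t")[4].split("_")[0]
    # Assumption: prefixes missing from the priority list sort after all known ones.
    return rank.get(prefix, len(priority))


ordered = sorted(rows, key=prefix_rank)
print([row.split("\t")[4] for row in ordered])
# ['NC_024540.1', 'NG_106918.1', 'NR_046018.2', 'ZZ_000001.1']
```

Since sorted is stable, rows that share a prefix keep their original relative order.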
