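Why doesn't my code iterate over every line? A KeyError for "\n" persists even though the input is stripped (the .txt input is read with .read())

I have a .txt file of DNA sequences, formatted like this: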

>seq1
ATATAT
>seq2
GGGGG
>seq3
TTTTT

Using re.sub, I removed the lines containing ">seq" and a number so that only the lines of DNA bases A, G, C and T remain, and then stripped the "\n", like this:

ATATAT
GGGGG
TTTTT

import re
test = re.sub(' |1|2|3|4|5|6|7|8|9|0|>|s|e|q|:', "", holder)
newone = test.rstrip("\n")

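I then want to use this as the input to my one-hot encoder, which is wrapped in an if statement (the if statement checks whether any unexpected letters are present in the DNA sequence). So far it looks like this: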

for line in newone:
    #if undesirable letters are present in sequence, an error message is displayed
    if (bool(result)) == True:
        print("The input sequence is invalid.")
    #if sequence is in the correct format, proceed with one hot encoding
    else:
        #mapping of bases to integers as a dictionary
        bases = "ATCG"
        base_to_integer = dict((i, c) for c, i in enumerate(bases))
        #encoding input sequence as integers
        integer_encoded = [base_to_integer[y] for y in newone]
        #one hot encoding
        onehot_encoded = list()
        for value in integer_encoded:
            base = [0 for x in range(len(bases))]
            base[value] = 1
            onehot_encoded.extend(base)
        print(onehot_encoded)

I see this error in the shell, but I'm not sure why, because I thought my code already says that the "\n" should be stripped:

Traceback (most recent call last):
  File "C:\Users\Agatha\Documents\testtt.py", line 38, in <module>
    integer_encoded = [base_to_integer[y] for y in newone]
  File "C:\Users\Agatha\Documents\testtt.py", line 38, in <listcomp>
    integer_encoded = [base_to_integer[y] for y in newone]
KeyError: '\n'

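I want my program to run this code on each line of my input file, but I'm not sure whether my for loop ("for line in newone") achieves that once the input has been stripped, and I can't work out how to rearrange it so that it works. I'm aiming for output in this format, with each one-hot encoded sequence on its own line, for example: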

[1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]

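I would appreciate any advice on where this error comes from and how to fix it.

If I were you, I would use the .split() method to create a list from the text you read in.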

test = re.sub(' |1|2|3|4|5|6|7|8|9|0|>|s|e|q|:', "", holder)
newone = test.split("\n")

At this point newone looks like ['', 'ATATAT', '', 'GGGGG', '', 'TTTTT', ''], so remove the empty strings:

newone = [x for x in newone if x != '']
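To see why rstrip("\n") alone was not enough, here is a minimal sketch. The sample string assigned to holder is an assumption standing in for whatever .read() returns from the file. rstrip only trims newlines at the very end of the string, so the newlines between sequences survive, whereas split("\n") produces one string per line.

```python
import re

# Sample text standing in for the file contents (an assumption; in the
# question this string comes from reading the .txt file with .read()).
holder = ">seq1\nATATAT\n>seq2\nGGGGG\n>seq3\nTTTTT\n"

test = re.sub(' |1|2|3|4|5|6|7|8|9|0|>|s|e|q|:', "", holder)

# rstrip("\n") removes only trailing newlines, so the newlines
# between the sequences are still inside the string.
stripped = test.rstrip("\n")
print("\n" in stripped)   # True

# split("\n") breaks the text into one string per line; the list
# comprehension then drops the empty strings left by the header lines.
newone = [x for x in test.split("\n") if x != '']
print(newone)             # ['ATATAT', 'GGGGG', 'TTTTT']
```

Iterating over a string visits one character at a time, which is why the original loop eventually reached a "\n" character; iterating over this list visits one sequence at a time.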

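Now for the error you are getting: it happens because the list comprehension (line 38 of your code) uses newone instead of line. Every letter of line is a key in the dictionary base_to_integer, but you get the KeyError because "\n" is not a key in the dictionary. Even after making the changes I suggested above, you would still get an error: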

KeyError: 'ATATAT'

So you should change that line to:

integer_encoded = [base_to_integer[y] for y in line]
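Put together, the loop might look like the sketch below. It assumes newone is already the cleaned list of sequences. The helper name one_hot and the results list are illustrative, and the validity check is a stand-in, because the question does not show how result is computed.

```python
# Assumed input: the cleaned list of sequences, one string per line.
newone = ['ATATAT', 'GGGGG', 'TTTTT']

# Mapping of bases to integers: A=0, T=1, C=2, G=3.
bases = "ATCG"
base_to_integer = {base: index for index, base in enumerate(bases)}


def one_hot(sequence):
    """Return a flat one-hot list for one sequence (illustrative helper)."""
    encoded = []
    for letter in sequence:
        vector = [0] * len(bases)
        vector[base_to_integer[letter]] = 1
        encoded.extend(vector)
    return encoded


results = []
for line in newone:
    # Stand-in validity check: the question's `result` is not shown,
    # so this simply looks for letters outside A, T, C, G.
    if any(letter not in base_to_integer for letter in line):
        print("The input sequence is invalid.")
    else:
        encoded = one_hot(line)   # encode this line only, not all of newone
        results.append(encoded)
        print(encoded)            # one list per sequence, each on its own line
```

Defining the dictionary once, outside the loop, avoids rebuilding it for every sequence.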

Fixing this gives me:

[1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
[0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
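As an aside that goes beyond the fix above, the regex also deletes every s, e, q and digit wherever it occurs, so a stray "s" inside a sequence would be removed silently rather than reported as invalid. One possible alternative, sketched here with a hypothetical helper named read_sequences, is to skip the header lines that start with ">" and keep everything else untouched:

```python
def read_sequences(text):
    """Return the sequence lines of the text, skipping '>' header lines.

    Illustrative helper, not part of the original code.
    """
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        # Keep non-empty lines that are not headers.
        if line and not line.startswith(">"):
            sequences.append(line)
    return sequences


# Sample text standing in for the contents of the .txt file (an assumption).
sample = ">seq1\nATATAT\n>seq2\nGGGGG\n>seq3\nTTTTT\n"
print(read_sequences(sample))   # ['ATATAT', 'GGGGG', 'TTTTT']
```

With this approach an unexpected letter stays in the sequence, so the validity check in the loop can still catch it.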

Hope this helps.
