Here is the code. Can anyone spot the error?
file = open("WSJ_02-21.pos-chunk", 'r')
lines = file.readlines()
input_list = [[0 for j in range(5)] for i in range(len(lines))]
for i in range(len(input_list)):
    input_line = lines[i].split("\t")
    if len(input_line) == 0:
        for j in range(len(input_list[i])):
            input_list[i][j] = ""
    elif len(input_line) == 3:
        for j in range(len(input_list[i])):
            input_list[i][j] = input_line[i][j]
Here is the error:
Traceback (most recent call last):
  File "C:/Users/inigo/PycharmProjects/NLPHW5/main.py", line 12, in <module>
    input_list[i][j] = input_line[i][j]
IndexError: string index out of range
My expected output is a 2D list containing the elements of WSJ_02-21.pos-chunk.
Link to the input file: [https://drive.google.com/file/d/1QLMfD9HhvshhqE7XqIn96ML-M0j2uNLh/view?usp=sharing]
The purpose of the code isn't entirely clear, but if I understand correctly, the following seems to be what you're trying to achieve:
with open("WSJ_02-21.pos-chunk", 'r') as f:
    input_list = []
    for line in f:
        input_line = line.strip().split('\t')
        # str.split never returns an empty list; a blank line yields ['']
        if input_line == ['']:
            input_list.append([''])
        elif len(input_line) == 3:
            input_list.append(input_line)
However, do you really want entries for the blank lines?
If not, the following might be better:
with open("WSJ_02-21.pos-chunk", 'r') as f:
    input_list = []
    for line in f:
        input_line = line.strip()
        if len(input_line) > 0:
            input_list.append(input_line.split('\t'))
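To try the same idea without the real file, here is a minimal sketch that runs on a few in-memory lines. The helper name parse_pos_chunk and the sample rows are made up for illustration; it assumes each non-blank line holds tab-separated fields, as the code in the question suggests.

```python
def parse_pos_chunk(lines):
    """Return one list of tab-separated fields per non-blank line.

    Hypothetical helper, not part of the original post.
    """
    rows = []
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped.strip():
            continue  # skip blank lines instead of storing empty rows
        rows.append(stripped.split("\t"))
    return rows


# Made-up sample in the assumed token<TAB>pos<TAB>chunk layout.
sample = ["In\tIN\tB-PP\n", "an\tDT\tB-NP\n", "\n", "Oct.\tNNP\tI-NP\n"]
rows = parse_pos_chunk(sample)
print(rows)
```

Since an open file object is also an iterable of lines, the same function should work as parse_pos_chunk(f) inside the with block.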
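If the line you are processing looks like this:
lines = ["avd\tbdc\tcdc"]
then your input_line will have 3 tokens (so it ends up in the elif branch), but your input_list[i] has length 5 (the default length you imposed on every row of input_list). The inner loop therefore runs j from 0 to 4, while input_line[i] is a single token (a string such as "avd") that can be shorter than that, and you end up out of range at:
input_list[i][j] = input_line[i][j]
IndexError: string index out of range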
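The failure can be reproduced in isolation with the example line above. This is only a sketch: the names row, error_message, and fixed_row are introduced here for illustration, and copying whole tokens is one possible fix, assuming the goal is one token per cell.

```python
input_line = "avd\tbdc\tcdc".split("\t")  # ['avd', 'bdc', 'cdc']
row = [0 for _ in range(5)]               # fixed width of 5, as in the question
i = 0                                     # first line of the file

error_message = None
try:
    for j in range(len(row)):
        # input_line[i] is the string 'avd', so this copies single
        # characters and fails once j reaches 3.
        row[j] = input_line[i][j]
except IndexError as exc:
    error_message = str(exc)

print(error_message)

# One possible fix: copy whole tokens and let the row take its length from them.
fixed_row = list(input_line)
print(fixed_row)
```

Note that the row was partly overwritten with characters before the exception, which is a hint that the indexing, not the file, is the problem.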