IndexError: list index out of range at `if words[i+1] in vector:` / `for c in range(len(vector)):`



I have written some Python code that vectorizes the data in a text corpus using a word-frequency list. I get `IndexError: list index out of range` on this part of my code:

```python
if words[i+1] in vector:
    for c in range(len(vector)):
```

I am writing code to vectorize a data corpus using a word-frequency list. The full code is below:

```python
# Process the original data, use a sliding window, convert the original data into vector form, and generate training samples.
def loadData():
    # Processing raw data
    data1 = open(r"ChinaCorpus.txt", 'r').read()
    data1 = data1.replace('[', '')
    data1 = data1.replace(']', '')
    words = data1.split()
    i = 0
    while i < len(words):           # Remove the previous date factor and reduce its impact on parameter adjustment
        if "1998" in words[i]:
            del words[i]
            i = i - 1
        i += 1
    lables = []
    print(len(words))
    for i in range(len(words)):
        t = words[i].find('/')
        lables.append(words[i][t:])
        words[i] = words[i][0:t]
    data2 = open(r"WordFreq.txt", "r", encoding='UTF-8').read()
    vector = data2.split()
    with open("VectorData.txt", "a", encoding='utf-8') as f:
        for i in range(1, len(words)):
            flag = 0
            s = ""
            if words[i-1] in vector:
                for a in range(len(vector)):
                    if words[i-1] == vector[a]:
                        s += str(a)
                        s += ' '
                        break
            else:
                s += '0 '
            if words[i] in vector:
                for b in range(len(vector)):
                    if words[i] == vector[b]:
                        s += str(b)
                        s += ' '
                        if lables[i] == "/ns":
                            flag = 1
                        break
            else:
                s += '0 '
            if words[i+1] in vector:
                for c in range(len(vector)):
                    if words[i+1] == vector[c]:
                        s += str(c)
                        s += ' '
                        break
            else:
                s += '0 '
            if flag == 1:
                s += '1'
            else:
                s += '0'
            s += '\n'
            print(i)
            f.write(s)
    f.close()

if __name__ == '__main__':
    loadData()
```


If `vector` contains 3 values, the counter runs over 0, 1 and 2. Subtracting 1 from 0 gives -1, which is not a usable index in this case.
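A quick check of how Python treats out-of-range indices may help here. A negative index such as -1 does not raise an error; it silently wraps to the last element. An index equal to `len(words)`, on the other hand, does raise `IndexError`. In the question's loop `i` goes up to `len(words) - 1`, so on the final iteration `words[i+1]` is exactly that past-the-end case. The three-element list below is made-up sample data.

```python
# Hypothetical sample data standing in for the tokenised corpus.
words = ["a", "b", "c"]

# A negative index wraps around instead of failing: -1 is the last element.
wrapped = words[0 - 1]

# An index equal to len(words) is past the end and raises IndexError.
# This is what words[i+1] does when i == len(words) - 1.
last_i = len(words) - 1
try:
    words[last_i + 1]
    raised = False
except IndexError:
    raised = True

print(wrapped, raised)  # c True
```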

My suggestion would be to change it to:

```python
for i in [i for i in words if i in vector]:
    ...  # loop body goes here
```

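This iterates over the values in `words` that also appear in `vector`. If you need to compare a word's position in each list, your original approach is of course the better fit.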
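If you keep the position-based approach, one way to avoid the `IndexError` is to stop the loop one word early, so that both `words[i-1]` and `words[i+1]` always exist. The sketch below assumes the same three-word window and `/ns` label check as the question's code. The function name `window_rows` and the sample data are hypothetical, and it swaps the linear search over `vector` for a dictionary lookup (`word_to_index`) that keeps the first index of each word, which matches the original loops that `break` on the first match.

```python
def window_rows(words, labels, vector):
    """Build one 'prev cur next flag' line per interior word (hypothetical helper)."""
    # Map each word to its first position in the frequency list,
    # like the original inner for-loops that break on the first match.
    word_to_index = {}
    for idx, w in enumerate(vector):
        word_to_index.setdefault(w, idx)

    rows = []
    # Stop at len(words) - 1 so that words[i + 1] is always a valid index.
    for i in range(1, len(words) - 1):
        prev_id = word_to_index.get(words[i - 1], 0)
        cur_id = word_to_index.get(words[i], 0)
        next_id = word_to_index.get(words[i + 1], 0)
        # Flag is 1 only when the centre word is in the list and tagged /ns.
        flag = 1 if words[i] in word_to_index and labels[i] == "/ns" else 0
        rows.append(f"{prev_id} {cur_id} {next_id} {flag}\n")
    return rows


rows = window_rows(["in", "Beijing", "work"], ["/p", "/ns", "/v"], ["Beijing", "work"])
print(rows)  # ['0 0 1 1\n']
```

Each returned string can be written to `VectorData.txt` with `f.write(...)` as in the question. A corpus of `n` words yields `n - 2` rows, because the first and last words have no full window.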
