Creating a list of lists of tuples by reading a txt file



I have a text file that looks like this:

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O

I am trying to create tuples from this text so that I can evaluate them later. I want the list of lists to look like this:

[(EU, NNP, B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....
(Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER),
(BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)]

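Each blank line marks the end of a sentence: all tokens up to the blank line should be added to the list at the current index, and after the blank line we should move to the next index of the list and add the tokens of the next sentence there.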

# Function to read the data and return a list of tuples; each tuple represents
# a token and contains the word, POS tag, chunk tag, and NER tag.
import csv
def read_data(filename) -> list:
    data = []
    sentences = []
    with open(filename) as load_file:
        reader = csv.reader(load_file, delimiter=" ")   # read

        for row in reader:
            if(len(tuple(row)) != 0):
                data.append(tuple(row))

        sentences.append(data)
    return sentences

I have a function like this, but it returns this:

('EU', 'NNP', 'B-NP', 'B-ORG'),
('rejects', 'VBZ', 'B-VP', 'O'),
('German', 'JJ', 'B-NP', 'B-MISC'),
('call', 'NN', 'I-NP', 'O'),
('to', 'TO', 'B-VP', 'O'),
('boycott', 'VB', 'I-VP', 'O'),
('British', 'JJ', 'B-NP', 'B-MISC'),
('lamb', 'NN', 'I-NP', 'O'),
('.', '.', 'O', 'O'),
('Peter', 'NNP', 'B-NP', 'B-PER'),
('Blackburn', 'NNP', 'I-NP', 'I-PER'),
('BRUSSELS', 'NNP', 'B-NP', 'B-LOC'),
('1996-08-22', 'CD', 'I-NP', 'O'),

How can I solve this? I tried using two different lists and adding one to the other, but I could not find a way to make it work.

I think the confusion comes from the expected result as you wrote it:

[(EU, NNP, B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....
(Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER),
(BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)

but I think you actually expect

[
[(EU, NNP,B-NP, B-ORG),(rejects, VBZ, B-VP, O),(German, JJ, B-NP, B-MISC),(call, NN, I-NP, O).....], 
[(Peter, NNP, B-NP, B-PER),(Blackburn, NNP, I-NP, I-PER)],
[(BRUSSELS, NNP, B-NP, B-LOC),(1996-08-22, CD, I-NP, O)],
]

and that requires starting a new list every time an empty row is read:

for row in reader:
    if row:
        data.append(tuple(row))
    else:
        sentences.append(data)
        data = []

At the end you may also need to add the last data, because there is no empty line after these data:

if data:
    sentences.append(data)
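
Put together, the two fragments above could be folded back into the original function roughly as follows. This is only a sketch: it keeps the name `read_data` and the `csv.reader` call from the question, and the `newline=""` argument is the form the `csv` documentation recommends when opening files, not something the question used.

```python
import csv

def read_data(filename) -> list:
    """Return a list of sentences; each sentence is a list of token tuples."""
    data = []        # tokens of the sentence currently being read
    sentences = []   # finished sentences
    with open(filename, newline="") as load_file:
        reader = csv.reader(load_file, delimiter=" ")
        for row in reader:
            if row:
                data.append(tuple(row))
            else:
                # an empty row means the current sentence is complete
                sentences.append(data)
                data = []
    # add the last sentence, because no empty line follows it
    if data:
        sentences.append(data)
    return sentences
```

Calling `read_data("train.txt")` (the file name here is hypothetical) would then return one inner list per block of lines in the file.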

Full working example.

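I use io only to simulate a file in memory, so that everyone can copy and run it. In your own code you should use open(filename) instead of io.StringIO(text).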

# the empty line inside the text separates the two sentences
text = '''EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O'''

import csv
import io

data = []
sentences = []

#with open(filename) as load_file:
with io.StringIO(text) as load_file:
    reader = csv.reader(load_file, delimiter=" ")   # read

    for row in reader:
        if row:
            data.append(tuple(row))
        else:
            # empty row: the current sentence is finished
            sentences.append(data)
            data = []

# add last data because there is no empty line after these data
if data:
    sentences.append(data)

print(sentences)
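
As a possible variation that is not part of the answer above, the same grouping can be done without `csv`, using `str.split()`. The helper name `group_sentences` is made up for this sketch. Two things differ from the `csv` version: `split()` splits on any run of whitespace, and `csv.reader` by default treats `"` as a quote character, which may mis-parse files that contain a bare `"` token. The `elif data` guard also skips repeated blank lines instead of appending empty lists.

```python
import io

def group_sentences(lines):
    """Group whitespace-separated token lines into sentences split by blank lines."""
    sentences = []
    data = []
    for line in lines:
        fields = line.split()   # empty list for blank or whitespace-only lines
        if fields:
            data.append(tuple(fields))
        elif data:
            # blank line after at least one token: close the current sentence
            sentences.append(data)
            data = []
    # add the last sentence if the input does not end with a blank line
    if data:
        sentences.append(data)
    return sentences

sample = "EU NNP B-NP B-ORG\nrejects VBZ B-VP O\n\n\nPeter NNP B-NP B-PER\n"
print(group_sentences(io.StringIO(sample)))
```

Because the function accepts any iterable of lines, it can be passed an open file object directly, for example `with open(filename) as f: sentences = group_sentences(f)`.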