如何修复此代码并制作自己的POS标签?(PYTHON)



我的程序需要读取一个包含句子的文件,并产生这样的输出:

输入:IxéMaria。输出:Ixé\PRON Maria\N-PR。

到目前为止,我写了这篇文章,但outfile给了我一个空的文本文件。(请给我建议(:

infile = open('corpus_test.txt', 'r', encoding='utf-8').read()
outfile = open('tag_test.txt', 'w', encoding='utf-8')
dicionario = {'mimbira': 'N',
'anama-itá': 'N-PL',
'Maria': 'N-PR',
'sumuara-kunhã': 'N-FEM',
'sumuara-kunhã-itá': 'N-FEM-PL',
'sapukaia-apigaua': 'N-MASC',
'sapukaia-apigaua-itá': 'N-MASC-PL',
'nhaã': 'DEM',
'nhaã-itá': 'DEM-PL',
'ne': 'POS',
'mukuĩ': 'NUM',
'muíri': 'QUANT',
'iepé': 'INDF',
'pirasua': 'A1',
'pusé': 'A2',
'ixé': 'PRON1',
'se': 'PRON2',
'. ;': 'PUNCT'
}
np_words = dicionario.keys()
np_tags = dicionario.values()
for line in infile.splitlines():
list_of_words = line.split()
if np_words in list_of_words:
tag_word = list_of_words.index(np_words)+1
word_tagged = list_of_words.insert(tag_word, f'{np_tags}') 
word_tagged = " ".join(word_tagged)
print(word_tagged, file=outfile)
outfile.close()

简单地从NLP开始,可以更容易地理解和欣赏更先进的系统。

这给出了你想要的:

# Use 'with' so that the file is automatically closed when the 'with' ends.
with open('corpus_test.txt', 'r', encoding='utf-8') as f:
# splitlines is not a method, readlines is.
# infile will contain a list, where each item is a line.
# e.g. infile[0] = line 1.
infile = f.readlines()
dicionario = {
'Maria': 'N-PR',
'ixé': 'PRON1',
}
# Make a list to hold the new lines
outlines = []
for line in infile:
list_of_words = line.split()

new_line = ''
# 'if np_words in list_of_words' is asking too much of Python.
for word in list_of_words:
# todo: Dictionaries are case-sensitive, so ixé is different to Ixé.
if word in dicionario:
new_line += word + '\' + dicionario[word] + ' '
else:
new_line += word + ' '
# Append the completed new line to the list and add a carriage return.
outlines.append(new_line.strip() + 'n')
with open('tag_test.txt', 'w', encoding='utf-8') as f:
f.writelines(outlines)

最新更新