Word-level POS tagging in Python

I am trying to POS-tag every word in each line (each line contains several sentences).

I have this code:

from nltk import pos_tag
from nltk.tokenize import word_tokenize
f = open('C:Userstest_data.txt')
data = f.readlines()
#Parse the text file for NER with POS Tagging
for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.chunk.ne_chunk(tagged)
    print entities
f.close()

But the code produces one tag per line, and the output looks like this:

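[('The apartment is brand new and pristine in terms of cleanliness. Fatima Luas stop. I loved the location. \nJose and Vadym were very welcoming and treated me very well. Transport. The room is a bit too small for a couple, and lacks cupboards. \n\nOtherwise very clean and well maintained.', 'NNP')]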

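My code does tokenize, so I don't know what is wrong with it. I need a POS tag for every word, not one for every line. However, each line should still be separated (or distinguished) by brackets or something similar.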

(This is a straight copy-paste of what I ran on my computer.)

The code I ran (note the simple import statement):

#!/usr/bin/env python3
# encoding: utf-8
import nltk
f = open('/home/matthieu/Téléchargements/testtext.txt')
data = f.readlines()
for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.chunk.ne_chunk(tagged)
    print(entities)
f.close()

On the following Unicode plain-text file (3 lines):

(this is a first example.)(Another sentence in another parentheses.)
(onlyone in that line)
this is a second one wihtout parenthesis. (Another sentence in another parentheses.)

I get the following result:

(S
(/(
this/DT
is/VBZ
a/DT
first/JJ
example/NN
./.
)/)
(/(
Another/DT
sentence/NN
in/IN
another/DT
parentheses/NNS
./.
)/))
(S (/( onlyone/NN in/IN that/DT line/NN )/))
(S
this/DT
...

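As you can see, there is no particular problem. Are you parsing your CSV data correctly? Is CSV even useful in your case? Have you tried with a plain text file?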
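One possible cause of the output in the question, offered as a guess rather than a diagnosis: if each line reaches `pos_tag` as a one-element list (for example a whole CSV row or field) instead of a list of word tokens, the tagger treats the entire line as a single token and returns a single tag for it. Below is a minimal sketch that keeps one list of `(word, tag)` pairs per line, so lines stay distinguishable. `tag_lines` is a hypothetical helper name, not part of NLTK, and the tokenizer and tagger are parameters so the grouping logic can be tried even without NLTK's models installed.

```python
def tag_lines(lines, tokenize=None, tag=None):
    """Return one list of (word, tag) pairs per non-empty input line.

    tag_lines is a hypothetical helper. By default it uses
    nltk.word_tokenize and nltk.pos_tag, which need the NLTK
    tokenizer and tagger models to be downloaded beforehand.
    """
    if tokenize is None or tag is None:
        import nltk
        tokenize = tokenize or nltk.word_tokenize
        tag = tag or nltk.pos_tag

    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tokens = tokenize(line)     # a list of word strings, not the raw line
        result.append(tag(tokens))  # one inner list per input line
    return result
```

Any iterable of strings works as `lines`, including an open text file, and each element of the returned list corresponds to one line of input. If the data really comes from a CSV file, the text column would have to be extracted from each row first and passed in as a plain string.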
