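Is there something wrong with the way I'm generating this NLTK grammar?

I wrote this simple program with NLTK that is just supposed to print out the parse tree. However, even though the RecursiveDescentParser is being created, it doesn't print anything. What is my problem? Am I defining the grammar incorrectly? Is there something wrong with the way I'm trying to iterate over the parser? Thank you in advance.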

import nltk
'''The price of peace is rising.'''
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "is" | "rising"
NP -> Det N | Det N PP
Det -> "the" | "of"
N -> "price" | "peace"
P -> "in" | "on" | "by" | "with"
""")
sentence = "the price of peace is rising"
wordArray = sentence.split()
print(wordArray)
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse(wordArray):
    print(tree)

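First, always start by writing the grammar in bite-sized pieces.

Let's start with a simple sentence: Peace is rising.

We want the structure S -> NP VP, where:

VP is an intransitive verb phrase; in this particular case, "is rising" consists of the auxiliary "is" and "rise" with the progressive -ing inflection.
NP is just a noun.

[Code]: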

import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N
N -> "peace"
""")
parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "peace is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))

[Output]:

[Tree('NP', [Tree('N', ['peace'])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]

Now add determiners to the NP with NP -> DT NP | N:

import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N | DT NP  
N -> "peace" | "price" 
DT -> "the"
""")
parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))

[Output]:

[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('N', ['price'])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]

Finally, we can simply add the PP structure to the NP, with NP -> NP PP and PP -> P NP:

import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP  
N -> "peace" | "price" 
DT -> "the"
PP -> P NP
P -> "of"
""")
parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price of peace is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))

This gives us the best parse as the top result.

[Output]:

[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('NP', [Tree('N', ['price'])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]

But it also comes with some nasty recursion errors that look like this:

  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 158, in __getitem__
    return self[index[0]][index[1:]]
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 156, in __getitem__
    return self[index[0]]
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 150, in __getitem__
    if isinstance(index, (int, slice)):
RecursionError: maximum recursion depth exceeded in __instancecheck__

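This happens because nltk.RecursiveDescentParser searches for parses recursively, top-down, and the rules NP -> NP PP and PP -> P NP can be applied over and over without end. In particular, NP -> NP PP is left-recursive: expanding NP puts another NP in first position before any word has been consumed. If you want to know more about why, try asking it as a separate question on StackOverflow ;P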
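To make the cause a bit more concrete, here is a minimal sketch that flags rules whose right-hand side starts with their own left-hand side. It deliberately doesn't use NLTK: the `GRAMMAR` dict and the `directly_left_recursive` helper are made up for illustration, and it only catches direct left recursion, not recursion that goes through several rules.

```python
# Toy copy of the grammar above as a plain dict (an assumption for this
# sketch, not an NLTK object). Lexical rules are left out for brevity.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["AUX", "V"]],
    "NP": [["N"], ["DT", "NP"], ["NP", "PP"]],
    "PP": [["P", "NP"]],
}


def directly_left_recursive(grammar):
    """Return (lhs, rhs) pairs whose right-hand side starts with lhs."""
    return [
        (lhs, rhs)
        for lhs, alternatives in grammar.items()
        for rhs in alternatives
        if rhs and rhs[0] == lhs
    ]


left_recursive = directly_left_recursive(GRAMMAR)
for lhs, rhs in left_recursive:
    print(lhs, "->", " ".join(rhs))
```

Only NP -> NP PP is flagged. A top-down parser that tries this rule has to expand NP again before it reads any word, which is where the runaway recursion comes from.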

A simple solution is to use try-except:

try:
    for tree in parser.parse(sentence):
        print(list(tree))
except RecursionError:
    exit()

But that's ugly! Instead, you can use ChartParser:

import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP  
N -> "peace" | "price" 
DT -> "the"
PP -> P NP
P -> "of"
""")
parser = nltk.ChartParser(your_grammar)
sentence = "the price of peace is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))

[Output]:

[Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('N', ['price'])])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('NP', [Tree('N', ['price'])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
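
The loops above print list(tree), which is the list of children of the root S node, so the S itself never shows up in the output. As a small variation, assuming the same grammar, the sketch below collects the parses into a list (the name `trees` is just for illustration) so they can be counted, and prints each whole tree as a bracketed string:

```python
import nltk

# Same grammar as in the ChartParser example.
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP
N -> "peace" | "price"
DT -> "the"
PP -> P NP
P -> "of"
""")

parser = nltk.ChartParser(your_grammar)
sentence = "the price of peace is rising".split()

# parser.parse() returns an iterator; turn it into a list so the
# parses can be counted and inspected more than once.
trees = list(parser.parse(sentence))
print(len(trees), "parses found")

for tree in trees:
    # Printing the Tree itself shows the full structure, S root included.
    print(tree)
```

The two parses differ only in where the PP attaches: to "price" alone, or to "the price" as a whole.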
