This is my first post on Stack Overflow. I ran into an import error when trying to run this code snippet in a Jupyter notebook:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
import string
import pke
import traceback
def get_nouns_multipartite(content):
    out = []
    try:
        extractor = pke.unsupervised.MultipartiteRank()
        extractor.load_document(input=content)
        # not contain punctuation marks or stopwords as candidates.
        pos = {'PROPN', 'NOUN'}
        #pos = {'PROPN','NOUN'}
        stoplist = list(string.punctuation)
        stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
        stoplist += stopwords.words('english')
        extractor.candidate_selection(pos=pos, stoplist=stoplist)
        # 4. build the Multipartite graph and rank candidates using random walk,
        # alpha controls the weight adjustment mechanism, see TopicRank for
        # threshold/method parameters.
        extractor.candidate_weighting(alpha=1.1,
                                      threshold=0.75,
                                      method='average')
        keyphrases = extractor.get_n_best(n=15)
        for val in keyphrases:
            out.append(val[0])
    except:
        out = []
        traceback.print_exc()
    return out
I got the error: ImportError: cannot import name 'SpacyDocReader' from 'pke.readers'
I tried downgrading pke (to 1.8.1), but then I got a KeyError: 'hinglish':
---> 29 get_alpha_2 = lambda l: LANGUAGE_CODE_BY_NAME[l]
30
31 lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
KeyError: 'hinglish'
I have never used the pke library before, so I'm quite confused. Any help would be appreciated. Thanks a lot!
I resolved this error by installing the latest version of pke. I installed the pke (Python Keyphrase Extraction) module from GitHub with the following command:
pip install git+https://github.com/boudinfl/pke.git
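To confirm that the GitHub build is the one actually installed in the notebook's environment (rather than an older release from PyPI), a quick sanity check with the standard library works; this sketch assumes Python 3.8+ for importlib.metadata and nothing pke-specific beyond the distribution name:

from importlib import metadata  # standard library in Python 3.8+

try:
    # 'pke' is the distribution name used by the GitHub project
    print("pke version:", metadata.version("pke"))
except metadata.PackageNotFoundError:
    print("pke is not installed in this environment")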
Additionally, I made the required change to the code above: the stoplist is now specified when the document is loaded, i.e. it is passed to the load_document call instead of to candidate_selection.
Reference: https://boudinfl.github.io/pke/build/html/unsupervised.html
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
import string
import pke
import traceback

def get_nouns_multipartite(content):
    out = []
    try:
        extractor = pke.unsupervised.MultipartiteRank()

        # keep only proper nouns and nouns as candidates, and exclude
        # punctuation marks and stopwords
        pos = {'PROPN', 'NOUN'}
        stoplist = list(string.punctuation)
        stoplist += ['-lrb-', '-rrb-', '-lcb-', '-rcb-', '-lsb-', '-rsb-']
        stoplist += stopwords.words('english')

        # the stoplist is now passed to load_document instead of
        # candidate_selection
        extractor.load_document(input=content, stoplist=stoplist)
        extractor.candidate_selection(pos=pos)

        # build the Multipartite graph and rank candidates using random walk;
        # alpha controls the weight adjustment mechanism, see TopicRank for
        # the threshold/method parameters
        extractor.candidate_weighting(alpha=1.1,
                                      threshold=0.75,
                                      method='average')

        keyphrases = extractor.get_n_best(n=15)
        for val in keyphrases:
            out.append(val[0])
    except Exception:
        out = []
        traceback.print_exc()
    return out
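For a quick end-to-end check, the function can be called on a short piece of text like the one below. The sample text is made up purely for illustration; MultipartiteRank relies on spaCy internally, so a spaCy English model such as en_core_web_sm needs to be available in the environment.

# Usage sketch: the sample text is only an illustration.
sample_text = (
    "Natural language processing enables computers to understand, "
    "interpret and generate human language. Keyphrase extraction is "
    "a common task in natural language processing."
)

keyphrases = get_nouns_multipartite(sample_text)
print(keyphrases)  # list of up to 15 noun phrases ranked by MultipartiteRank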