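Wikipedia disambiguation error

I've recently been using the wikipedia module to pick a random Wikipedia page.

I've been doing this with a very large word list and the random.choice() function, like this: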

import random
import wikipedia

# Read the word list and split it on whitespace.
with open("words.txt", "r") as f:
    words = f.read().split()

text = random.choice(words)
string = random.choice(wikipedia.search(text))
p = wikipedia.page(string)

This seems to work most of the time, but occasionally it chokes with this error:

Traceback (most recent call last):
  File "/home/will/google4.py", line 25, in <module>
    p = wikipedia.page(string)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 276, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 299, in __init__
    self.__load(redirect=redirect, preload=preload)
  File "/usr/local/lib/python2.7/dist-packages/wikipedia/wikipedia.py", line 393, in __load
    raise DisambiguationError(getattr(self, 'title', page['title']), may_refer_to)
DisambiguationError: "The Scarf" may refer to: 
The Scarf (film)
The Scarf (opera)
Scarf (disambiguation)
Arthur Stewart King Scarf

Is there any way around this?

You can catch the DisambiguationError and pick one of the listed pages at random.

try:
    p = wikipedia.page(string)
except wikipedia.DisambiguationError as e:
    s = random.choice(e.options)
    p = wikipedia.page(s)

See here: http://wikipedia.readthedocs.io/en/latest/quickstart.html

Better yet, use the tools already at your disposal:
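The option picked at random can itself be a disambiguation page (the list in the traceback includes "Scarf (disambiguation)"), so a single except may not be enough. Below is a minimal sketch that keeps following options until a concrete page loads. The names `resolve_page` and `max_hops`, and the injected `load_page` / `ambiguous_error` parameters, are hypothetical and not part of the wikipedia library; they are passed in so the sketch can be tried offline.

```python
import random

def resolve_page(title, load_page, ambiguous_error, max_hops=5):
    """Load `title`; on a disambiguation error, retry with a random option.

    load_page and ambiguous_error are assumptions of this sketch: with the
    real library they would be wikipedia.page and
    wikipedia.DisambiguationError.
    """
    for _ in range(max_hops):
        try:
            return load_page(title)
        except ambiguous_error as e:
            # e.options holds the candidate titles from the disambiguation page.
            title = random.choice(e.options)
    raise RuntimeError("no concrete page found after %d hops" % max_hops)
```

With the real library this would presumably be called as resolve_page(string, wikipedia.page, wikipedia.DisambiguationError).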

wikipedia.random(pages=1)
Get a list of random Wikipedia article titles.
Note
Random only gets articles from namespace 0, meaning no Category, User talk, or other meta-Wikipedia pages.
Keyword arguments:
    pages - the number of random pages returned (max of 10)

(from https://wikipedia.readthedocs.io/en/latest/code.html#api)
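A random title can still point at a disambiguation page, so it may help to ask for a batch and keep the first title that loads. This is a rough sketch for Python 3; `first_loadable` and its parameters are hypothetical names, and the loader and exception types are injected rather than imported so the logic can be checked without network access.

```python
def first_loadable(titles, load_page, skip_errors):
    """Return the first page from `titles` that loads, or None if none do."""
    # Assumption: wikipedia.random(pages=1) seems to return a bare string
    # rather than a one-element list, so normalise that case here.
    if isinstance(titles, str):
        titles = [titles]
    for title in titles:
        try:
            return load_page(title)
        except skip_errors:
            # Skip titles that raise one of the given errors and try the next.
            continue
    return None
```

With the real library, a call might look like first_loadable(wikipedia.random(pages=10), wikipedia.page, (wikipedia.DisambiguationError, wikipedia.PageError)).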

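An obvious way to do this is to download a complete list of Wikipedia page names and use that instead of a word list. That would also be much kinder to Wikipedia's search engine, since you don't need to search at all to get a random page (and besides, you can't use the search engine if you want a uniformly random page).

A less elegant but probably easier fix is to wrap the call in try/except, catch the DisambiguationError, and retry.

Try the following: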
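Once such a list is on disk, a uniformly random title can be drawn in one pass without loading the whole file into memory. The sketch below uses reservoir sampling with a reservoir of size one; `pick_uniform_line` is a hypothetical helper, and the file name in the usage note is an assumption (one title per line).

```python
import random

def pick_uniform_line(lines, rng=random):
    """Pick one non-empty line uniformly at random in a single pass."""
    chosen = None
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        seen += 1
        # Replace the current choice with probability 1/seen, which leaves
        # every line seen so far equally likely to be the final choice.
        if rng.randrange(seen) == 0:
            chosen = line
    return chosen
```

Usage would be along the lines of opening a hypothetical titles.txt and passing the file object: with open("titles.txt", encoding="utf-8") as f: title = pick_uniform_line(f).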
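The retry idea could be sketched as follows: draw a fresh word on each attempt and give up after a fixed number of tries. `random_page_from_words` and `max_attempts` are hypothetical names, and the search function, page loader, and exception type are injected as parameters (with the real library they would be wikipedia.search, wikipedia.page, and wikipedia.DisambiguationError). The sketch also skips words whose search returns nothing, since random.choice fails on an empty list.

```python
import random

def random_page_from_words(words, search, load_page, ambiguous_error,
                           max_attempts=10):
    """Pick random words until one leads to a page that loads cleanly."""
    for _ in range(max_attempts):
        results = search(random.choice(words))
        if not results:
            continue  # nothing found for this word; try another
        try:
            return load_page(random.choice(results))
        except ambiguous_error:
            continue  # hit a disambiguation page; try another word
    raise RuntimeError("no page found after %d attempts" % max_attempts)
```

Capping the attempts keeps a bad word list from turning into an endless loop of requests.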

p = wikipedia.page(string, auto_suggest=False, redirect=True, preload=False)

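Setting auto_suggest=False should fix one of the problems: with auto_suggest left on, the library may swap the title you pass for a suggested one before loading the page.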
