Fetching movie sections from Wikipedia one by one



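I am trying to get movie plots and other information from Wikipedia pages. I have each movie's title and year, and from those I have to find the exact movie along with its plot and other information.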

I am using the Wikipedia API: https://en.wikipedia.org/w/api.php?action=query&list=search&format=jsonfm&

I get the following response:

{
"batchcomplete": "",
"continue": {
    "sroffset": 10,
    "continue": "-||"
},
"query": {
    "searchinfo": {
        "totalhits": 176
    },
    "search": [
        {
            "ns": 0,
            "title": "The Matrix",
            "pageid": 30007,
            "size": 123422,
            "wordcount": 12668,
            "snippet": "The <span class="searchmatch">Matrix</span> is a 1999 science fiction action film written and directed by the Wachowskis that stars Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss,",
            "timestamp": "2019-05-17T20:53:05Z"
        },
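A minimal sketch of how a title and year could be turned into a search request, and how the matching hit might be picked out of a response shaped like the one above. The helper names `build_search_url` and `pick_film`, the "film" suffix in the search string, and the sample hits (including the made-up pageid 1) are assumptions for illustration. Note that `format=jsonfm` is the HTML debugging view; `format=json` is the one meant for programs.

```python
import re
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def build_search_url(title, year, limit=10):
    """Build a list=search URL for a film title and release year (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "search",
        "format": "json",
        # Appending the year and the word "film" is a heuristic, not an API feature.
        "srsearch": f"{title} {year} film",
        "srlimit": limit,
    }
    return f"{API_URL}?{urlencode(params)}"

def pick_film(hits, year):
    """Return the first hit whose snippet mentions the year and the word 'film', else None."""
    for hit in hits:
        # Snippets carry <span class="searchmatch"> markup, so strip tags first.
        text = re.sub(r"<[^>]+>", "", hit.get("snippet", ""))
        if str(year) in text and "film" in text.lower():
            return hit
    return None

# Hand-written sample in the shape of response["query"]["search"].
sample_hits = [
    {"title": "Matrix (mathematics)", "pageid": 1,
     "snippet": "In mathematics, a <span class=\"searchmatch\">matrix</span> is a rectangular array"},
    {"title": "The Matrix", "pageid": 30007,
     "snippet": "The <span class=\"searchmatch\">Matrix</span> is a 1999 science fiction action film"},
]

print(build_search_url("The Matrix", 1999))
print(pick_film(sample_hits, 1999)["title"])
```

The snippet check only works for English text; other language editions would need their own keyword.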

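I need to search for all movies, not only English-language ones. I also need to get the text of the Plot section directly from the search.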
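As far as I can tell, `list=search` only returns snippets, so the section text needs a second request. One possible approach, sketched here under assumptions: each language edition has its own endpoint at `https://<lang>.wikipedia.org/w/api.php`, `action=parse` with `prop=sections` lists a page's sections, and `action=parse` with `section=<index>` returns just that section. The helper names and the per-language heading names in `PLOT_HEADINGS` are my guesses and would need checking against real articles.

```python
from urllib.parse import urlencode

# Assumed plot-heading names per language edition; verify and extend as needed.
PLOT_HEADINGS = {"en": ["Plot"], "fr": ["Synopsis"], "de": ["Handlung"]}

def api_url(lang, **params):
    """Build an API URL for the given language edition (hypothetical helper)."""
    params.setdefault("format", "json")
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"

def sections_url(lang, page):
    """URL that lists the sections of a page."""
    return api_url(lang, action="parse", page=page, prop="sections")

def section_text_url(lang, page, index):
    """URL that returns the wikitext of a single section."""
    return api_url(lang, action="parse", page=page, prop="wikitext", section=index)

def find_section_index(sections, lang):
    """Return the 'index' of the first section whose heading looks like a plot heading."""
    wanted = {h.lower() for h in PLOT_HEADINGS.get(lang, [])}
    for section in sections:
        if section["line"].lower() in wanted:
            return section["index"]
    return None

# Hand-written sample in the shape of response["parse"]["sections"].
sample_sections = [{"line": "Plot", "index": "1"}, {"line": "Cast", "index": "2"}]
idx = find_section_index(sample_sections, "en")
print(sections_url("en", "The Matrix"))
print(section_text_url("en", "The Matrix", idx))
```

Fetching the printed URLs (for example with `urllib.request.urlopen`) and decoding the JSON is left out so the sketch runs without network access.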

tl;dr

First, install:

$ pip3 install imdbpy wikipedia

Then:

>>> import wikipedia
>>> from imdb import IMDb
>>> imdb = IMDb()
>>> imdb.search_movie('avengers')
[<Movie id:0848228[http] title:_The Avengers (2012)_>, <Movie id:0203247[http] title:_"Avengers: United They Stand" (1999)_>, <Movie id:2164490[http] title:_Avengers (1987) (VG)_>, <Movie id:4154796[http] title:_Avengers: Endgame (2019)_>, <Movie id:4154756[http] title:_Avengers: Infinity War (2018)_>, <Movie id:2395427[http] title:_Avengers: Age of Ultron (2015)_>, <Movie id:2455546[http] title:_"Avengers Assemble" (2013)_>, <Movie id:1626038[http] title:_"The Avengers: Earth's Mightiest Heroes" (2010)_>, <Movie id:0458339[http] title:_Captain America: The First Avenger (2011)_>, <Movie id:0118661[http] title:_The Avengers (1998)_>, <Movie id:0054518[http] title:_"The Avengers" (1961)_>, <Movie id:1355644[http] title:_Passengers (I) (2016)_>, <Movie id:8836988[http] title:_Avengement (I) (2019)_>, <Movie id:0473445[http] title:_Avenger (2006) (TV)_>, <Movie id:9426186[http] title:_Revenger (2018)_>, <Movie id:2378453[http] title:_Avenged (2013)_>, <Movie id:4296026[http] title:_Avengers Grimm (2015) (V)_>, <Movie id:0491703[http] title:_Ultimate Avengers (2006) (V)_>, <Movie id:0090190[http] title:_The Toxic Avenger (1984)_>, <Movie id:0056174[http] title:_The Avenger (1962)_>]
>>> title = imdb.search_movie('avengers')[0].data['title']
>>> title
'The Avengers'
>>> wiki_page = wikipedia.page(title)
>>> wiki_page.url
'https://en.wikipedia.org/wiki/Avengers_(comics)'
>>> print(wiki_page.content)
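Note that the session above resolved 'The Avengers' to the comics article rather than the film. A rough sketch of two things that might help: trying more specific titles first, and cutting the Plot section out of the page text. It assumes `wiki_page.content` marks top-level headings as `== Heading ==`, which is what I have seen from the `wikipedia` package; `candidate_titles` and `extract_section` are hypothetical helpers, not part of either library.

```python
import re

def candidate_titles(title, year):
    """Titles to try in order, following Wikipedia's usual film disambiguation style."""
    return [f"{title} ({year} film)", f"{title} (film)", title]

def extract_section(content, heading="Plot"):
    """Return the text under a top-level '== heading ==' line, or None if absent."""
    pattern = rf"^==\s*{re.escape(heading)}\s*==\s*$(.*?)(?=^==[^=]|\Z)"
    match = re.search(pattern, content, flags=re.M | re.S)
    return match.group(1).strip() if match else None

# Hand-written stand-in for wiki_page.content.
sample_content = (
    "Intro text.\n\n\n"
    "== Plot ==\nA hacker learns the truth.\n\n\n"
    "== Cast ==\nSomeone."
)

print(candidate_titles("The Avengers", 2012))
print(extract_section(sample_content))

# With network access, something like this could be tried (not run here):
#   import wikipedia
#   wikipedia.set_lang("en")   # or another language code
#   page = wikipedia.page("The Avengers (2012 film)", auto_suggest=False)
#   plot = extract_section(page.content)
```

Subsections written as `=== Sub ===` stay inside the extracted text, because only a new top-level heading ends the match.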

See also:

  • https://pypi.org/project/wikipedia/
  • https://github.com/alberanid/imdbpy
