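I'm working on a research publication and collaboration project that includes a literature search feature. Google Scholar looks like a good fit since it is free to use, but when I looked into Google Scholar I couldn't find any information about it having an API.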
Does Google Scholar have an API?
There is no official Google Scholar API.
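There are third-party solutions such as the free scholarly Python package, which supports profile, author, cite, and organic results (search_pubs appears to be the method for getting organic results, although the method name confused me).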
Note that if you use scholarly continuously without rate-limiting your requests, Google may block your IP (as @RadioControlled mentioned). Use it wisely.
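One way to follow that advice is to pause between results while consuming a search generator. The sketch below is only an illustration: throttled, min_delay, and max_delay are hypothetical names that are not part of scholarly, and the delay values are arbitrary guesses rather than limits published by Google.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one by one, pausing a random interval before each new fetch.

    Hypothetical helper: the delay bounds are assumptions, not documented limits.
    """
    iterator = iter(items)
    first = True
    while True:
        if not first:
            # Pause before asking the underlying iterator for the next item,
            # since that is when a lazy search generator may issue a request.
            sleep(random.uniform(min_delay, max_delay))
        first = False
        try:
            item = next(iterator)
        except StopIteration:
            return
        yield item
```

With scholarly this might be used as `for author in throttled(scholarly.search_keyword("biology")): ...`, on the assumption that the generator fetches further pages lazily as it is consumed. The `sleep` parameter exists only so the pause can be swapped out, for example in tests.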
Additionally, there is the scrape-google-scholar-py module, which lets you extract data from almost all Google Scholar pages.
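Alternatively, SerpApi provides a Google Scholar API. It is a paid API with a free plan that supports organic, cite, profile, and author results. It bypasses blocks on SerpApi's backend, so your own IP won't be blocked, and it handles the legal side of scraping.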
Example code to parse profile results using scholarly and its search_keyword method:
import json
from scholarly import scholarly
# will paginate to the next page by default
authors = scholarly.search_keyword("biology")
for author in authors:
    print(json.dumps(author, indent=2))
# part of the output:
'''
{
"container_type": "Author",
"filled": [],
"source": "SEARCH_AUTHOR_SNIPPETS",
"scholar_id": "LXVfPc8AAAAJ",
"url_picture": "https://scholar.google.com/citations?view_op=medium_photo&user=LXVfPc8AAAAJ",
"name": "Eric Lander",
"affiliation": "Broad Institute",
"email_domain": "",
"interests": [
"Biology",
"Genomics",
"Genetics",
"Bioinformatics",
"Mathematics"
],
"citedby": 552013
}
... other author results
'''
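Because the loop above keeps paginating until the results run out, it may be worth capping how many results are consumed. The following is a minimal sketch using itertools.islice from the standard library. Both first_n and fake_author_search are hypothetical names, and the fake generator merely stands in for a real search so the example runs offline.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def first_n(results: Iterable[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    """Consume at most `limit` results from a (possibly endless) iterator."""
    return list(islice(results, limit))


def fake_author_search() -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a lazy search generator; yields dummy authors."""
    index = 0
    while True:
        yield {"container_type": "Author", "name": f"Author {index}"}
        index += 1
```

For instance, `first_n(fake_author_search(), 3)` stops after three items even though the generator never ends. Passing `scholarly.search_keyword("biology")` instead should work the same way, assuming it returns an ordinary iterator.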
Example using scrape-google-scholar-py:
from google_scholar_py import CustomGoogleScholarProfiles
import json
parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
    query='blizzard',
    pagination=False,
    save_to_csv=False,
    save_to_json=False
)
print(json.dumps(data, indent=2))
Output:
[
{
"name": "Adam Lobel",
"link": "https://scholar.google.com/citations?hl=en&user=_xwYD2sAAAAJ",
"affiliations": "Blizzard Entertainment",
"interests": [
"Gaming",
"Emotion regulation"
],
"email": "Verified email at AdamLobel.com",
"cited_by_count": 3593
}, # other results...
]
Example code to parse profile results using the Google Scholar Profiles API from SerpApi:
import json
from serpapi import GoogleScholarSearch
# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",
    "hl": "en",             # language
    "mauthors": "biology"   # search query
}
search = GoogleScholarSearch(params)
results = search.get_dict()
# only first page results
for result in results["profiles"]:
    print(json.dumps(result, indent=2))
# part of the output:
'''
{
"name": "Masatoshi Nei",
"link": "https://scholar.google.com/citations?hl=en&user=VxOmZDgAAAAJ",
"serpapi_link": "https://serpapi.com/search.json?author_id=VxOmZDgAAAAJ&engine=google_scholar_author&hl=en",
"author_id": "VxOmZDgAAAAJ",
"affiliations": "Laura Carnell Professor of Biology, Temple University",
"email": "Verified email at temple.edu",
"cited_by": 384074,
"interests": [
{
"title": "Evolution",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolution",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolution"
},
{
"title": "Evolutionary biology",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aevolutionary_biology",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:evolutionary_biology"
},
{
"title": "Molecular evolution",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Amolecular_evolution",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:molecular_evolution"
},
{
"title": "Population genetics",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Apopulation_genetics",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:population_genetics"
},
{
"title": "Phylogenetics",
"serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aphylogenetics",
"link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:phylogenetics"
}
],
"thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=VxOmZDgAAAAJ&citpid=3"
}
... other results
'''
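The three sample outputs above describe the same kind of record but use different key names, and the SerpApi one nests each interest in a dict. If results from more than one source need to be combined, a small normalizer may help. This is a sketch only: normalize_profile and the target keys are hypothetical, and the source key names are taken solely from the sample outputs shown here, so real responses may omit or rename them.

```python
from typing import Any, Dict, List


def normalize_profile(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map one profile dict from any of the three sample outputs to a common shape."""
    # Interests are plain strings in two outputs and {"title": ...} dicts in the SerpApi one.
    interests: List[str] = [
        item["title"] if isinstance(item, dict) else item
        for item in raw.get("interests", [])
    ]
    # The citation count is stored under a different key in each output.
    citations = None
    for key in ("citedby", "cited_by_count", "cited_by"):
        if key in raw:
            citations = raw[key]
            break
    return {
        "name": raw.get("name"),
        # scholarly uses "affiliation"; the other two outputs use "affiliations".
        "affiliation": raw.get("affiliation", raw.get("affiliations")),
        "interests": interests,
        "citations": citations,
    }
```

Applied to the SerpApi sample, this would reduce the interests to a list of titles such as "Evolution" and expose the citation count under a single "citations" key.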
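I also wrote a dedicated blog post at SerpApi, "Scrape historic Google Scholar results using Python", which shows how to scrape historic (2017-2021) organic and cite Google Scholar results to CSV and SQLite.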
There is also a blog post about scraping Google Scholar in R, if you're not a fan of Python.
Disclaimer: I work for SerpApi.
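A quick search shows that others are trying to implement such APIs, but Google does not provide one. It is also unclear whether this is legal; see for example "How to get permission from Google to use Google Scholar Data, if needed?".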