How to convert Wikidata QIDs to entities and vice versa in Python



I have a list of entities extracted from a text.

For example, here is my text:

"text": "Anarchism is an anti-authoritarian political and social philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions."

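And here are the entities extracted from the text. (In each pair, the first element is the entity and the second is its mention in the text.)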

"anchored_et": [["Anti-authoritarianism", "anti-authoritarian"], ["Political philosophy", "political"], ["Social philosophy", "social philosophy"], ["Hierarchy", "hierarchies"], ["Workers' self-management", "self-managed"], ["Self-governance", "self-governed"], ["cooperative", "cooperative"]]

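In addition, I have a list of triples whose subjects and objects are given as Wikidata QIDs.

So I first need to convert the extracted entities to their QIDs, then find the triples whose subject is one of those QIDs, and finally convert the object QIDs of those triples back to entities.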
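The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the triples are taken to be `(subject QID, property ID, object QID)` tuples, and `expand_entities`, `title_to_qid` and `qid_to_labels` are hypothetical names. The two lookups are passed in as callables, so any title-to-QID and QID-to-label function can be plugged in.

```python
from typing import Callable, Iterable, Optional

# Assumed triple layout: (subject QID, property ID, object QID).
Triple = tuple[str, str, str]


def expand_entities(
    entity_titles: Iterable[str],
    triples: Iterable[Triple],
    title_to_qid: Callable[[str], Optional[str]],
    qid_to_labels: Callable[[str], list[str]],
) -> dict[str, list[tuple[str, list[str]]]]:
    """Map each title to (property, object labels) pairs for its triples."""
    # Index the triples by subject once, so each lookup is a dict access.
    by_subject: dict[str, list[Triple]] = {}
    for subject, prop, obj in triples:
        by_subject.setdefault(subject, []).append((subject, prop, obj))

    result: dict[str, list[tuple[str, list[str]]]] = {}
    for title in entity_titles:
        qid = title_to_qid(title)  # step 1: entity -> QID
        if qid is None:
            continue  # no QID found for this title, skip it
        # step 2: triples whose subject is this QID
        # step 3: object QID -> entity labels
        result[title] = [
            (prop, qid_to_labels(obj)) for _, prop, obj in by_subject.get(qid, [])
        ]
    return result
```

With the data from this question, the titles would be `[pair[0] for pair in anchored_et]`, and the two callables could be the lookup functions shown further down.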

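In short, I need to convert Wikidata QIDs to entities and vice versa.

My question is: how can I do that?

Below are the two functions I wrote for this.

They use SPARQLWrapper from PyPI and requests.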

from typing import Optional

import requests
from SPARQLWrapper import SPARQLWrapper


def wikidata_id_to_enwiki_title(Qid):
    # Return the English labels of the Wikidata item with the given QID
    # (an empty list if the query fails or nothing is found).
    try:
        sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
        sparql.setReturnFormat('json')
        sparql.setQuery('SELECT DISTINCT * WHERE { wd:' + Qid + ' rdfs:label ?label . FILTER (langMatches( lang(?label), "EN" ) ) }')
        data = sparql.query().convert()
        results = data["results"]["bindings"]
        return [res["label"]["value"] for res in results]
    except Exception:
        return []


def enwiki_title_to_wikidata_id(title: str) -> Optional[str]:
    # Return the QID linked to an English Wikipedia page title, or None.
    try:
        url = 'https://en.wikipedia.org/w/api.php'
        # Passing params as a dict lets requests URL-encode the title.
        params = {'action': 'query', 'prop': 'pageprops', 'format': 'json', 'titles': title}
        response = requests.get(url, params=params, timeout=30)
        data = response.json()
        for page in data['query']['pages'].values():
            # A single title yields a single page entry.
            return page['pageprops']['wikibase_item']
        return None
    except (requests.RequestException, KeyError, ValueError):
        return None
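Calling the functions above once per entity means one HTTP request per QID. One possible alternative for the QID-to-label direction is the `wbgetentities` action of the Wikidata API, which accepts several IDs in one request. The sketch below is not a definitive implementation: `fetch_labels`, `parse_labels`, `chunked` and the User-Agent string are hypothetical names, and the batch size of 50 is an assumption about the per-request ID limit for ordinary clients.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # assumed per-request limit on the number of ids


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parse_labels(payload, lang="en"):
    """Extract {QID: label} from a decoded wbgetentities JSON response."""
    labels = {}
    for qid, entity in payload.get("entities", {}).items():
        label = entity.get("labels", {}).get(lang)
        if label is not None:  # entities without a label in `lang` are skipped
            labels[qid] = label["value"]
    return labels


def fetch_labels(qids, lang="en"):
    """Fetch labels for many QIDs, one request per batch of BATCH_SIZE."""
    labels = {}
    for batch in chunked(list(qids), BATCH_SIZE):
        query = urllib.parse.urlencode({
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "labels",
            "languages": lang,
            "format": "json",
        })
        request = urllib.request.Request(
            f"{API_URL}?{query}",
            # Placeholder User-Agent; replace with your own contact details.
            headers={"User-Agent": "qid-label-example/0.1 (placeholder contact)"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            labels.update(parse_labels(json.load(response), lang))
    return labels
```

Keeping `parse_labels` separate from the network call makes the response handling testable without hitting the API.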
