I am using DBpedia SPARQL and trying to retrieve a list of people along with their details.
SPARQL Query (Not working):
SELECT DISTINCT ?dbpedia_link ?freebase_link str(?abstract) as ?abstract str(?activeYearsStartYear) as ?activeYearsStartYear str(?alias) as ?alias
str(?birthDate) as ?birthDate str(?birthName) as ?birthName str(?birthPlace) as ?birthPlace str(?children) as ?children
str(?label) as ?label str(?occupation) as ?occupation str(?otherNames) as ?otherNames str(?residence) as ?residence
str(?shortDescription) as ?shortDescription str(?spouse) as ?spouse str(?description) as ?description str(?subject) as ?subject
str(?comment) as ?comment str(?almaMater) as ?almaMater str(?award) as ?award str(?education) as ?education str(?knownFor) as ?knownFor
str(?networth) as ?networth str(?parents) as ?parents str(?salary) as ?salary str(?viafId) as ?viafId str(?wikiPageID) as ?wikiPageID
str(?wikiPageRevisionID) as ?wikiPageRevisionID WHERE {
{
?dbpedia_link rdf:type dbpedia-owl:Person
}
OPTIONAL {?dbpedia_link dbpedia-owl:abstract ?abstract. }
OPTIONAL {?dbpedia_link dbpedia-owl:activeYearsStartYear ?activeYearsStartYear .}
OPTIONAL {?dbpedia_link dbpedia-owl:alias ?alias .}
OPTIONAL {?dbpedia_link dbpprop:birthDate ?birthDate .}
OPTIONAL {?dbpedia_link dbpprop:birthName ?birthName .}
OPTIONAL {?dbpedia_link dbpprop:birthPlace ?birthPlace .}
OPTIONAL {?dbpedia_link dbpprop:children ?children .}
OPTIONAL {?dbpedia_link rdfs:label ?label .}
OPTIONAL {?dbpedia_link dbpprop:occupation ?occupation .}
OPTIONAL {?dbpedia_link dbpprop:otherNames ?otherNames .}
OPTIONAL {?dbpedia_link dbpprop:residence ?residence .}
OPTIONAL {?dbpedia_link dbpprop:shortDescription ?shortDescription .}
OPTIONAL {?dbpedia_link dbpprop:spouse ?spouse .}
OPTIONAL {?dbpedia_link dc:description ?description .}
OPTIONAL {?dbpedia_link dcterms:subject ?subject .}
OPTIONAL {?dbpedia_link rdfs:comment ?comment .}
OPTIONAL {?dbpedia_link dbpprop:almaMater ?almaMater .}
OPTIONAL {?dbpedia_link dbpprop:awards ?award .}
OPTIONAL {?dbpedia_link dbpprop:education ?education .}
OPTIONAL {?dbpedia_link dbpprop:knownFor ?knownFor .}
OPTIONAL {?dbpedia_link dbpprop:networth ?networth .}
OPTIONAL {?dbpedia_link dbpprop:parents ?parents .}
OPTIONAL {?dbpedia_link dbpprop:salary ?salary .}
OPTIONAL {?dbpedia_link dbpedia-owl:viafId ?viafId .}
OPTIONAL {?dbpedia_link dbpedia-owl:wikiPageID ?wikiPageID .}
OPTIONAL {?dbpedia_link dbpedia-owl:wikiPageRevisionID ?wikiPageRevisionID .}
OPTIONAL {?dbpedia_link owl:sameAs ?freebase_link
FILTER regex(?freebase_link, "^http://rdf.freebase.com") .}
OPTIONAL {?dbpedia_link dcterms:subject ?sub .}
}LIMIT 2 Offset 5
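I set the limit to 2 and the offset to 5, and it gives a timeout error. I don't know why.
But when I remove half of the fields and OPTIONAL statements from the query, it returns results.
SPARQL Query (working):
SELECT DISTINCT ?dbpedia_link str(?abstract) as ?abstract str(?activeYearsStartYear) as ?activeYearsStartYear str(?alias) as ?alias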
str(?birthDate) as ?birthDate str(?birthName) as ?birthName str(?birthPlace) as ?birthPlace str(?children) as ?children
str(?label) as ?label str(?occupation) as ?occupation str(?otherNames) as ?otherNames str(?residence) as ?residence
WHERE {
{
?dbpedia_link rdf:type dbpedia-owl:Person
}
OPTIONAL {?dbpedia_link dbpedia-owl:abstract ?abstract. }
OPTIONAL {?dbpedia_link dbpedia-owl:activeYearsStartYear ?activeYearsStartYear .}
OPTIONAL {?dbpedia_link dbpedia-owl:alias ?alias .}
OPTIONAL {?dbpedia_link dbpprop:birthDate ?birthDate .}
OPTIONAL {?dbpedia_link dbpprop:birthName ?birthName .}
OPTIONAL {?dbpedia_link dbpprop:birthPlace ?birthPlace .}
OPTIONAL {?dbpedia_link dbpprop:children ?children .}
OPTIONAL {?dbpedia_link rdfs:label ?label .}
OPTIONAL {?dbpedia_link dbpprop:occupation ?occupation .}
OPTIONAL {?dbpedia_link dbpprop:otherNames ?otherNames .}
OPTIONAL {?dbpedia_link dbpprop:residence ?residence .}
}LIMIT 2 offset 5
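But I don't know why it doesn't work with all the fields.
Is there a limit on the number of fields in DBpedia SPARQL?
It's both a limitation and a feature...
If you run your first query at http://dbpedia.org/sparql and read the response, it should be: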
Virtuoso 42000 Error The estimated execution time 4626142 (sec) exceeds the limit of 240 (sec).
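This is telling you that your query is quite complex. The query planner estimates that running it would take 4626142 seconds (about 54 days), far beyond the 240-second limit. Because DBpedia is a free, best-effort service, it does not run such queries, so that it can keep providing good service to as many people as possible.
As you noticed, your query becomes less complex with fewer OPTIONAL clauses. What you may not have realized is that you are asking for the cross join (Cartesian product) of all matching values for the variables in all of the OPTIONAL clauses. If you bind fewer variables, there are fewer combinations of values.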
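To make the multiplication concrete, here is a minimal sketch that counts the rows produced for a single person by three independent OPTIONAL clauses. The resource Albert_Einstein is only an illustrative choice of mine, and the PREFIX declarations are spelled out on the assumption that the endpoint may not predefine them. If a person has a occupations, b subjects and c labels, the count is a × b × c (a property with no values still contributes a factor of 1, because the OPTIONAL leaves the row in place with the variable unbound).

```sparql
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

# Count the result rows for ONE person (illustrative resource, swap in any other).
SELECT (COUNT(*) AS ?rows)
WHERE {
  VALUES ?person { <http://dbpedia.org/resource/Albert_Einstein> }
  # Each OPTIONAL multiplies the row count by its number of matches.
  OPTIONAL { ?person dbpprop:occupation ?occupation }
  OPTIONAL { ?person dcterms:subject    ?subject }
  OPTIONAL { ?person rdfs:label         ?label }
}
```

Adding or removing one OPTIONAL line and re-running the count shows how quickly the number of combinations grows, which is what the cost estimate in the error message reflects.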
If you are only interested in one value per variable, you may want to look at the SAMPLE keyword.
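As a rough sketch of what that could look like (untested against the live endpoint, and cut down to three of your properties), you group by the person and take one arbitrary value per variable with SAMPLE. The trailing underscores are just my naming choice: standard SPARQL 1.1 does not let you reuse a variable from the pattern as the AS target, even though Virtuoso tolerates it.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop:     <http://dbpedia.org/property/>
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?dbpedia_link
       (SAMPLE(?label_)      AS ?label)       # one arbitrary label
       (SAMPLE(?birthDate_)  AS ?birthDate)   # one arbitrary birth date
       (SAMPLE(?occupation_) AS ?occupation)  # one arbitrary occupation
WHERE {
  ?dbpedia_link a dbpedia-owl:Person .
  OPTIONAL { ?dbpedia_link rdfs:label         ?label_ }
  OPTIONAL { ?dbpedia_link dbpprop:birthDate  ?birthDate_ }
  OPTIONAL { ?dbpedia_link dbpprop:occupation ?occupation_ }
}
GROUP BY ?dbpedia_link
LIMIT 2
OFFSET 5
```

This gives one row per person, but note that the engine may still have to evaluate the joins before it aggregates, so SAMPLE on its own is not guaranteed to get a very wide query under the time limit.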
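With so many variables, all of them optional, it looks like you will need to do some post-processing of the results anyway. So I'd suggest that you simply ask, using values, for each person together with any of the properties in your list. For example: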
select distinct ?s ?p ?o {
values ?p { dbpedia-owl:abstract
dbpedia-owl:activeYearsStartYear
dbpedia-owl:alias
dbpprop:birthDate
dbpprop:birthName
dbpprop:birthPlace
dbpprop:children
rdfs:label
dbpprop:occupation
dbpprop:otherNames
dbpprop:residence }
?s a dbpedia-owl:Person ; ?p ?o .
}
order by ?s ?p
limit 100
offset 50
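The SPARQL results have more rows, because there is one row per property, but the query doesn't time out. By ordering by ?s and then by ?p, the rows end up grouped by person with the properties in a predictable order, so the post-processing shouldn't be too difficult. In fact, you could even use optional here so that each person always has the same number of rows, which would make it very easy (but I haven't tested this):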
select ?s ?p ?o {
values ?p { #-- ...
}
?s a dbpedia-owl:Person .
optional { ?s ?p ?o }
}
order by ?s ?p