I'm trying to work with the RDF dump of DBLP, which is available from DBLP in RDF. I tried to convert the file to Turtle format with Jena's rdfcat:
rdfcat -x dblp-2006-02-06.rdf -out t > dblp.ttl
Unfortunately, this aborts with the following error message:
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 378, col: 147] {E202} Expecting XML start or end element(s). String data "
????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????" not allowed.
Maybe a striping error.
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:128)
at org.apache.jena.riot.lang.LangRDFXML$ErrorHandlerBridge.error(LangRDFXML.java:246)
…
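As far as I learned from another question, "What is a striping error?", a striping error occurs in RDF/XML parsing when the hierarchical XML structure does not follow RDF/XML's striping, i.e. the strict alternation of node elements and property elements. Looking at the file, the relevant section looks like this: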
<rdf:Description rdf:about="http://www.informatik.uni-trier.de/~ley/db/journals/ac/ac40.html#YousifTD95"><dc:identifier>journals/ac/YousifTD95</dc:identifier><dc:date>2002-01-03</dc:date><rdf:type rdf:resource="http://sw.deri.org/~aharth/2004/07/dblp/dblp.owl#Article"/>
<dc:creator><foaf:Person rdf:nodeID="MazinSYousif"><foaf:name>Mazin S. Yousif</foaf:name></foaf:Person></dc:creator>
<dc:creator><foaf:Person rdf:nodeID="MatthewThazhuthaveetil"><foaf:name>Matthew Thazhuthaveetil</foaf:name></foaf:Person></dc:creator>
<dc:creator><foaf:Person rdf:nodeID="ChitaRDas"><foaf:name>Chita R. Das</foaf:name></foaf:Person></dc:creator>
<dc:title rdf:parseType="Literal">Cache Coherence in Multiprocessors: A Survey.</dc:title>
<pages>127-179</pages>
<year>1995</year>
<volume>40</volume>
<journal>Advances in Computers</journal>
</rdf:Description>
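According to nano, line 378 seems to be the one for Matthew Thazhuthaveetil. However, I can't see how that line is structurally problematic, especially when I compare it with the lines around it. Is there really a structural problem there (and if so, what is it), or is the error message misleading?
I just tried this myself with Apache Jena 2.11.1 and it works fine. Have you tried riot --validate?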
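To see what is actually on line 378 without relying on the terminal (which rendered the offending string data as a run of ?), one option is to dump the raw bytes of that line. The following is a minimal Python sketch; the function name inspect_line is hypothetical, and it reports 1-based byte offsets, which may not match the column Jena reports if the line contains multi-byte UTF-8 characters.

```python
def inspect_line(path, line_no):
    """Return the raw bytes of 1-based line `line_no` of `path`, plus a list of
    (1-based byte offset, byte value) pairs for every byte that is not
    printable ASCII, tab, LF or CR."""
    with open(path, "rb") as f:
        for i, raw in enumerate(f, start=1):
            if i == line_no:
                break
        else:
            # the loop ran out of lines without reaching line_no
            raise ValueError(f"{path} has fewer than {line_no} lines")
    unusual = [(pos, b) for pos, b in enumerate(raw, start=1)
               if not (32 <= b < 127 or b in (9, 10, 13))]
    return raw, unusual
```

For example, inspect_line("dblp-2006-02-06.rdf", 378) returns the line as bytes; printing repr() of it and the list of unusual bytes shows whether anything invisible or non-ASCII sits near column 147, or whether the line is shorter than that column.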
The error is curious:
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 378, col: 147] {E202} Expecting XML start or end element(s). String data "
????????????????????????????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????" not allowed.
Maybe a striping error.
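It doesn't show any printable characters, which is mysterious.
The error just means that the RDF contains non-whitespace characters outside the property tags. That suggests there may be invisible junk in the file, perhaps trailing after </dc:creator>?
I can't see anything like that, so it really does feel like an I/O error somewhere.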
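To test the "non-whitespace characters outside property tags" explanation independently of Jena, a rough sketch using Python's standard xml.sax module can stream through the file and report any text that appears directly inside rdf:RDF or inside a node element, which is where striping allows only elements. This is an assumption-laden approximation, not Jena's own check: the class and function names are hypothetical, only rdf:parseType="Literal" and "Resource" are special-cased, mixed content inside property elements is not detected, and expat's column numbers are 0-based.

```python
import xml.sax
from xml.sax.handler import feature_namespaces

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class StripingChecker(xml.sax.ContentHandler):
    """Record non-whitespace text found where RDF/XML striping expects only
    elements: directly inside rdf:RDF or inside a node element."""

    def __init__(self):
        super().__init__()
        self.roles = []     # role of each currently open element
        self.problems = []  # (line, 0-based column, text) tuples

    def setDocumentLocator(self, locator):
        self._locator = locator

    def startElementNS(self, name, qname, attrs):
        parent = self.roles[-1] if self.roles else None
        parse_type = attrs.get((RDF_NS, "parseType"))
        if parent is None:
            # the document element is either rdf:RDF or a single node element
            role = "rdf" if name == (RDF_NS, "RDF") else "node"
        elif parent in ("rdf", "property"):
            role = "node"  # children of rdf:RDF and property elements are nodes
        elif parent == "node":
            if parse_type == "Literal":
                role = "literal"  # arbitrary XML content, not striped
            elif parse_type == "Resource":
                role = "node"     # its content is property elements
            else:
                role = "property"
        else:
            role = "literal"      # anything nested inside an XML literal
        self.roles.append(role)

    def endElementNS(self, name, qname):
        self.roles.pop()

    def characters(self, content):
        if self.roles and self.roles[-1] in ("rdf", "node") and content.strip():
            loc = self._locator
            self.problems.append(
                (loc.getLineNumber(), loc.getColumnNumber(), content))


def find_striping_problems(source):
    """`source` is a file name or a binary file-like object."""
    handler = StripingChecker()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.problems
```

Calling find_striping_problems("dblp-2006-02-06.rdf") on the dump would list any stray text with its position. If the file instead contains bytes that are not valid in its declared encoding, expat is likely to raise a SAXParseException before reaching that point, which would also point toward a data or I/O problem rather than a structural one.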