kanjidic2 XPath for getting nanori in Python



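I am currently working on a Django project that uses the kanjidic2 XML file (http://nihongo.monash.edu/kanjidic2/index.html). I am using xml.etree.ElementTree to map the XML information. However, I am stuck at one level of the file. Here is an example of a kanjidic2 entry: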

<character id="9">
<literal>&#36898;</literal>
<codepoint>
<cp_value cp_type="ucs">9022</cp_value>
<cp_value cp_type="jis208">16-9</cp_value>
</codepoint>
<radical>
<rad_value rad_type="classical">162</rad_value>
</radical>
<misc>
<grade>9</grade>
<stroke_count>10</stroke_count>
<stroke_count>9</stroke_count>
<freq>2116</freq>
</misc>
<dic_number>
<dic_ref dr_type="nelson_c">4694</dic_ref>
<dic_ref dr_type="nelson_n">6054</dic_ref>
<dic_ref dr_type="halpern_kkd">4002</dic_ref>
<dic_ref dr_type="halpern_kkld_2ed">2774</dic_ref>
<dic_ref dr_type="heisig">2417</dic_ref>
<dic_ref dr_type="heisig6">2497</dic_ref>
<dic_ref dr_type="oneill_names">1516</dic_ref>
<dic_ref dr_type="moro" m_vol="11" m_page="0075">38901X</dic_ref>
</dic_number>
<query_code>
<q_code qc_type="skip">3-3-7</q_code>
<q_code qc_type="sh_desc">2q7.15</q_code>
<q_code qc_type="four_corner">3730.4</q_code>
<q_code qc_type="deroo">2555</q_code>
<q_code qc_type="skip" skip_misclass="stroke_diff">3-4-7</q_code>
</query_code>
<reading_meaning>
<rmgroup>
<reading r_type="pinyin">feng2</reading>
<reading r_type="korean_r">bong</reading>
<reading r_type="korean_h">&#48393;</reading>
<reading r_type="ja_on">&#12507;&#12454;</reading>
<reading r_type="ja_kun">&#12354;.&#12358;</reading>
<reading r_type="ja_kun">&#12416;&#12363;.&#12360;&#12427;</reading>
<meaning>meeting</meaning>
<meaning>tryst</meaning>
<meaning>date</meaning>
<meaning>rendezvous</meaning>
<meaning m_lang="es">encuentro</meaning>
<meaning m_lang="es">cita</meaning>
<meaning m_lang="es">encuentro casual</meaning>
<meaning m_lang="es">encontrarse</meaning>
<meaning m_lang="es">reunirse</meaning>
<meaning m_lang="es">citarse</meaning>
<meaning m_lang="es">verse por casualidad</meaning>
</rmgroup>
<nanori>&#12354;&#12356;</nanori>
<nanori>&#12362;&#12358;</nanori>
</reading_meaning>
</character>

I have no trouble putting the data from the other levels into Python dictionaries with the following code:

for i in character:
    if i.tag == 'dic_number':
        dictionariesDict = {}
        dictionaries = root.find(".//character[@id='" + id + "']//dic_number")
        for dictionary in dictionaries:
            dictionariesDict[dictionary.get('dr_type')] = dictionary.text

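However, when it comes to the reading_meaning tag, I have no idea how to get the nanori tags into one dictionary, the readings with r_type="ja_on" into another, the readings with r_type="ja_kun" into another, and the meanings into yet another (ideally one dictionary per language). I have tried all kinds of paths. When I print the result of root.find I get the tags, but when I loop over them to build the dictionaries, all I get are empty dictionaries.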

Thanks in advance for your help and patience.

This XPath selects the reading[@r_type="ja_kun"] elements, the second meaning element, and all nanori elements in one query:
(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)

xmllint --xpath '(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)' test.xml | sed -e 's/></>\n</g'
<reading r_type="ja_kun">&#x3042;.&#x3046;</reading>
<reading r_type="ja_kun">&#x3080;&#x304B;.&#x3048;&#x308B;</reading>
<meaning>tryst</meaning>
<nanori>&#x3042;&#x3044;</nanori>
<nanori>&#x304A;&#x3046;</nanori>

Tested in bash and in Python:

>>> from lxml import etree
>>> doc = etree.parse('test.xml')
>>> arr = doc.xpath('(//reading[@r_type="ja_kun"] | //meaning[2] | //nanori)')
>>> for e in arr:
...     print(e.text)
... 
あ.う
むか.える
tryst
あい
おう

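With xml.etree.ElementTree, you should run the parts of the "OR" in that XPath one at a time, because its limited XPath support does not include the | union operator.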
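
As a rough sketch of that one-path-at-a-time approach, the snippet below runs a separate ElementTree query for each group and collects the results. Several things here are assumptions for illustration only: the helper name collect_reading_meaning, the trimmed SAMPLE string wrapped in a kanjidic2 root element, the choice of lists rather than dictionaries for readings and nanori (they have no natural key), and the "en" key used for meanings that carry no m_lang attribute.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the entry from the question, wrapped in a root element
# (assumption: the real file holds many <character> elements under one root).
SAMPLE = """<kanjidic2>
<character id="9">
<literal>&#36898;</literal>
<reading_meaning>
<rmgroup>
<reading r_type="pinyin">feng2</reading>
<reading r_type="ja_on">&#12507;&#12454;</reading>
<reading r_type="ja_kun">&#12354;.&#12358;</reading>
<reading r_type="ja_kun">&#12416;&#12363;.&#12360;&#12427;</reading>
<meaning>meeting</meaning>
<meaning>tryst</meaning>
<meaning m_lang="es">encuentro</meaning>
<meaning m_lang="es">cita</meaning>
</rmgroup>
<nanori>&#12354;&#12356;</nanori>
<nanori>&#12362;&#12358;</nanori>
</reading_meaning>
</character>
</kanjidic2>"""


def collect_reading_meaning(character):
    """Group the readings, meanings and nanori of one <character> element.

    Hypothetical helper, not part of any library.
    """
    result = {"ja_on": [], "ja_kun": [], "nanori": [], "meanings": {}}
    rm = character.find("reading_meaning")
    if rm is None:  # guard in case an entry lacks the element
        return result

    # One simple path per query instead of a single "|" union.
    result["ja_on"] = [r.text for r in rm.findall("rmgroup/reading[@r_type='ja_on']")]
    result["ja_kun"] = [r.text for r in rm.findall("rmgroup/reading[@r_type='ja_kun']")]
    # nanori sits directly under reading_meaning, not inside rmgroup.
    result["nanori"] = [n.text for n in rm.findall("nanori")]

    meanings = defaultdict(list)
    for m in rm.findall("rmgroup/meaning"):
        # Assumption: a meaning without m_lang is English, as in the sample.
        meanings[m.get("m_lang", "en")].append(m.text)
    result["meanings"] = dict(meanings)
    return result


root = ET.fromstring(SAMPLE)
data = collect_reading_meaning(root.find("character"))
print(data["nanori"])
print(data["meanings"])
```

Note that the paths are relative to the element they are called on, so the nanori query is "nanori" while the readings and meanings need the "rmgroup/" prefix. If dictionaries are really needed for the readings, the lists could be turned into one afterwards, but that requires choosing a key that the XML itself does not provide.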
