Why am I not getting the text inside XML tags? - Python ElementTree

How do I read all the text inside the <context>...</context> tag? And what about the <head>...</head> tag inside the <context> tag?

I have an XML file that looks like this:

<corpus lang="english">
    <lexelt item="coach.n">
        <instance id="1">
            <context>I'll buy a train or <head>coach</head> ticket.</context>
        </instance>
        <instance id="2">
            <context>A branch line train took us to Aubagne where a <head>coach</head> picked us up for the journey up to the camp.</context>
        </instance>
    </lexelt>
</corpus>

But when I run my code to read the XML text in <context>...</context>, I only get the text up to the <head> tag.

import xml.etree.ElementTree as et    
inputfile = "./coach.data"    
root = et.parse(open(inputfile)).getroot()
instances = []
for corpus in root:
    for lexelt in corpus:
      for instance in lexelt:
        instances.append(instance.text)
j=1
for i in instances:
    print "instance " + j
    print "left: " + i
    print "\n"
    j+=1

Right now I only get the left side:

instance 1
left: I'll buy a train or 
instance 2
left: A branch line train took us to Aubagne where a

The output also needs the head and the right side of the context. It should be:

instance 1
left: I'll buy a train or 
head: coach
right:   ticket.
instance 2
left: A branch line train took us to Aubagne where a 
head: coach
right:  picked us up for the journey up to the camp.

First of all, your code has a mistake: for corpus in root is unnecessary, because your root is already the corpus element.

What you probably meant to do is:

contexts = []  # collects the leading text of every <context> element
for lexelt in root:
  for instance in lexelt:
    for context in instance:
      contexts.append(context.text)

Now, regarding your question: inside the for context in instance block, you can access the other two strings you need:

The head text is available as context.find('head').text
The text to the right of the head element is available as context.find('head').tail

According to the Python etree documentation:

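The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element's end tag and before the next tag.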
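
To make the text / tail distinction concrete, here is a minimal sketch that produces the left / head / right split asked for in the question. It inlines the sample XML instead of reading ./coach.data, and split_context is a hypothetical helper name, not part of ElementTree. It assumes every <context> contains exactly one <head> child.

```python
import xml.etree.ElementTree as et

# Sample data copied from the question, inlined so the sketch runs without a file.
XML = """<corpus lang="english">
    <lexelt item="coach.n">
        <instance id="1">
            <context>I'll buy a train or <head>coach</head> ticket.</context>
        </instance>
        <instance id="2">
            <context>A branch line train took us to Aubagne where a <head>coach</head> picked us up for the journey up to the camp.</context>
        </instance>
    </lexelt>
</corpus>"""


def split_context(context):
    """Return (left, head, right) for one <context> element (hypothetical helper).

    Assumes the element has exactly one <head> child.
    """
    head = context.find('head')
    # text and tail are None when no text is present, so fall back to ''.
    return (context.text or '', head.text or '', head.tail or '')


root = et.fromstring(XML)  # root is the <corpus> element
results = []
for lexelt in root:
    for instance in lexelt:
        for context in instance:
            results.append((instance.get('id'),) + split_context(context))

for inst_id, left, head, right in results:
    print("instance " + inst_id)
    print("left: " + left)
    print("head: " + head)
    print("right: " + right)
```

The instance number is taken from the id attribute rather than from a separate counter, which also avoids concatenating a string with an integer.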

In ElementTree, you have to take the tail attribute of the child nodes into account. Also, in your case the corpus element is the root itself.

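import xml.etree.ElementTree as et

inputfile = "./coach.data"
corpus = et.parse(inputfile).getroot()

def getalltext(elem):
    # text and tail can be None, so fall back to an empty string
    return (elem.text or '') + ''.join(
        getalltext(child) + (child.tail or '') for child in elem)

instances = []
for lexelt in corpus:
    for instance in lexelt:
        instances.append(getalltext(instance))

j = 1
for i in instances:
    print("instance " + str(j))
    # i holds the whole text of the instance, not only the left part
    print("left: " + i)
    print("")
    j += 1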
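
As a side note, the standard library offers Element.itertext(), which walks an element and its descendants and yields each text and tail string in document order, so it can stand in for a hand-written recursive helper. The following is only a small sketch on a single <context> element taken from the question, not a replacement for the answer above.

```python
import xml.etree.ElementTree as et

# One <context> element from the question, inlined for illustration.
context = et.fromstring(
    "<context>I'll buy a train or <head>coach</head> ticket.</context>"
)

# itertext() yields context.text, then head.text, then head.tail.
full_text = ''.join(context.itertext())
print(full_text)
```

This gives the complete sentence but loses the left / head / right boundaries, so it only fits when the split itself is not needed.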
