XML: getting the text surrounding a tag



I have XML in the format below, and I want to retrieve the text surrounding a tag (to its left and right), using Java + dom4j.

   <article>
    <article-meta></article-meta>
    <body>
     <p> 
     Extensible Markup Language (XML) is a markup language that defines a set of
     rules for encoding documents in a format that is both human-readable and
     machine-readable <ref id="1">1</ref>. It is defined in the XML 1.0 Specification produced
      by the W3C, and several other related specifications
      </p>
      <p>
       Many application programming interfaces (APIs) have been developed to aid 
      software developers with processing XML <ref id="2">2</ref>. data, and several schema 
       systems exist to aid in the definition of XML-based languages.
      </p>
    </body>
    </article>

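I want to retrieve the text around each tag. For example, for this XML the output for

 <ref id="1">1</ref>

would be

left: human-readable and machine-readable

right: It is defined in the XML 1.0 Specification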

Try

import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
public class TestDom4j {
    public static Document getDocument(final String xmlFileName) {
        Document document = null;
        SAXReader reader = new SAXReader();
        try {
            document = reader.read(xmlFileName);
        } catch (DocumentException e) {
            e.printStackTrace();
        }
        return document;
    }
    /**
     * @param args
     */
    public static void main(String[] args) {
        String xmlFileName = "data.xml";
        String xPath = "//article/body/p";
        Document document = getDocument(xmlFileName);
        List<Node> nodes = document.selectNodes(xPath);
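        for (Node node : nodes) {
            // Serialized paragraph: "<p> ... <ref ...>1</ref> ... </p>"
            String nodeXml = node.asXML();
            int refStart = nodeXml.indexOf("<ref");
            int refEnd = nodeXml.indexOf("</ref>");
            if (refStart < 0 || refEnd < 0) {
                continue; // no <ref> in this paragraph, nothing to split on
            }
            // Skip the leading "<p>" (3 chars) and the trailing "</p>" (4 chars);
            // only the first <ref> in each paragraph is handled.
            System.out.println("Left  >> " + nodeXml.substring(3, refStart).trim());
            System.out.println("Right  >> " + nodeXml.substring(refEnd + 6, nodeXml.length() - 4).trim());
        }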
    }
}
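
The string slicing above depends on the exact serialized form of each <p> and only looks at the first <ref>. A possible alternative is to walk the sibling nodes of every <ref> and collect their text. The sketch below is only an illustration of that idea: it uses the JDK's built-in org.w3c.dom API instead of dom4j so that it runs without extra libraries, and the names RefContext, siblingText and contexts are made up for this example. It also assumes the attribute values are quoted, since the parser rejects XML that is not well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of dom4j or the original answer.
class RefContext {

    // Concatenates the text of all siblings after (forward == true) or
    // before (forward == false) the given node, in document order.
    static String siblingText(Node ref, boolean forward) {
        StringBuilder sb = new StringBuilder();
        Node n = forward ? ref.getNextSibling() : ref.getPreviousSibling();
        while (n != null) {
            String text = n.getTextContent();
            if (forward) {
                sb.append(text);
            } else {
                sb.insert(0, text); // walking backwards, so prepend
            }
            n = forward ? n.getNextSibling() : n.getPreviousSibling();
        }
        // Collapse line breaks and indentation into single spaces.
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    // Returns one {left, right} pair per <ref> element, in document order.
    static List<String[]> contexts(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList refs = doc.getElementsByTagName("ref");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < refs.getLength(); i++) {
                Node ref = refs.item(i);
                result.add(new String[] {
                        siblingText(ref, false), siblingText(ref, true) });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article><body><p>human-readable and machine-readable "
                + "<ref id=\"1\">1</ref>. It is defined in the XML 1.0 Specification"
                + "</p></body></article>";
        for (String[] c : contexts(xml)) {
            System.out.println("Left  >> " + c[0]);
            System.out.println("Right >> " + c[1]);
        }
    }
}
```

Note that the right-hand text starts with the period that follows the tag, and that if a paragraph contains several <ref> elements, the text of the other ones ends up inside the left or right context. With dom4j the same traversal should be possible over the parent element's child nodes, but I have not tested that variant.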
