Slow speed when parsing XML data with a SAX parser and saving it to MySQL on localhost (Java)

I am writing a program in Java and have run into a problem with it.

I have to parse a large .rdf file (XML format) of 1.60 GB and then insert the parsed data into a MySQL server on localhost.

After googling, I decided to use a SAX parser in my code. Many sites encourage using a SAX parser rather than a DOM parser, saying that SAX is much faster than DOM.

However, when I ran my SAX-based code, I found that the program runs far too slowly. A senior in my lab told me that the slowness might come from the file I/O.

In the code of "javax.xml.parsers.SAXParser.class", "InputStream" is used for file input, which might make the job slower than using the "Scanner" class or the "BufferedReader" class.

My questions are: 1. Is a SAX parser suitable for parsing large XML documents?

My program took 10 minutes to parse a 14 MB sample file and insert the data
into MySQL on localhost.
By comparison, another senior in my lab wrote a similar program
that uses a DOM parser; it parses the 1.60 GB XML file and saves the data
in about an hour.

2. How can I use "BufferedReader" instead of "InputStream" while still using the SAX parser library?

This is my first question on stackoverflow, so any kind of advice would be appreciated and helpful. Thank you for reading.

Added after receiving initial feedback: I should have uploaded my code to clarify my question, and I apologize for that.

package xml_parse;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
public class Readxml extends DefaultHandler {
    Connection con = null;
    String[] chunk; // to check /A/, /B/, /C/ kind of stuff.
    public Readxml() throws SQLException {
        // connect to local mysql database
        con = DriverManager.getConnection("jdbc:mysql://localhost/lab_first",
                "root", "2030kimm!");
    }
    public void getXml() {
        try {
            // obtain and configure a SAX based parser
            SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
            // obtain object for SAX parser
            SAXParser saxParser = saxParserFactory.newSAXParser();
            // default handler for SAX handler class
            // all three methods are written in handler's body
            DefaultHandler default_handler = new DefaultHandler() {
                String topic_gate = "close", category_id_gate = "close",
                        new_topic_id, new_catid, link_url;
                java.sql.Statement st = con.createStatement();
                public void startElement(String uri, String localName,
                        String qName, Attributes attributes)
                        throws SAXException {
                    if (qName.equals("Topic")) {
                        topic_gate = "open";
                        new_topic_id = attributes.getValue(0);
                        // apostrophe escape in SQL query
                        new_topic_id = new_topic_id.replace("'", "''");
                        if (new_topic_id.contains("International"))
                            topic_gate = "close";
                        if (new_topic_id.equals("") == false) {
                            chunk = new_topic_id.split("/");
                            for (int i = 0; i < chunk.length - 1; i++)
                                if (chunk[i].length() == 1) {
                                    topic_gate = "close";
                                    break;
                                }
                        }
                        // Strings are immutable, so the result must be assigned back
                        if (new_topic_id.startsWith("Top/"))
                            new_topic_id = new_topic_id.replace("Top/", "");
                    }
                    if (topic_gate.equals("open") && qName.equals("catid"))
                        category_id_gate = "open";
                    // add each new link to table "links" (MySQL)
                    if (topic_gate.equals("open") && qName.contains("link")) {
                        link_url = attributes.getValue(0);
                        link_url = link_url.replace("'", "''"); // take care of
                                                                // apostrophe
                                                                // escape
                        String insert_links_command = "insert into links(link_url, catid) values('"
                                + link_url + "', " + new_catid + ");";
                        try {
                            st.executeUpdate(insert_links_command);
                        } catch (SQLException e) {
                            // TODO Auto-generated catch block
                            e.printStackTrace();
                        }
                    }
                }
                public void characters(char ch[], int start, int length)
                        throws SAXException {
                    if (category_id_gate.equals("open")) {
                        new_catid = new String(ch, start, length);
                        // add new row to table "Topics" (MySQL)
                        String insert_topics_command = "insert into topics(topic_id, catid) values('"
                                + new_topic_id + "', " + new_catid + ");";
                        try {
                            st.executeUpdate(insert_topics_command);
                        } catch (SQLException e) {
                            // TODO Auto-generated catch block
                            e.printStackTrace();
                        }
                    }
                }
                public void endElement(String uri, String localName,
                        String qName) throws SAXException {
                    if (qName.equals("Topic"))
                        topic_gate = "close";
                    if (qName.equals("catid"))
                        category_id_gate = "close";
                }
            };
            // BufferedInputStream!!
            String filepath = null;
            BufferedInputStream buffered_input = null;
            /*
             * // Content filepath =
             * "C:/Users/Kim/Desktop/2016여름/content.rdf.u8/content.rdf.u8";
             * buffered_input = new BufferedInputStream(new FileInputStream(
             * filepath)); saxParser.parse(buffered_input, default_handler);
             * 
             * // Adult filepath =
             * "C:/Users/Kim/Desktop/2016여름/ad-content.rdf.u8"; buffered_input =
             * new BufferedInputStream(new FileInputStream( filepath));
             * saxParser.parse(buffered_input, default_handler);
             */
            // Kids-and-Teens
            filepath = "C:/Users/Kim/Desktop/2016여름/kt-content.rdf.u8";
            buffered_input = new BufferedInputStream(new FileInputStream(
                    filepath));
            saxParser.parse(buffered_input, default_handler);
            System.out.println("Finished.");
        } catch (SQLException sqex) {
            System.out.println("SQLException: " + sqex.getMessage());
            System.out.println("SQLState: " + sqex.getSQLState());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This is the entire code of my program.

My original code from yesterday did the file I/O as follows (instead of using "BufferedInputStream"):

saxParser.parse("file:///C:/Users/Kim/Desktop/2016여름/content.rdf.u8/content.rdf.u8",
             default_handler);

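I expected my program to speed up after switching to "BufferedInputStream", but the speed did not improve at all. I am having trouble finding the bottleneck that causes the speed problem. Thank you very much.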

The rdf file read in this code is about 14 MB, and it takes my computer about 11 minutes to run the code.

Is a SAX parser suitable for parsing large XML documents?

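Yes. SAX and StAX parsers are the best choices for parsing large XML documents because they consume little memory and CPU, whereas a DOM parser loads the whole document into memory, which is clearly not the right choice for a file of this size.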

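Response to the update: judging from your code, the slowness has much more to do with how you store the data in the database. Your current code executes its queries in auto-commit mode, whereas you should use transactional mode for better performance, since you have a lot of data to insert; reading up on JDBC transactions will help you understand why. To reduce the number of round trips between the database and your application, you should also consider using batch updates.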
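The advice above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the class name `BatchedLinkWriter`, the batch size of 1000, and the assumption that `catid` is numeric are all made up for the example; only the `links(link_url, catid)` table comes from the question. It turns auto-commit off, queues rows with `addBatch()`, and commits once at the end.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: writes link rows in batches inside one transaction. */
final class BatchedLinkWriter implements AutoCloseable {
    private static final int BATCH_SIZE = 1000; // assumed value; tune by measurement

    private final Connection con;
    private final PreparedStatement insertLink;
    private int pending = 0;

    BatchedLinkWriter(Connection con) throws SQLException {
        this.con = con;
        con.setAutoCommit(false); // one transaction instead of one commit per row
        // Placeholders let the driver handle apostrophes; no manual replace("'", "''")
        this.insertLink = con.prepareStatement(
                "insert into links(link_url, catid) values(?, ?)");
    }

    /** Queues one row; queued rows are sent once BATCH_SIZE of them are pending. */
    void add(String linkUrl, long catid) throws SQLException {
        insertLink.setString(1, linkUrl);
        insertLink.setLong(2, catid); // assumes catid is a numeric column
        insertLink.addBatch();
        if (++pending >= BATCH_SIZE) {
            flush();
        }
    }

    private void flush() throws SQLException {
        if (pending > 0) {
            insertLink.executeBatch();
            pending = 0;
        }
    }

    /** Sends the remaining rows, commits the transaction, and closes the statement. */
    @Override
    public void close() throws SQLException {
        flush();
        con.commit();
        insertLink.close();
    }
}
```

In the SAX handler, `startElement` would then call `add(...)` instead of `st.executeUpdate(...)`, and the writer would be closed once after `saxParser.parse(...)` returns. How much this helps depends on the driver and server settings, so it is worth measuring.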

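With a SAX parser you should be able to reach a parsing speed of about 1 GB/minute without much difficulty. If parsing 14 MB takes 10 minutes, then either you are doing something wrong or the time is being spent on something other than SAX parsing (for example, the database updates).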
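One way to check where the time goes is to run the parse alone, with a handler that touches no database. The sketch below is only an illustration; the class name `ParseOnlyTimer` and the command-line argument are assumptions made for the example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical harness: measures SAX parsing with no database work involved. */
final class ParseOnlyTimer {

    /** Handler that only counts start tags. */
    static final class CountingHandler extends DefaultHandler {
        long elements = 0;

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) {
            elements++;
        }
    }

    /** Parses the stream with the counting handler and returns the element count. */
    static long countElements(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path of the XML file to measure
        long start = System.nanoTime();
        try (InputStream in = new BufferedInputStream(
                Files.newInputStream(Paths.get(args[0])))) {
            long n = countElements(in);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " elements parsed in " + ms + " ms");
        }
    }
}
```

If this finishes quickly on the 14 MB sample, the parser is not the bottleneck and the inserts are the place to look.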

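You can keep using the SAX parser, and use a BufferedInputStream rather than a BufferedReader (with a byte stream the parser detects the XML's character encoding itself, so you do not have to guess it).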
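To illustrate the encoding point, here is a small sketch (the class and method names are invented for the example). The parser is given raw bytes and reads the encoding from the XML declaration, so the caller never has to name a charset as it would when constructing a Reader.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: parse from bytes and let the parser pick the charset. */
final class ByteStreamParse {

    /** Returns all character data of the document read from the raw byte stream. */
    static String textOf(InputStream rawBytes) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called several times per text node, so append
                text.append(ch, start, length);
            }
        };
        // No charset is specified here: the parser takes it from the
        // XML declaration (or a byte order mark) in the stream itself.
        parser.parse(new BufferedInputStream(rawBytes), handler);
        return text.toString();
    }
}
```

A document that declares `encoding="ISO-8859-1"` and is stored in that encoding is decoded correctly this way, whereas a Reader opened with the wrong charset would silently corrupt non-ASCII characters.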

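In general, parsing XML may cause additional files to be read: DTDs and so on. For example, (X)HTML has a large number of named entities. Using an XML catalog to keep local copies of those remote files helps a lot.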

Perhaps you can turn off validation.
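A rough sketch of both ideas follows; the class name `NoExternalDtd` is invented for the example. It asks for a non-validating parser (which is already the factory default) and answers every external entity request with empty content, so the parser never fetches a remote DTD. Note that this is a blunt substitute for a real XML catalog: a document that relies on entities declared in its DTD would fail to parse this way.

```java
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: parse without validating and without fetching DTDs. */
final class NoExternalDtd {

    /** Counts start tags while ignoring any external DTD the document names. */
    static int countElements(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // the default, stated explicitly here
        SAXParser parser = factory.newSAXParser();
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Return empty content for every external entity so that
                // no remote URL is ever opened.
                return new InputSource(new StringReader(""));
            }

            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                count[0]++;
            }
        };
        parser.parse(in, handler);
        return count[0];
    }
}
```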

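Also, you can use gzip compression to trade network traffic for computing power. By setting and checking the headers, a GZIPInputStream may, depending on the case, be more efficient (or not).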
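If the input is gzip-compressed, it can be decompressed on the fly while parsing. The sketch below is an assumption-laden illustration (the class name is invented, and it leaves out the header checks mentioned above); it simply wraps the byte stream in a `GZIPInputStream` before handing it to the parser.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: SAX-parse gzip-compressed XML without unpacking it first. */
final class GzipSaxParse {

    /** Counts start tags in a gzip-compressed XML stream. */
    static int countElements(InputStream gzipped) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                count[0]++;
            }
        };
        // Decompression happens as the parser pulls bytes from the stream
        try (InputStream in = new GZIPInputStream(new BufferedInputStream(gzipped))) {
            parser.parse(in, handler);
        }
        return count[0];
    }
}
```

Whether this is faster than reading the uncompressed file depends on whether the disk or network, or the CPU, is the limiting factor.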
