如何使用poi从ms word(.doc)中读取格式化文本作为HTML文本



我想阅读格式化的文本作为html文本(

boldvalue )我也想使用图像标签链接获取图像。我正在使用poi poi是否有任何选项来获取像这样的html格式的数据?

try this

HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("D:\temp\seo\1.doc"));
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .newDocument());
        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();
        String result = new String(out.toByteArray());
        System.out.println(result);

最新更新