Docx4j: styling problems when converting an HTML document to docx



After converting this HTML file to a docx document, some of the styles are not applied correctly.

<html>
<head>
<style>
div,p{ 
    background-color: #ff0000;
    padding: 100px;
    border: 10px solid #000;
    text-align: justify;
    margin-bottom: 50px;
    text-indent: 50px;
}
</style>
</head>
<body>
    <div>test test test <br/>test test test <br/>test test test</div>
    <p>test test test <br/>test test test <br/>test test test</p>
    <p>test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test test </p>
</body>
</html>

I am using the following unit test:

import java.io.File;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;
import org.junit.Test;

@Test
public void testConvertXhtml3() throws Exception
{
    String inputfilepath = "/Users/kyv/Documents/test.html";
    // Create an empty docx package
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
    // Add a numbering part with the default numbering definitions
    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();
    XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
    // Convert the XHTML, and add it into the empty docx we made
    wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(new File(inputfilepath), null));

    wordMLPackage.save(new File("/Users/kyv/Documents/test.docx"));
}

In the console I get a lot of "How to handle: ..." messages. Here is part of the log:

Attempting to load: docx4j.properties
Using paper size: A4
Landscape orientation: false
Set contentType application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml on part /

java.vendor=Oracle Corporation
java.version=1.7.0_55
jar:file:/Users/kvn/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar!/META-INF/MANIFEST.MF
Implementation-Title : JAXB Reference Implementation 
Implementation-Version : 2.2.3
Class-Path : jaxb-api.jar activation.jar jsr173_1.0_api.jar jaxb1-impl.jar
Manifest-Version : 1.0
Specification-Vendor : Oracle Corporation
Created-By : 1.5.0_22-b03 (Sun Microsystems Inc.)
Ant-Version : Apache Ant 1.7.1
Implementation-Vendor : Oracle Corporation
Implementation-Vendor-Id : com.sun
Specification-Title : Java Architecture for XML Binding
Specification-Version : 2.2.2
Extension-Name : com.sun.xml.bind
Build-Id : hudson-jaxb-ri-2.2.3-3
Found JAXB reference implementation in jar:file:/Users/kushniry/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar!/META-INF/MANIFEST.MF
Implementation-Version : 2.2.3-hudson-jaxb-ri-2.2.3-3-
Attempting to load: org/docx4j/wml/jaxb.properties
Not using MOXy, since no resource: org/docx4j/wml/jaxb.properties
No MOXy JAXB config found; assume not intended..
org/docx4j/wml/jaxb.properties not found via classloader.
name: com.sun.xml.internal.bind.namespacePrefixMapper value: org.docx4j.jaxb.NamespacePrefixMapperSunInternal@2a3d4350 .. trying RI.
Using NamespacePrefixMapper, which is suitable for the JAXB RI
Using JAXB Reference Implementation
Not using MOXy; using com.sun.xml.bind.v2.runtime.JAXBContextImpl
.. other contexts loaded ..
Set contentType application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml on part /word/document.xml

Using paper size: A4
Landscape orientation: false
Set contentType application/vnd.openxmlformats-package.relationships+xml on part /_rels/.rels

setPackage called for org.docx4j.openpackaging.parts.relationships.RelationshipsPart
setPackage called for org.docx4j.openpackaging.parts.relationships.RelationshipsPart
Registered rels
adding part with proposed name: /word/document.xml
Relativising target /word/document.xml against source /
Result word/document.xml
rel exists: false

Loading part /word/document.xml
put part /word/document.xml
setPackage called for org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
Set shortcut for mainDoc
shortcut was set
Set contentType application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml on part /word/styles.xml

docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart.DefaultStyles resolved to org/docx4j/openpackaging/parts/WordprocessingML/styles.xml
Attempting to load: org/docx4j/openpackaging/parts/WordprocessingML/styles.xml
For org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart, unmarshall via binder
Oracle Corporation
1.7.0_55
Using com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
Using com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
info: com.sun.xml.bind.v2.runtime.BinderImpl

Set contentType application/vnd.openxmlformats-package.relationships+xml on part /word/_rels/document.xml.rels

setPackage called for org.docx4j.openpackaging.parts.relationships.RelationshipsPart
setPackage called for org.docx4j.openpackaging.parts.relationships.RelationshipsPart
Registered rels
adding part with proposed name: /word/styles.xml
Relativising target /word/styles.xml against source /word/document.xml
Result styles.xml
rel exists: false

Loading part /word/styles.xml
put part /word/styles.xml
setPackage called for org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart
shortcut was set
xpath implementation: org.apache.xpath.jaxp.XPathFactoryImpl
Set contentType application/vnd.openxmlformats-package.core-properties+xml on part /docProps/core.xml

adding part with proposed name: /docProps/core.xml
Relativising target /docProps/core.xml against source /
Result docProps/core.xml
rel exists: false

Loading part /docProps/core.xml
put part /docProps/core.xml
setPackage called for org.docx4j.openpackaging.parts.DocPropsCorePart
Set shortcut for docPropsCorePart
shortcut was set
Set contentType application/vnd.openxmlformats-officedocument.extended-properties+xml on part /docProps/app.xml

adding part with proposed name: /docProps/app.xml
Relativising target /docProps/app.xml against source /
Result docProps/app.xml
rel exists: false

Loading part /docProps/app.xml
put part /docProps/app.xml
setPackage called for org.docx4j.openpackaging.parts.DocPropsExtendedPart
Set shortcut for docPropsExtendedPart
shortcut was set
Set contentType application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml on part /word/numbering.xml


adding part with proposed name: /word/numbering.xml
Relativising target /word/numbering.xml against source /word/document.xml
Result numbering.xml
rel exists: false

Loading part /word/numbering.xml
put part /word/numbering.xml
setPackage called for org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart
shortcut was set
docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.DefaultNumbering resolved to org/docx4j/openpackaging/parts/WordprocessingML/numbering.xml
Attempting to load: org/docx4j/openpackaging/parts/WordprocessingML/numbering.xml
For org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart, unmarshall via binder
info: com.sun.xml.bind.v2.runtime.BinderImpl
tableFormatting: CLASS_PLUS_OTHER
paragraphFormatting: CLASS_PLUS_OTHER
runFormatting: CLASS_PLUS_OTHER
Attempting to load: docx4j-ImportXHTML.properties
Preparing StyleTree
Style with name Normal, id 'Normal' is default paragraph style
Set virtual style, id 'DocDefaults', name 'DocDefaults'
setProperty: com.sun.xml.bind.namespacePrefixMapper
<w:style w:type="paragraph" w:styleId="DocDefaults" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:ns21="urn:schemas-microsoft-com:office:powerpoint" xmlns:ns23="http://schemas.microsoft.com/office/2006/coverPageProps" xmlns:dsp="http://schemas.microsoft.com/office/drawing/2008/diagram" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:odx="http://opendope.org/xpaths" xmlns:odgm="http://opendope.org/SmartArt/DataHierarchy" xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram" xmlns:ns17="urn:schemas-microsoft-com:office:excel" xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart" xmlns:odi="http://opendope.org/components" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:ns9="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:ns32="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:ns30="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns:ns12="http://schemas.openxmlformats.org/drawingml/2006/chartDrawing" xmlns:ns31="http://schemas.openxmlformats.org/drawingml/2006/compatibility" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:odq="http://opendope.org/questions" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing" xmlns:odc="http://opendope.org/conditions" xmlns:oda="http://opendope.org/answers">
    <w:name w:val="DocDefaults"/>
    <w:pPr>
        <w:spacing w:after="200" w:line="276" w:lineRule="auto"/>
    </w:pPr>
    <w:rPr>
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:eastAsiaTheme="minorHAnsi" w:cstheme="minorBidi"/>
        <w:sz w:val="22"/>
        <w:szCs w:val="22"/>
        <w:lang w:val="en-US" w:eastAsia="en-US" w:bidi="ar-SA"/>
    </w:rPr>
</w:style>
Style with name Default Paragraph Font, id 'DefaultParagraphFont' is default character style
getting children of java.util.ArrayList

No numPr.. 
200 twips -> 3.5250988mm (0.14inches)
 /* TABLE STYLES */ 
 /* PARAGRAPH STYLES */ 
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
 /* CHARACTER STYLES */ 
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: SAX XMLReader in use (parser): org.apache.xerces.parsers.SAXParser
org.docx4j.org.xhtmlrenderer.load INFO:: Loaded document in ~91ms
org.docx4j.org.xhtmlrenderer.load INFO:: TIME: parse stylesheets  170ms
org.docx4j.org.xhtmlrenderer.match INFO:: media = print
org.docx4j.org.xhtmlrenderer.match INFO:: Matcher created with 136 selectors
org.docx4j.org.xhtmlrenderer.render.BlockBox
BB<html color: #000000; background-color: transparent; background-image: none; background-repeat: repeat; background-attachment: scroll; background-position: [0%, 0%]; background-size: [auto, auto]; border-collapse: separate; -fs-border-spacing-horizontal: 0; -fs-border-spacing-vertical: 0; -fs-font-metric-src: none; -fs-keep-with-inline: auto; -fs-page-width: auto; -fs-page-height: auto; -fs-page-sequence: auto; -fs-pdf-font-embed: auto; -fs-pdf-font-encoding: Cp1252; -fs-page-orientation: auto; -fs-table-paginate: auto; -fs-text-decoration-extent: line; bottom: auto; caption-side: top; clear: none; ; content: normal; counter-increment: none; counter-reset: none; cursor: auto; ; display: block; empty-cells: show; float: none; font-style: normal; font-variant: normal; font-weight: normal; font-size: medium; line-height: normal; font-family: serif; -fs-table-cell-colspan: 1; -fs-table-cell-rowspan: 1; height: auto; left: auto; letter-spacing: normal; list-style-type: disc; list-style-position: outside; list-style-image: none; max-height: none; max-width: none; min-height: 0; min-width: 0; orphans: 2; ; ; ; overflow: visible; page: auto; page-break-after: auto; page-break-before: auto; page-break-inside: auto; position: static; ; right: auto; src: none; table-layout: auto; text-align: left; text-decoration: none; text-indent: 0; text-transform: none; top: auto; ; vertical-align: baseline; visibility: visible; white-space: normal; word-wrap: normal; widows: 2; width: auto; word-spacing: normal; z-index: auto; border-top-color: #000000; border-right-color: #000000; border-bottom-color: #000000; border-left-color: #000000; border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: none; border-top-width: 2px; border-right-width: 2px; border-bottom-width: 2px; border-left-width: 2px; margin-top: 0; margin-right: 0; margin-bottom: 0; margin-left: 0; padding-top: 0; padding-right: 0; padding-bottom: 0; padding-left: 0; 
block
default handling for html
How to handle: border-bottom-width?
How to handle: text-indent?
How to handle: cursor?
How to handle: visibility?
How to handle: border-right-style?
How to handle: font-weight?
How to handle: float?
How to handle: border-bottom-style?
How to handle: height?
How to handle: background-size?
How to handle: page?
How to handle: border-right-color?
How to handle: border-right-width?
How to handle: white-space?
How to handle: right?
How to handle: background-image?
How to handle: background-position?
How to handle: padding-right?
How to handle: widows?
How to handle: max-height?
How to handle: width?
How to handle: display?
How to handle: min-height?
How to handle: padding-bottom?
How to handle: content?
How to handle: border-left-color?
How to handle: border-top-color?
How to handle: background-attachment?
How to handle: border-left-style?
How to handle: overflow?
valueType PRIMITIVE for margin-left
PrimitiveType: 1
margin-left: 0.0
How to handle: bottom?
How to handle: page-break-inside?
How to handle: margin-top?
How to handle: empty-cells?
How to handle: caption-side?
How to handle: background-repeat?
How to handle: list-style-position?
How to handle: position?
How to handle: border-top-style?
How to handle: counter-reset?
valueType PRIMITIVE for text-align
PrimitiveType: 21
How to handle: counter-increment?
valueType PRIMITIVE for page-break-after
PrimitiveType: 21
How to handle: clear?
How to handle: margin-right?
valueType PRIMITIVE for line-height
PrimitiveType: 21
How to handle: border-collapse?
How to handle: font-size?
How to handle: left?
How to handle: word-wrap?
How to handle: src?
How to handle: border-left-width?
How to handle: word-spacing?
How to handle: top?
How to handle: padding-left?
How to handle: padding-top?
How to handle: list-style-type?
How to handle: letter-spacing?
How to handle: font-variant?
...............

..............
How to handle: font-family?
valueType PRIMITIVE for page-break-before
PrimitiveType: 21
No mapping for: 'serif'
.. processed child org.docx4j.org.xhtmlrenderer.render.InlineBox
Done processing children of org.docx4j.org.xhtmlrenderer.render.BlockBox
.. processed child org.docx4j.org.xhtmlrenderer.render.BlockBox
Done processing children of org.docx4j.org.xhtmlrenderer.render.BlockBox
.. processed child org.docx4j.org.xhtmlrenderer.render.BlockBox
Done processing children of org.docx4j.org.xhtmlrenderer.render.BlockBox
sourcePartStore undefined
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.contenttype.ContentTypeManager ...
marshalling /_rels/.rels
name: com.sun.xml.internal.bind.namespacePrefixMapper value: org.docx4j.jaxb.NamespacePrefixMapperRelationshipsPartSunInternal@7bf8dc3c .. trying RI.
Using NamespacePrefixMapperRelationshipsPart, which is suitable for the JAXB RI
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
For Relationship Id=rId1 Source is /, Target is word/document.xml
Getting part /word/document.xml
org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart

.. saving 
marshalling /word/document.xml
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
marshalling /word/_rels/document.xml.rels
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
For Relationship Id=rId1 Source is /word/document.xml, Target is styles.xml
Getting part /word/styles.xml
org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart

.. saving 
marshalling /word/styles.xml
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart
For Relationship Id=rId2 Source is /word/document.xml, Target is numbering.xml
Getting part /word/numbering.xml
org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart

.. saving 
marshalling /word/numbering.xml
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart
For Relationship Id=rId2 Source is /, Target is docProps/core.xml
Getting part /docProps/core.xml
org.docx4j.openpackaging.parts.DocPropsCorePart

.. saving 
marshalling /docProps/core.xml
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.DocPropsCorePart
For Relationship Id=rId3 Source is /, Target is docProps/app.xml
Getting part /docProps/app.xml
org.docx4j.openpackaging.parts.DocPropsExtendedPart

.. saving 
marshalling /docProps/app.xml
setProperty: com.sun.xml.bind.namespacePrefixMapper
marshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
...Done!

Is there anything I can do so that the document is converted with the proper styles? My setup:

docx4j.AppVersion=3.3

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>3.2.1</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML</artifactId>
    <version>3.2.1</version>
</dependency>

This is DEBUG-level logging in PropertyFactory, intended to tell developers which CSS properties are currently ignored or not supported.
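
Since each "How to handle: ...?" line names one CSS property that was skipped, it can be handy to boil the console output down to a distinct list. The following is only a rough sketch using the plain JDK, not part of docx4j; the class name UnhandledCssProperties and the method extract are made up for illustration, and it assumes the log lines look exactly like the ones in the question.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a docx4j class): collects the CSS property names
// reported in "How to handle: <property>?" log lines.
public class UnhandledCssProperties {

    // Matches log lines of the form "How to handle: text-indent?"
    private static final Pattern HOW_TO_HANDLE =
            Pattern.compile("^How to handle: ([A-Za-z-]+)\\?$");

    /** Returns the distinct property names found in the log text, sorted alphabetically. */
    public static Set<String> extract(String logText) {
        Set<String> properties = new TreeSet<String>();
        for (String line : logText.split("\\r?\\n")) {
            Matcher m = HOW_TO_HANDLE.matcher(line.trim());
            if (m.matches()) {
                properties.add(m.group(1));
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // A few lines copied from the log in the question
        String sample = "default handling for html\n"
                + "How to handle: text-indent?\n"
                + "How to handle: padding-left?\n"
                + "valueType PRIMITIVE for margin-left\n"
                + "How to handle: text-indent?\n";
        System.out.println(extract(sample)); // prints [padding-left, text-indent]
    }
}
```

Comparing that list with the properties in your stylesheet shows which declarations were reported as unhandled for a given element.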

Also note that styles in the target docx can be used if they match the @class value. This is configured separately at the paragraph, run and table level.
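
If you want to go the @class route, a first step is knowing which class values your XHTML actually uses, so that you can check them against the styles in the target docx. Here is a minimal sketch with the JDK's own XML parser; ClassValueLister is a hypothetical helper, the class names Heading1 and Quote are invented sample values, and exactly how docx4j matches a class value to a style is not shown in the log above, so treat that part as an assumption to verify.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a docx4j class): lists the class names used in an XHTML string.
public class ClassValueLister {

    /** Returns the distinct class names found in a well-formed XHTML string, sorted alphabetically. */
    public static Set<String> classValues(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        Set<String> classes = new TreeSet<String>();
        NodeList all = doc.getElementsByTagName("*"); // every element in the document
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String value = e.getAttribute("class").trim(); // empty string if absent
            if (!value.isEmpty()) {
                // A class attribute may hold several space-separated names
                for (String name : value.split("\\s+")) {
                    classes.add(name);
                }
            }
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample input; the HTML in the question has no class attributes
        String xhtml = "<html><body>"
                + "<p class=\"Heading1\">Title</p>"
                + "<p class=\"Quote\">test test test</p>"
                + "</body></html>";
        System.out.println(classValues(xhtml)); // prints [Heading1, Quote]
    }
}
```

The input must be well-formed XML for this to work, which is the same requirement the XHTML importer has.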