Using iText to convert a PDF containing forms into a text-only PDF (preserving the data)

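I have several PDFs that were filled with multiple records using AcroForms and PDFBox (a.pdf, b.pdf, c[0-9].pdf, d[0-9].pdf, ez.pdf).
The generated files (aflat.pdf, bflat.pdf, c[0-9]flat.pdf, d[0-9]flat.pdf, ezflat.pdf) should have their forms removed (the dictionaries and anything else Adobe uses), but the filled-in field values must stay on the page as plain text (setReadOnly is not what I want!).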
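The naming pattern above (a.pdf becomes aflat.pdf, c3.pdf becomes c3flat.pdf) could be derived mechanically when looping over the inputs. This is only a minimal sketch: the class `FlatNames` and the method `flatName` are hypothetical names introduced for illustration, not part of iText or PDFBox, and it assumes every input name ends in `.pdf`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of iText or PDFBox.
final class FlatNames {

    private static final String SUFFIX = ".pdf";

    // Inserts "flat" before the ".pdf" extension, e.g. "b.pdf" -> "bflat.pdf".
    // Assumption: names without a ".pdf" extension are rejected.
    static String flatName(String input) {
        if (!input.toLowerCase(Locale.ROOT).endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a PDF file name: " + input);
        }
        String stem = input.substring(0, input.length() - SUFFIX.length());
        return stem + "flat" + SUFFIX;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a.pdf", "b.pdf", "c0.pdf", "d9.pdf", "ez.pdf");
        for (String name : inputs) {
            System.out.println(name + " -> " + flatName(name));
        }
    }
}
```

Each derived name would then serve as the output path for the flattened copy of the corresponding input.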

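As far as I can tell, PdfStamper can only remove the fields without keeping their contents, but I found some references to PdfContentByte as a way to preserve the contents. Unfortunately, the documentation is too brief for me to understand how to do this.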

As a last resort, I could use FieldPosition to write directly onto the PDF. Has anyone run into this problem? How can I solve it?

Update: Saving a single page of b.pdf produced a valid bfilled.pdf but a blank bflattened.pdf. Saving the whole document fixed the problem.

    populateB();
    try (FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
        // Wrong approach: importing a single page into a new document
        // will corrupt the fields.
        // PDDocument doc = new PDDocument();
        // doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
        // doc.save(stream);

        // Right approach: save the whole document instead.
        pdfDocuments.get(0).save(stream);
    }
    try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
        PdfReader reader = new PdfReader("bfilled.pdf");
        PdfStamper stamper = new PdfStamper(reader, stream);
        // Flattening removes the fields and writes their values as page content.
        stamper.setFormFlattening(true);
        stamper.close();
        reader.close();
    }

Use PdfStamper.setFormFlattening(true) to remove the fields and write their values into the page as content.

When working with AcroForms, always save the whole document rather than importing individual pages.

    populateB();
    try (FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
        // Save the whole document; importing a single page into a new
        // PDDocument will corrupt the fields.
        pdfDocuments.get(0).save(stream);
    }
    try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
        PdfReader reader = new PdfReader("bfilled.pdf");
        PdfStamper stamper = new PdfStamper(reader, stream);
        stamper.setFormFlattening(true);
        stamper.close();
        reader.close();
    }
