Catching PDFBox warnings when loading a malformed PDF



When loading a PDF with PDFBox, a malformed file produces log messages at WARNING level:

PDDocument doc = PDDocument.load(new File(filename));

For example, this can produce the following output on the console:

Dez 08, 2020 9:14:41 AM org.apache.pdfbox.pdfparser.COSParser validateStreamLength 
WARNING: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 3141, length: 1674, expected end position: 4815

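Obviously the PDF has some errors in its content streams, but it still gets loaded into doc. Is it possible to catch these warnings programmatically with PDFBox? Is there a property that reports the warnings once the document has been loaded?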

I tried PDFBox Preflight, but it checks for PDF/A compliance, which produces many more messages.

Try the parser's non-lenient mode. This code is from the ShowSignature.java example:

RandomAccessBufferedFileInputStream raFile = new RandomAccessBufferedFileInputStream(file);
// If your files are not too large, you can also download the PDF into a byte array
// with IOUtils.toByteArray() and pass a RandomAccessBuffer() object to the
// PDFParser constructor.
PDFParser parser = new PDFParser(raFile);
parser.setLenient(false);
parser.parse();
PDDocument document = parser.getPDDocument();
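
If you want to keep lenient loading and still see the warnings in code, another option is to listen on the logger itself. The console output in the question is in java.util.logging format, so the sketch below assumes PDFBox's log output ends up in java.util.logging (if Log4j or another backend is on the classpath, you would need that backend's equivalent instead). The names WarningCollector and collectWarnings are made up for this illustration and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper: a java.util.logging Handler that remembers every
// record at WARNING level or above instead of printing it.
class WarningCollector extends Handler {
    private final List<String> warnings = new CopyOnWriteArrayList<>();

    @Override
    public void publish(LogRecord record) {
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            warnings.add(record.getLoggerName() + ": " + record.getMessage());
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered, so there is nothing to flush.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    List<String> getWarnings() {
        return new ArrayList<>(warnings);
    }

    // Runs the given action while a collector is attached to the
    // "org.apache.pdfbox" logger and returns the warnings logged meanwhile.
    // Assumption: PDFBox logs under logger names starting with that prefix,
    // as the COSParser line in the question suggests.
    static List<String> collectWarnings(Runnable action) {
        Logger pdfboxLogger = Logger.getLogger("org.apache.pdfbox");
        WarningCollector collector = new WarningCollector();
        pdfboxLogger.addHandler(collector);
        try {
            action.run();
        } finally {
            pdfboxLogger.removeHandler(collector);
        }
        return collector.getWarnings();
    }
}
```

To use it, put the PDDocument.load call from the question inside the action passed to collectWarnings (catching its IOException there) and check whether the returned list is empty. The messages are still printed to the console, because the default console handler stays in place.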
