Running multiple PDFs through a PDFBox program

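I'm currently trying to use PDFBox in Eclipse to run multiple PDF files in a folder through a text reader. The reader extracts certain terms and writes them to a text file, which is then converted to an Excel sheet. Right now I have this program, which works fine on a single PDF file: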

public static void main(String[] args) throws IOException {

    // Loading an existing document
    File file = new File("ADE_acetylfuranoside_120319_pfister.pdf");
    PDDocument document = PDDocument.load(file);
    // Instantiate PDFTextStripper class
    PDFTextStripper pdfStripper = new PDFTextStripper();
    // Retrieving text from PDF document
    String text = pdfStripper.getText(document);

    // ..."the actual code that extracts the text"...

    PrintStream o = new PrintStream(new File("output.txt"));
    PrintStream console = System.out;
    System.setOut(o);
    System.out.println(finalSheet);
}

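My problem is that I want to run 500 PDFs in one folder through this program in Eclipse, rather than typing in the name of each PDF individually. I'd also like the output to look like this:

Name1, Number1, ID1
Name2, Number2, ID2

But I think the way it is written now, it will just overwrite the first line if I run multiple PDFs through it.

Thanks for your help!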

For the first part, you can use the File class together with a FileFilter:

// directoryName could be as simple as "."
File folder = new File(directoryName);
// listFiles returns null if directoryName is not a readable directory
File[] listOfFiles = folder.listFiles(new FileFilter() {
    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(".pdf");
    }
});

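This gives you an array of File objects for all the PDF files in the given folder/directory. You can now loop over it with pretty much the code you already have.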
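As a rough sketch of that loop, using only the standard library: the class name PdfFolderLoop and the helper listPdfs are made up for illustration, and the PDFBox calls from the question are left as a comment so the example runs without the PDFBox dependency.

```java
import java.io.File;
import java.util.Arrays;

class PdfFolderLoop {

    // Hypothetical helper: returns the PDF files directly inside `folder`, sorted by path.
    // File.listFiles returns null when `folder` is not a readable directory,
    // so that case is mapped to an empty array.
    static File[] listPdfs(File folder) {
        File[] files = folder.listFiles(
                f -> f.isFile() && f.getName().toLowerCase().endsWith(".pdf"));
        if (files == null) {
            return new File[0];
        }
        Arrays.sort(files);
        return files;
    }

    public static void main(String[] args) {
        // Assumption: the folder is passed as the first argument, defaulting to ".".
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File pdf : listPdfs(folder)) {
            // The PDDocument / PDFTextStripper code from the question would go here,
            // using `pdf` instead of a hard-coded file name.
            System.out.println("Would process: " + pdf.getName());
        }
    }
}
```

Sorting is optional; it is there only because listFiles makes no promise about the order of the returned files.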

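On the output side, you probably want to tie each output to its input. I'm a bit confused by your code, so I'm guessing you just want one output file per input file. So, perhaps, something like this: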

// index is the value you used to loop through the `listOfFiles` array
try (FileWriter fileWriter = new FileWriter(listOfFiles[index].getName() + ".output.txt")) {
    fileWriter.write(text); // text is the String you want in the file
}

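This will create a file named (taking from your example) "ADE_acetylfuranoside_120319_pfister.pdf.output.txt". Obviously, the naming scheme can be changed. With this approach, a new file is created for each input file.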
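If you would rather have the single combined file described in the question (one line per PDF), a possible sketch is to open the writer once, before the loop, and print one row per file. Opening output.txt again for every PDF would truncate it each time, which is the overwriting the question worries about. The class CombinedOutput and the methods extractRow and writeAll are hypothetical; extractRow only stands in for the real PDFBox extraction and returns dummy values.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

class CombinedOutput {

    // Hypothetical placeholder for the PDFBox extraction in the question.
    // It returns dummy values so the sketch runs without PDFBox.
    static String extractRow(File pdf) {
        return pdf.getName() + ", number, id";
    }

    // Opens `output` once and writes one line per PDF, so earlier rows are kept.
    static void writeAll(File[] pdfs, File output) throws IOException {
        try (PrintWriter out = new PrintWriter(output, "UTF-8")) {
            for (File pdf : pdfs) {
                out.println(extractRow(pdf));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the PDFs are in the working directory.
        File[] pdfs = new File(".").listFiles(
                f -> f.getName().toLowerCase().endsWith(".pdf"));
        if (pdfs == null) {
            pdfs = new File[0];
        }
        Arrays.sort(pdfs);
        writeAll(pdfs, new File("output.txt"));
    }
}
```

Writing through a dedicated PrintWriter also avoids redirecting System.out, so console messages stay on the console.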
