tabula-py cannot read the file when the Python script is called from Java

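I am working on a Java-based project. The Java program runs a command to call a Python script.

The Python script uses tabula-py to read a PDF file and return the data.

The Python script works when I call it directly in the terminal (python3 xxx.py).

However, when I call the Python script from Java, it throws this error: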

Error from tabula-java:Error: File does not exist
Command '['java', '-Dfile.encoding=UTF8', '-jar', '/home/ubuntu/.local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar', '--pages', 'all', '--lattice', '--guess', '--format', 'JSON', '/home/ubuntu/Documents/xxxx.pdf']' returned non-zero exit status 1.

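I tried calling the script with its full path, passing the PDF file as a full path, and using sys.path.append(<python script path>), but none of these worked.

I also tried calling tabula directly from the command line, i.e. java -Dfile.encoding=UTF8 -jar /home/ubuntu/.local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar "file_path"

That works and can read the file. However, going back to having Java call the Python script still does not work.

Is there any way to solve this? Using tabula directly in the Java program is not an option for me.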

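Since you mention that your main code is written in Java and Python is only used to read the PDF, it would be simpler and more efficient to do everything in Java. Why? Because there are already Java libraries for reading PDFs, for example iText, so there is no need to go to the trouble of wiring one language to another.

Code: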


import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

/**
 * This class is used to read an existing
 * PDF file using the iText jar.
 */
public class PDFReadExample {
    public static void main(String[] args) {
        try {
            // Create the PdfReader instance.
            // The backslash must be escaped; "\t" would be read as a tab.
            PdfReader pdfReader = new PdfReader("D:\\testFile.pdf");

            // Get the number of pages in the PDF.
            int pages = pdfReader.getNumberOfPages();

            // Iterate over the pages (page numbers start at 1).
            for (int i = 1; i <= pages; i++) {
                // Extract the page content using PdfTextExtractor.
                String pageContent =
                        PdfTextExtractor.getTextFromPage(pdfReader, i);

                // Print the page content on the console.
                System.out.println("Content on Page "
                        + i + ": " + pageContent);
            }

            // Close the PdfReader.
            pdfReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
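
If the Python script has to stay, it may help to first check what the Java process itself sees before it launches the script. The following is only a minimal diagnostic sketch, not a confirmed fix. The class name PythonScriptRunner and the helpers buildCommand and run are hypothetical names made up for this example, and it assumes python3 is on the PATH of the Java process and that the script takes the PDF path as its first argument. It resolves both paths to absolute form, prints the user and the file's visibility, sets an explicit working directory, and merges stderr into stdout so the tabula-java message is not lost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only.
public class PythonScriptRunner {

    // Builds the command with absolute, normalized paths so that it does not
    // depend on the working directory of the Java process.
    static List<String> buildCommand(String interpreter, Path script, Path pdf) {
        return Arrays.asList(
                interpreter,
                script.toAbsolutePath().normalize().toString(),
                pdf.toAbsolutePath().normalize().toString());
    }

    // Runs the command in workDir and returns its combined stdout and stderr.
    static String run(List<String> command, Path workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir.toFile());   // explicit working directory
        pb.redirectErrorStream(true);     // merge stderr into stdout
        Process process = pb.start();
        String output;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            output = reader.lines().collect(Collectors.joining(System.lineSeparator()));
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Script exited with status " + exitCode
                    + System.lineSeparator() + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PythonScriptRunner <script.py> <file.pdf>");
            return;
        }
        Path script = Paths.get(args[0]).toAbsolutePath().normalize();
        Path pdf = Paths.get(args[1]).toAbsolutePath().normalize();

        // Show what the Java process sees before the script is started.
        System.out.println("user.name = " + System.getProperty("user.name"));
        System.out.println("pdf       = " + pdf);
        System.out.println("exists    = " + Files.exists(pdf));
        System.out.println("readable  = " + Files.isReadable(pdf));

        // Assumes python3 is on the PATH of this process.
        String output = run(buildCommand("python3", script, pdf), script.getParent());
        System.out.println(output);
    }
}
```

If exists or readable prints false here while the same path works in your terminal, the Java program is probably running as a different user or in a different environment than your shell, and the problem is with the path or its permissions rather than with tabula-py.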
