Is it possible to run tesseract.js (from https://github.com/naptha/tesseract.js)



I am trying to run OCR on an image file from Java. I decided to use Tesseract.js from https://github.com/naptha/tesseract.js and call it through the graal.js support in GraalVM, but I cannot get it to work.

Here is what I tried.

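import java.io.File;
import java.io.IOException;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;
import org.graalvm.polyglot.Value;

// Path to the tesseract.js script that is evaluated by graal.js
public static final String TESSERACT = "src/tesseract.js";

private static void tesseract(String imageFile) throws IOException
{
    System.out.println("=== Calling Tesseract === ");
    try (Context context = Context.create())
    {
        // Load tesseract.js into the JavaScript context
        context.eval(Source.newBuilder("js", new File(TESSERACT)).build());
        // Look up the global Tesseract object and its recognize function
        Value Tesseract = context.getBindings("js").getMember("Tesseract");
        Value recognize = Tesseract.getMember("recognize");
        long start = System.currentTimeMillis();
        // This call is where the ReferenceError below is thrown
        String result = recognize.execute(imageFile).asString();
        long took = System.currentTimeMillis() - start;
        System.out.println("Tesseract call took: " + took + "ms with result: " + result);
    } // context.close() is automatic
}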

It compiles, but at runtime it throws this exception:

=== Calling Tesseract === 
Exception in thread "main" ReferenceError: window is not defined
at <js> spawnWorker(srctesseract.js:286:8848-8853)
at <js> _delay(srctesseract.js:504:16140-16184)
at <js> recognize(srctesseract.js:472-481:15321-15620)
at org.graalvm.polyglot.Value.execute(Value.java:457)
at com.mycompany.app.JsApp.tesseract(JsApp.java:98)
at com.mycompany.app.JsApp.main(JsApp.java:70)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at com.intellij.rt.execution.application.AppMainV2.main(AppMainV2.java:131)

Does anyone know how to fix this?

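The main problem is that tesseract.js expects to run in a browser. `window` is not defined because you are not running tesseract.js in a browser but in a different JavaScript runtime, and the GraalVM JavaScript engine does not provide browser globals such as `window`.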

To solve your problem, I would use Tess4J to run Tesseract OCR. Tess4J is a JNA wrapper around Tesseract (just as tesseract.js is a wrapper for the browser).
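
As a rough illustration, a Tess4J call could look something like the sketch below. It assumes the `net.sourceforge.tess4j:tess4j` dependency is on the classpath, that the native Tesseract library can be loaded, and that a `tessdata` directory containing `eng.traineddata` sits in the working directory. The class name `Tess4jExample`, the `TESSDATA_DIR` constant and the `recognize` helper are hypothetical names chosen for this example, not something from your project.

```java
import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Tess4jExample {

    // Assumption: trained language data lives in ./tessdata; adjust to your setup.
    private static final String TESSDATA_DIR = "tessdata";

    // Runs OCR on a single image file and returns the recognized text.
    static String recognize(File imageFile) throws TesseractException {
        ITesseract tesseract = new Tesseract();
        tesseract.setDatapath(TESSDATA_DIR); // directory containing *.traineddata
        tesseract.setLanguage("eng");        // assumption: English text
        return tesseract.doOCR(imageFile);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: Tess4jExample <image-file>");
            return;
        }
        long start = System.currentTimeMillis();
        try {
            String result = recognize(new File(args[0]));
            long took = System.currentTimeMillis() - start;
            System.out.println("Tesseract call took: " + took + "ms with result: " + result);
        } catch (TesseractException e) {
            // Thrown by doOCR when the image cannot be processed
            System.err.println("OCR failed: " + e.getMessage());
        }
    }
}
```

This keeps everything inside the JVM and the native library, so no JavaScript context or browser environment is involved.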
