我在运行 Pig 脚本时遇到以下异常。
错误 2229:找不到项目(名称:项目)的匹配 uid -1 类型:字节数组 Uid:-1 输入:0 列:12)
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:274)
at org.apache.pig.PigServer.compilePp(PigServer.java:1324)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.execute(PigServer.java:1241)
at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: -1 Input: 0 Column: 12)
at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visitAll(AllExpressionVisitor.java:72)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:95)
at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:174)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher.transformed(ProjectionPatcher.java:48)
at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
... 16 more
可能是什么原因?我已经查看了脚本的扩展和替换版本,从语法的角度来看,我没有看到任何错误。
它是 0.11.1 版 (CDH 4.3) 中 Pig Optimizer 中的一个错误。这似乎与优化以下简化脚本的尝试有关
LOAD A -- Primary Driver Table
LOAD B
LOAD C
J1 = JOIN A LEFT, B
J2 = JOIN J2 LEFT, C
LOAD D
J3 = JOIN J2, D -- Inner Join
理想情况下,如果 A 之前与 D 连接,那么流经连接 J1 和 J2 的数据可以减少,从而加快速度。
我想这种优化尝试失败了。
消除此错误的一种方法是确定如何"提升"联接 J3(内部联接)在脚本中更早发生。