Here is a snippet of my Oozie workflow:
<workflow-app name="Abandonment_Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="pig-0581"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="pig-0581">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>/user/793972/TRM/1.pig</script>
            <param>input=/data/*/*.bz2</param>
            <archive>/user/a.jar#a.jar</archive>
        </pig>
        <ok to="fork-3d77"/>
        <error to="Kill"/>
    </action>
    <action name="pig-a915">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>/user/793972/TRM/2.pig</script>
            <param>input=/data/*/*.bz2</param>
            <archive>/user/a.jar#a.jar</archive>
        </pig>
        <ok to="join-31be"/>
        <error to="Kill"/>
    </action>
    .......
    <end name="End"/>
</workflow-app>
In Pig script 1.pig:
data = LOAD '$input' USING PigStorage('\t') AS
    (timestamp:chararray,server:chararray,sessionid:chararray);
In Pig script 2.pig, I want to use the relation data defined in 1.pig:
cleandata = foreach data generate .....
Is this possible? If so, please suggest how.
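I don't think you can do that in Pig. When a Pig script is executed, the compiler translates its Pig Latin statements into one or more MapReduce jobs, and each script is compiled and run independently. A relation such as data exists only inside the script that defines it, so two Pig scripts cannot share relations with each other.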
However, you can try using Pig macros, so that the LOAD statement is defined once and imported into both scripts:
--LoadInput.macro
DEFINE loadInput(input) RETURNS data {
    $data = LOAD '$input' USING PigStorage('\t') AS
        (timestamp:chararray,server:chararray,sessionid:chararray);
}
Pig script 1.pig:
IMPORT '/path/LoadInput.macro';
data = loadInput($input);
cleandata = FOREACH data GENERATE timestamp, sessionid;
Pig script 2.pig:
IMPORT '/path/LoadInput.macro';
data2 = loadInput($input);
cleandata2 = FOREACH data2 GENERATE timestamp, server;
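The macro shares the LOAD statement, but each script still reads and parses the input itself. If you want 2.pig to reuse the result computed by 1.pig, the scripts can only exchange it through HDFS, since they run as separate jobs. The following is a minimal sketch of that alternative (not part of the answer above): 1.pig STOREs the relation and 2.pig LOADs it back. The parameter name intermediate and the path /user/793972/TRM/tmp/data are hypothetical; you would pass the same value to both pig actions with an extra <param> element. This only works if pig-a915 starts after pig-0581 has finished, which appears to be the case here, since pig-0581 transitions to fork-3d77.

```pig
-- 1.pig: load the raw input once and persist it to HDFS.
-- 'intermediate' is a HYPOTHETICAL parameter, e.g. passed from Oozie as
-- <param>intermediate=/user/793972/TRM/tmp/data</param> (hypothetical path).
data = LOAD '$input' USING PigStorage('\t') AS
    (timestamp:chararray, server:chararray, sessionid:chararray);
STORE data INTO '$intermediate' USING PigStorage('\t');

-- 2.pig: read back what 1.pig stored, using the same delimiter and schema.
data = LOAD '$intermediate' USING PigStorage('\t') AS
    (timestamp:chararray, server:chararray, sessionid:chararray);
cleandata = FOREACH data GENERATE timestamp, sessionid;
```

Pig's STORE fails if the output directory already exists, so on re-runs you may need to delete it first, for example with a <prepare><delete path="..."/></prepare> element in the pig-0581 action. The trade-off compared with the macro is extra intermediate data on HDFS in exchange for reading the raw .bz2 input only once.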