AVG, MIN, and MAX fail in Pig, but COUNT works



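I am trying to use AVG, MIN, and MAX in Pig. The MIN and MAX functions both hang during execution, and the AVG function throws an error. The COUNT function, however, works fine.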

org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (GRADE 2 TEACHER,{(65587.90)}), 2nd :(GRADE 4 TEACHER,{(56567.24)})

My code:

register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary:float, REPLACE(TA,',','') as TA:float, type, org, year;
C = filter B by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E generate group,AVG(D.salary);
minim = foreach E generate group,MIN(D.salary);
maxim = foreach E generate group,MAX(D.salary);

Sample data:

 (ABBOTT,DEEDEE W,GRADES 9-12 TEACHER,52,122.10,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABBOTT,RYAN V,GRADE 4 TEACHER,56,567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABBOUD,CLAUDIA MORA,GRADES K-5 TEACHER,63,957.50,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABDUL-JABBAR,KHADEEJA ,GRADES 9-12 TEACHER,16,791.73,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABDUL-RAZACQ,SALAHUD-DIN ,INSTRUCTIONAL SPECIALIST P-8,45,832.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABDULLAH,DIANA ,SPECIAL ED PARAPRO/AIDE,10,934.94,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABDULLAH,NADIYAH W,GRADES 6-8 TEACHER,75,109.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (ABDULLAH,RHONDALYN Y,SPECIAL ED PARAPRO/AIDE,28,649.34,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
 (OSBORNE,CHRISTINE L,INSTRUCTIONAL SUPERVISOR,78,875.59,3,265.71,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
 (OSBORNE,DORIS A,OCCUPATIONAL THERAPIST ,65,421.79,1,156.05,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)

Sample data after the GROUP operation on line 7 of the script:

 (GRADE 2 TEACHER,{(OSBORNE,VIRGINIA E,GRADE 2 TEACHER,65587.90,0,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)})
 (GRADE 4 TEACHER,{(ABBOTT,RYAN V,GRADE 4 TEACHER,56567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)})
 (MAINTENANCE PERSONNEL,{(BROOKS,RICHARD M,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SUMNER,ROBERT O,MAINTENANCE PERSONNEL,72655.53,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MCCULLOUGH,ALVIN J,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(DALTON,JAMES E,MAINTENANCE PERSONNEL,72655.52,2124.60,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SMITH,KEVIN W,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MANGHAM,LARRY G,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010)})

Is there a bug in Pig?

Here is the updated Pig script.

register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary, REPLACE(TA,',','') as TA, type, org, year;
B1 = foreach B generate name, job, (double)salary, (double)TA, type, org, year;
C = filter B1 by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E generate group,AVG(D.salary);
minim = foreach E generate group,MIN(D.salary);
maxim = foreach E generate group,MAX(D.salary);

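The problem is that you need to provide explicit casts for the salary and TA fields. REPLACE returns a chararray, so the updated script casts both fields to double in a separate step (B1) before filtering and grouping.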
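
One way to confirm that the casts took effect is to inspect the schema with DESCRIBE before grouping. The following is only a minimal sketch: it reuses the jar path, file path, and field names from the script above, and the `AS salary` / `AS TA` aliases after the casts are an addition for clarity, not something the original script contains.

```pig
-- Sketch only: paths and field names are copied from the updated script.
register 'pig/contrib/piggybank/java/piggybank.jar';
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
-- Strip the thousands separators; the results are still chararray values.
B = foreach A generate name, job, REPLACE(salary,',','') as salary, REPLACE(TA,',','') as TA, type, org, year;
-- Cast explicitly; the aliases are restated here (an assumption, for readability).
B1 = foreach B generate name, job, (double)salary as salary, (double)TA as TA, type, org, year;
-- DESCRIBE prints the schema of B1, so the types of salary and TA can be checked.
DESCRIBE B1;
```

If salary and TA are reported as double here, AVG, MIN, and MAX in the later steps operate on numeric values rather than strings.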
