Hive Python UDF

I am using this Python UDF script:

import sys
import collections 
import datetime
import re
try:
    for line in sys.stdin: 
        line=line.strip()
        number,sd=line.split('\t')
        sd=sd.lower()
        sd=sd.split(' ')
        new_sd_list=collections.OrderedDict(collections.Counter(sd))
        new_sd=' '.join(new_sd_list)
        print('\t'.join([str(number),str(new_sd])))
except:
    print(sys.exc_info())

I run the following command from a PuTTY session:

SELECT TRANSFORM(number,shortdescription) USING 'python name.py' 
   AS (number,shortdescription) FROM table;

I get this error:

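Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {...}
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec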
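
Return code 2 from MapRedTask only says that the map task failed, not why. One possible way to narrow it down, sketched here and not part of the original post, is to check that the script at least parses before handing it to Hive. The helper name `find_syntax_error` and the two sample strings are made up for illustration; `compile` is the Python built-in.

```python
def find_syntax_error(source, filename="name.py"):
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        compile(source, filename, "exec")  # parse only; nothing is executed
    except SyntaxError as err:
        return "{0}:{1}: {2}".format(filename, err.lineno, err.msg)
    return None


# Two hypothetical one-line scripts: the first has a misplaced "]", the second does not.
broken = r"print('\t'.join([str(number),str(new_sd])))"
balanced = r"print('\t'.join([str(number),str(new_sd)]))"

print(find_syntax_error(broken))    # a message pointing at line 1
print(find_syntax_error(balanced))  # None
```

For a script on disk, running `python -m py_compile name.py` does the same kind of check from the shell.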

import sys
import collections 
import datetime
import re
try:
    for line in sys.stdin: 
        line=line.strip()
        number,sd=line.split('\t')
        sd=sd.lower()
        sd=sd.split(' ')
        new_sd_list=collections.OrderedDict(collections.Counter(sd))
        new_sd=' '.join(new_sd_list)
        print('\t'.join([str(number),str(new_sd)])) # the original line had a syntax error: "]" and ")" were swapped
except:
    print(sys.exc_info())
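
As a variation on the script above, here is a minimal sketch that separates the per-row logic from the I/O loop so it can be tried without Hive. It rests on a few assumptions that are not in the original post: the function names `dedupe_words`, `transform_line`, and `main` are invented, `OrderedDict.fromkeys` stands in for `OrderedDict(Counter(...))` (on Python versions before 3.7 a `Counter` does not guarantee first-seen order), and malformed rows are reported on stderr instead of being printed into the result set.

```python
import sys
from collections import OrderedDict


def dedupe_words(text):
    """Lower-case text and drop repeated words, keeping first-seen order."""
    return " ".join(OrderedDict.fromkeys(text.lower().split(" ")))


def transform_line(line):
    """Turn one 'number<TAB>shortdescription' row into its output row."""
    number, sd = line.rstrip("\n").split("\t", 1)
    return "\t".join([number, dedupe_words(sd)])


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for line in stdin:
        try:
            stdout.write(transform_line(line) + "\n")
        except ValueError as err:
            # A row without a tab cannot be unpacked; report it on stderr
            # so the message does not end up in the query output.
            stderr.write("skipping row %r: %s\n" % (line, err))


if __name__ == "__main__":
    main()
```

Because the row handling lives in `transform_line`, it can be called directly on a sample string such as `"INC1\tDisk disk full FULL\n"` to see what a single row would produce before the script is run through `TRANSFORM`.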
