Error when running a Hive UDF written in Java from PySpark on EMR 5.x



I have a Hive UDF written in Java that I am trying to use from PySpark 2.0.0. These are the steps I followed:

1. Copied the jar file to the EMR cluster.

2. Started a PySpark session like this:

pyspark --jars ip-udf-0.0.1-SNAPSHOT-jar-with-dependencies-latest.jar

3. Registered the UDF with the following code:

from pyspark.sql import SparkSession
from pyspark.sql import HiveContext

# `spark` (a SparkSession) is predefined in the pyspark shell
sc = spark.sparkContext
sqlContext = HiveContext(sc)
sqlContext.sql("create temporary function ip_map as 'com.mediaiq.hive.IPMappingUDF'")
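Once the temporary function is registered, it can be called like any built-in SQL function. The table and column names below are hypothetical, just to show the intended usage:

```sql
-- `access_logs` and `ip_address` are placeholder names;
-- ip_map is the temporary function registered above
SELECT ip_map(ip_address) AS mapped_ip
FROM access_logs
LIMIT 10;
```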

I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o43.sql.
: java.lang.NoSuchMethodError: org.apache.hadoop.hive.conf.HiveConf.getTimeVar(Lorg/apache/hadoop/hive/conf/HiveConf$ConfVars;Ljava/util/concurrent/TimeUnit;)J
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:76)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
	at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
	at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
	at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
	at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
	at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
	at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
	at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
	at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
	at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
	at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:745)

You have probably built the UDF against a different version of Hive than the one running on the cluster: a `NoSuchMethodError` like this means the jar was compiled against a Hive API whose method signatures differ from those available at runtime. Make sure the pom.xml used to build the jar containing the UDF specifies the same Hive version as the cluster. See the earlier answer for an example.
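A minimal sketch of the pom.xml fix, assuming Maven is the build tool. The version number below is only a placeholder; replace it with the Hive version shipped with your EMR release (check `hive --version` on the cluster):

```xml
<!-- Pin the Hive dependency to the cluster's Hive version.
     1.2.1 is a placeholder, not the verified EMR version. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.1</version>
  <scope>provided</scope>
</dependency>
```

The `provided` scope keeps the cluster's own Hive jars on the runtime classpath instead of bundling a potentially conflicting copy into the fat jar.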
