R - SparkR 1.5.2 connecting to Hive stops working and throws an error



I'm running into a problem when I run R and connect to Spark via SparkR 1.5 on a single-node Hadoop POC environment (Ubuntu 14.04). I have run this test several times before, and I did not have this problem until today.

My goal is to use SparkR to connect to Hive and pull in a table (and eventually write the df results back to Hive). I'm working from the R console in RStudio. I'm completely stumped, and any suggestions would be appreciated.

library(SparkR, lib.loc="/usr/hdp/2.3.6.0-3796/spark/R/lib/")
sc <- sparkR.init(sparkHome = "/usr/hdp/2.3.6.0-3796/spark/")

Launching java with spark-submit command /usr/hdp/2.3.6.0-3796/spark//bin/spark-submit   sparkr-shell /tmp/RtmpdGojW1/backend_portb8b949c8f0e2 
17/08/15 15:50:18 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:19 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:19 INFO SparkContext: Running Spark version 1.5.2
17/08/15 15:50:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/15 15:50:20 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:20 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 10.100.0.11 instead (on interface eth0)
17/08/15 15:50:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/08/15 15:50:20 INFO SecurityManager: Changing view acls to: rstudio
17/08/15 15:50:20 INFO SecurityManager: Changing modify acls to: rstudio
17/08/15 15:50:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rstudio); users with modify permissions: Set(rstudio)
17/08/15 15:50:22 INFO Slf4jLogger: Slf4jLogger started
17/08/15 15:50:22 INFO Remoting: Starting remoting
17/08/15 15:50:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.100.0.11:43827]
17/08/15 15:50:23 INFO Utils: Successfully started service 'sparkDriver' on port 43827.
17/08/15 15:50:23 INFO SparkEnv: Registering MapOutputTracker
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:23 INFO SparkEnv: Registering BlockManagerMaster
17/08/15 15:50:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-bea658dc-145f-48a6-bb28-6f05af529547
17/08/15 15:50:23 INFO MemoryStore: MemoryStore started with capacity 530.0 MB
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:23 INFO HttpFileServer: HTTP File server directory is /tmp/spark-6b719b9d-3d54-48bc-8894-cd2ddf9b0755/httpd-e7371ee1-5574-476d-9d53-679a9781af2d
17/08/15 15:50:23 INFO HttpServer: Starting HTTP Server
17/08/15 15:50:23 INFO Server: jetty-8.y.z-SNAPSHOT
17/08/15 15:50:23 INFO AbstractConnector: Started SocketConnector@0.0.0.0:39275
17/08/15 15:50:23 INFO Utils: Successfully started service 'HTTP file server' on port 39275.
17/08/15 15:50:23 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/15 15:50:23 INFO Server: jetty-8.y.z-SNAPSHOT
17/08/15 15:50:24 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/08/15 15:50:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/15 15:50:24 INFO SparkUI: Started SparkUI at http://10.100.0.11:4040
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:50:24 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
17/08/15 15:50:24 INFO Executor: Starting executor ID driver on host localhost
17/08/15 15:50:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43075.
17/08/15 15:50:24 INFO NettyBlockTransferService: Server created on 43075
17/08/15 15:50:24 INFO BlockManagerMaster: Trying to register BlockManager
17/08/15 15:50:24 INFO BlockManagerMasterEndpoint: Registering block manager localhost:43075 with 530.0 MB RAM, BlockManagerId(driver, localhost, 43075)
17/08/15 15:50:24 INFO BlockManagerMaster: Registered BlockManager

hiveContext <- sparkRHive.init(sc)

17/08/15 15:51:17 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:51:19 INFO HiveContext: Initializing execution hive, version 1.2.1
17/08/15 15:51:19 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.3.6.0-3796
17/08/15 15:51:19 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.3.6.0-3796
17/08/15 15:51:19 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:51:20 INFO metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/08/15 15:51:20 INFO metastore: Connected to metastore.
17/08/15 15:51:21 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/08/15 15:51:22 INFO SessionState: Created local directory: /tmp/a4f76c27-cf73-45bf-b873-a0e97ca43309_resources
17/08/15 15:51:22 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309
17/08/15 15:51:22 INFO SessionState: Created local directory: /tmp/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309
17/08/15 15:51:22 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309/_tmp_space.db
17/08/15 15:51:22 INFO HiveContext: default warehouse location is /user/hive/warehouse
17/08/15 15:51:22 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/08/15 15:51:22 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.3.6.0-3796
17/08/15 15:51:22 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.3.6.0-3796
17/08/15 15:51:22 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
17/08/15 15:51:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/15 15:51:25 INFO metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/08/15 15:51:25 INFO metastore: Connected to metastore.
17/08/15 15:51:27 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/08/15 15:51:27 INFO SessionState: Created local directory: /tmp/16b5f51f-f570-4fc0-b3a6-eda3edd19b59_resources
17/08/15 15:51:27 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59
17/08/15 15:51:27 INFO SessionState: Created local directory: /tmp/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59
17/08/15 15:51:27 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59/_tmp_space.db

showDF(sql(hiveContext, "USE MyHiveDB"))

Error: is.character(x) is not TRUE

showDF(sql(hiveContext, "SELECT *  FROM table"))

Error: is.character(x) is not TRUE

Solved. The problem here was exactly what cricket_007 suggested via the Databricks link: some of the packages loaded in the R session conflict with the SparkR instance.

Detaching them from the current R session resolved the issue and got the code working again.

The packages to detach are:

  • plyr
  • dplyr
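The detach step above can be done directly from the R console. This is a minimal sketch, assuming the conflicting packages were attached with `library()`; the `SparkR::` prefix alternative shown at the end is an assumption on my part, based on dplyr masking SparkR generics such as `sql()` (dplyr's own `sql()` asserts `is.character(x)`, which matches the error above):

```r
# Detach the conflicting packages from the current R session.
# dplyr (and plyr) mask SparkR generics such as sql(), filter(), and
# select(); dplyr::sql() checks is.character(x), which is what triggers
# the "Error: is.character(x) is not TRUE" seen above.
detach("package:dplyr", unload = TRUE)
detach("package:plyr", unload = TRUE)

# Re-run the Hive queries; they should now dispatch to SparkR's methods.
showDF(sql(hiveContext, "USE MyHiveDB"))

# Alternatively, fully qualify the calls instead of detaching:
showDF(SparkR::sql(hiveContext, "SELECT * FROM table"))
```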
