Here is the output; it hangs on the last line:
17/09/07 06:01:35 INFO ClientCnxn: Socket connection established to 10.0.0.193/10.0.0.193:2181, initiating session
17/09/07 06:01:35 INFO ClientCnxn: Session establishment complete on server 10.0.0.193/10.0.0.193:2181, sessionid = 0x15e4bc9518103cc, negotiated timeout = 40000
17/09/07 06:01:35 INFO RegionSizeCalculator: Calculating region sizes for table "event_data".
17/09/07 06:01:35 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
17/09/07 06:01:36 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
17/09/07 06:01:36 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
17/09/07 06:01:36 INFO DAGScheduler: Parents of final stage: List()
17/09/07 06:01:36 INFO DAGScheduler: Missing parents: List()
17/09/07 06:01:36 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376), which has no missing parents
17/09/07 06:01:36 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 16.6 KB, free 414.1 MB)
17/09/07 06:01:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 8.8 KB, free 414.1 MB)
17/09/07 06:01:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.0.199:43329 (size: 8.8 KB, free: 414.4 MB)
17/09/07 06:01:36 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1007
17/09/07 06:01:36 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
17/09/07 06:01:36 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
17/09/07 06:01:37 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/09/07 06:01:42 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.0.0.248:55616) with ID 1
17/09/07 06:01:42 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)
17/09/07 06:01:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-0-0-248.cn-north-1.compute.internal, executor 1, partition 0, RACK_LOCAL, 5053 bytes)
17/09/07 06:01:42 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-248.cn-north-1.compute.internal:34192 with 2.8 GB RAM, BlockManagerId(1, ip-10-0-0-248.cn-north-1.compute.internal, 34192, None)
17/09/07 06:01:42 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-0-0-248.cn-north-1.compute.internal:34192 (size: 8.8 KB, free: 2.8 GB)
17/09/07 06:01:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-10-0-0-248.cn-north-1.compute.internal:34192 (size: 28.8 KB, free: 2.8 GB)
Spark SQL is connected to Hive, and the table named event_data is an external table stored in HBase. Also, if I query a plain Hive table (one not backed by HBase), for example select count(*) from mytest01, it succeeds.
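For context, the table is mapped through the Hive HBase storage handler. A minimal sketch of what such a definition looks like (the column family cf, the column payload, and the row-key type are assumptions here, not the real schema):

-- illustrative Hive DDL for an HBase-backed external table
CREATE EXTERNAL TABLE event_data (
  rowkey  STRING,
  payload STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:payload')
TBLPROPERTIES ('hbase.table.name' = 'event_data');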
Sometimes it hangs at BlockManagerInfo: Removed instead:
17/09/07 06:31:18 INFO ContextCleaner: Cleaned accumulator 1
17/09/07 06:31:18 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.0.0.199:43329 in memory (size: 28.8 KB, free: 414.4 MB)
17/09/07 06:31:18 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-10-0-0-248.cn-north-1.compute.internal:34192 in memory (size: 28.8 KB, free: 2.8 GB)
How can I fix this? Thanks.
When you run spark-submit, add these flags:

--driver-memory 4g --executor-memory 6g

Paste them right after "spark-submit".
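For example, a full invocation might look roughly like this (yarn matches the YarnScheduler lines in your log; the class and jar names are placeholders):

# class and jar names are placeholders; memory sizes per the suggestion above
spark-submit \
  --master yarn \
  --driver-memory 4g \
  --executor-memory 6g \
  --class com.example.EventDataJob \
  event-data-job.jar

The same flags should also be accepted by the spark-sql CLI (which the CliDriver lines in your log suggest you are using), since it forwards its options to spark-submit.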