How to enable the Spark History Server for a standalone cluster in non-HDFS mode



I have set up a Spark 2.1.1 cluster (1 master, 2 slaves) running in standalone mode across multiple machines. HDFS is not set up on the machines. I want to start the Spark History Server. I run it as follows:

roshan@bolt:~/spark/spark_home/sbin$ ./start-history-server.sh

and in spark-defaults.conf I have set the following:

spark.eventLog.enabled           true

but it fails with this error:

17/06/29 22:59:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(roshan); groups with view permissions: Set(); users  with modify permissions: Set(roshan); groups with modify permissions: Set()
17/06/29 22:59:03 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)

What should I set spark.history.fs.logDirectory and spark.eventLog.dir to?

Update 1:

spark.eventLog.enabled           true
spark.history.fs.logDirectory   file:////home/roshan/spark/spark_home/logs
spark.eventLog.dir               file:////home/roshan/spark/spark_home/logs

but I keep getting this error:

java.lang.IllegalArgumentException: Codec [1] is not available. Consider setting spark.io.compression.codec=snappy at org.apache.spark.io.Co

By default, Spark defines file:/tmp/spark-events as the log directory for the history server, and your log clearly says that spark.history.fs.logDirectory is not configured.

First, you need to create the spark-events folder in /tmp (which is not a good idea, since /tmp is flushed every time the machine restarts), and then add spark.history.fs.logDirectory in spark-defaults.conf pointing to that directory. But I suggest you create another folder that the spark user has access to, and update the spark-defaults.conf file accordingly.

You need to define two variables in the spark-defaults.conf file:

spark.eventLog.dir              file:path to where you want to store your logs
spark.history.fs.logDirectory   file:same path as above

Say you want to store the logs in /opt/spark-events, which the spark user has access to; then in spark-defaults.conf:

spark.eventLog.enabled          true
spark.eventLog.dir              file:/opt/spark-events
spark.history.fs.logDirectory   file:/opt/spark-events
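
The steps above can be sketched as follows. The `EVENT_DIR` and `CONF` paths here are demo placeholders so the snippet runs anywhere; in practice you would use a persistent directory such as /opt/spark-events (created with appropriate permissions for the spark user) and your real `$SPARK_HOME/conf/spark-defaults.conf`:

```shell
# Demo placeholders -- substitute /opt/spark-events and
# $SPARK_HOME/conf/spark-defaults.conf in a real setup
EVENT_DIR=/tmp/demo-spark-events
CONF=/tmp/demo-spark-defaults.conf

# 1. Create the event-log directory (the user running Spark jobs and the
#    history server must be able to read/write it)
mkdir -p "$EVENT_DIR"

# 2. Point both properties at the same directory: applications write event
#    logs to spark.eventLog.dir, and the history server reads them from
#    spark.history.fs.logDirectory
cat > "$CONF" <<EOF
spark.eventLog.enabled          true
spark.eventLog.dir              file:$EVENT_DIR
spark.history.fs.logDirectory   file:$EVENT_DIR
EOF
```

After that, restart the history server with sbin/start-history-server.sh; its UI listens on port 18080 by default.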

You can find more information under Monitoring and Instrumentation in the Spark documentation.

Try setting

spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec 

in spark-defaults.conf.
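
As a sketch, the property can be appended to the config file like this (the `CONF` path is a demo placeholder; use `$SPARK_HOME/conf/spark-defaults.conf` in practice):

```shell
# Demo placeholder path -- use $SPARK_HOME/conf/spark-defaults.conf in practice
CONF=/tmp/demo-spark-defaults.conf

# Use the fully qualified codec class; Spark 2.x also accepts the short
# alias "snappy" (as well as "lz4" and "lzf")
echo 'spark.io.compression.codec  org.apache.spark.io.SnappyCompressionCodec' >> "$CONF"
```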
