I want to use Spark's history server to take advantage of the logging mechanism of my web UI, but I'm having some difficulty running this on my Windows machine.
I have done the following:
Set my spark-defaults.conf file to reflect
spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs
Set my spark-env.sh to reflect:
SPARK_LOG_DIR "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"
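(For reference, this syntax is what produces the `command not found` errors shown below: in a POSIX shell script an assignment needs `=` with no surrounding spaces, and `export` makes the variable visible to child processes. A corrected fragment, keeping the same paths, would be:)

```shell
# Corrected spark-env.sh fragment (sketch): a shell assignment needs '='
# with no spaces, and 'export' exposes the variable to launched processes.
export SPARK_LOG_DIR="file://C:/spark-1.6.2-bin-hadoop2.6/logs"
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"
```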
I am using Git Bash to run the start-history-server.sh file, like so:
USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
And I get this error:
USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
The full log of the output can be found below:
Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
I am running a sparkR script in which I initialize my Spark context and then call init().
Please advise: should the history server be running before I run my Spark script?
Pointers and tips on how to proceed (with regards to logging) would be greatly appreciated.
On Windows you'll need to run Spark's .cmd files, not the .sh ones. From what I could see, there is no .cmd script for the Spark history server, so it basically needs to be run manually.
I followed the Linux history-server script, and to run it manually on Windows you need to take the following steps:
- All the history-server configurations should be set in the spark-defaults.conf file (remove the .template suffix), as described below: go to the Spark conf directory and add the spark.history.* configurations to %SPARK_HOME%/conf/spark-defaults.conf, as follows:
  spark.eventLog.enabled true
  spark.history.fs.logDirectory file:///c:/logs/dir/path
- After the configuration is done, run the following command from %SPARK_HOME%:
  bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
- The output should look like this:
  16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
  16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
  16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
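For convenience, the manual launch above can be wrapped in a small Git-Bash helper. `spark_history_cmd` below is a hypothetical function of my own, not part of Spark; it just builds the launch command line for a given install directory:

```shell
# spark_history_cmd is a hypothetical helper (not part of Spark): it builds
# the manual history-server launch command for a given SPARK_HOME.
spark_history_cmd() {
  echo "$1/bin/spark-class.cmd org.apache.spark.deploy.history.HistoryServer"
}

# Example for the install path used in the question:
spark_history_cmd /c/spark-1.6.2-bin-hadoop2.6
# → /c/spark-1.6.2-bin-hadoop2.6/bin/spark-class.cmd org.apache.spark.deploy.history.HistoryServer
```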
Hope it helps! :-)
In case anyone gets this exception:
17/05/12 20:27:50 ERROR FsHistoryProvider: Exception encountered when attempting to load application log file:/C:/Spark/Logs/spark--org.apache.spark.deploy.history.HistoryServer-1-Arsalan-PC.out
java.lang.IllegalArgumentException: Codec [out] is not available. Consider setting spark.io.compression.codec=snappy
at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(Com
Just go to %SPARK_HOME%/conf/spark-defaults.conf and set spark.eventLog.compress false
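Putting that together with the settings from the earlier answer, the relevant spark-defaults.conf fragment would look like this (the log directory path is illustrative, taken from the example above):

```
spark.eventLog.enabled           true
spark.eventLog.compress          false
spark.history.fs.logDirectory    file:///c:/logs/dir/path
```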