winutils spark windows installation env_variable



I am trying to install Spark 1.6.1 on Windows 10, and so far I have done the following...

  1. Downloaded Spark 1.6.1, unzipped it to a directory, then set SPARK_HOME
  2. Downloaded Scala 2.11.8, unzipped it to a directory, then set SCALA_HOME
  3. Set the _JAVA_OPTIONS environment variable
  4. Downloaded winutils from https://github.com/steveloughran/winutils.git by downloading just the zip of the repository, then set the HADOOP_HOME environment variable. (Not sure whether this part is wrong; I could not clone the repository because of a permission-denied error.) How I set these variables is sketched right after this list.
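
For reference, this is roughly what I ran in a Command Prompt; the paths are just where things landed on my machine, not required locations:

    rem Example paths - adjust to wherever each package was unzipped
    setx SPARK_HOME "C:\Spark"
    setx SCALA_HOME "C:\Scala"
    setx HADOOP_HOME "C:\hadoop"
    setx _JAVA_OPTIONS "-Xmx512M -Xms512M"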

When I go to SPARK_HOME and run bin\spark-shell, I get

'C:Program' is not recognized as an internal or external command, operable program or batch file.

I must be missing something. I don't see how I could run bash scripts in a Windows environment anyway, but hopefully I don't need to understand that just to get this working. I have been following this guy's tutorial - https://hernandezpaul.wordpress.com/2016/01/24/apache-spark-installation-on-windows-10/. Any help would be greatly appreciated.

You need to download the winutils executable, not the source code.

You can download it here, or if you really want the entire Hadoop distribution, you can find the 2.6.0 binaries there. Then you need to set HADOOP_HOME to the directory whose bin folder contains winutils.exe, since Hadoop looks for the binary at %HADOOP_HOME%\bin\winutils.exe.
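
Concretely, assuming C:\hadoop as the location (an example, not a requirement), the layout and variable look like this:

    rem Expected layout: C:\hadoop\bin\winutils.exe
    setx HADOOP_HOME "C:\hadoop"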

Also, and this is very important: make sure the directory Spark lives in contains no spaces, otherwise it won't work. An unquoted space is exactly what produces the 'C:Program' is not recognized error above - the launch scripts split C:\Program Files at the space.
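
If you unpacked Spark under Program Files, moving it to a space-free path fixes this; the folder name below is an assumption based on the usual 1.6.1 download:

    rem Assumed folder name for the Spark 1.6.1 / Hadoop 2.6 package
    move "C:\Program Files\spark-1.6.1-bin-hadoop2.6" C:\Spark
    setx SPARK_HOME "C:\Spark"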

Once that's set up, you don't start spark-shell.sh - you start spark-shell.cmd:

C:\Spark\bin>spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-core-3.2.10.jar."
16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-rdbms-3.2.9.jar."
16/05/18 19:31:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:31:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:32:01 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/18 19:32:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-core-3.2.10.jar."
16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-rdbms-3.2.9.jar."
16/05/18 19:32:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:32:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/18 19:32:12 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/18 19:32:12 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
scala>
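
From the scala> prompt, a trivial job makes a good smoke test; a hypothetical session (the exact res numbering may differ) looks like:

    scala> sc.parallelize(1 to 100).sum()
    res0: Double = 5050.0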

On Windows, you need to explicitly specify the location of the Hadoop binaries.

Here are the steps to set up a Spark/Scala standalone application.

  1. Download winutils.exe and place it in a bin folder under some directory, e.g. c:\hadoop\bin

     The full path will then look like c:\hadoop\bin\winutils.exe

  2. Now, when creating the SparkSession, we need to specify this path. See the code snippet below:

    package com.test.config

    import org.apache.spark.sql.SparkSession

    object Spark2Config extends Serializable {
      // Must point at the folder whose bin\ subfolder contains winutils.exe.
      // Note the escaped backslash required in a Scala string literal.
      System.setProperty("hadoop.home.dir", "C:\\hadoop")

      val spark = SparkSession.builder().appName("app_name").master("local").getOrCreate()
    }
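
Any other object can then reuse the singleton; a minimal, hypothetical caller (Main and the range check are illustrative, not from the original) might look like:

    import com.test.config.Spark2Config

    object Main {
      def main(args: Array[String]): Unit = {
        val spark = Spark2Config.spark
        // Trivial sanity check: builds a 5-row Dataset of ids 0..4 and prints it
        spark.range(5).show()
        spark.stop()
      }
    }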
    
