Error: Only local python files are supported: Parsed arguments

I am trying to run Livy in batch mode by submitting a python file, but it does not work. I tried two approaches -

  1. running the py file from the local filesystem, and also
  2. copying it onto HDFS… but that doesn't work either…

Please help.

hduser@tarun-ubuntu:/home/tarun/spark/examples/src/main/python$ curl -X POST -H "Content-Type: application/json"  tarun-ubuntu:8998/batches --data '{"file": "file:///home/tarun/spark/examples/src/main/python/pi.py", "name": "pipy", "executorCores":1, "executorMemory":"512m", "driverCores":1, "driverMemory":"512m", "queue":"default", "args":["10"]}'
"requirement failed: Local path /home/tarun/spark/examples/src/main/python/pi.py cannot be added to user sessions."

So I moved pi.py onto HDFS, and now Livy at least accepts the curl call:

hduser@tarun-ubuntu:/home/tarun/spark/examples/src/main/python$ curl -X POST -H "Content-Type: application/json"  tarun-ubuntu:8998/batches --data '{"file": "/pi.py", "name": "pipy", "executorCores":1, "executorMemory":"512m", "driverCores":1, "driverMemory":"512m", "queue":"default", "args":["10"]}'
{"id":20,"state":"running","appId":null,"appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

But when I check the log:

$ curl tarun-ubuntu:8998/batches/20/log | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1415  100  1415    0     0   186k      0 --:--:-- --:--:-- --:--:--  197k
{
    "from": 0,
    "id": 20,
    "log": [
        "Error: Only local python files are supported: Parsed arguments:",
        "  master                  local",
        "  deployMode              client",
        "  executorMemory          512m",
        "  executorCores           1",
        "  totalExecutorCores      null",
        "  propertiesFile          /home/tarun/spark/conf/spark-defaults.conf",
        "  driverMemory            512m",
        "  driverCores             1",
        "  driverExtraClassPath    null",
        "  driverExtraLibraryPath  null",
        "  driverExtraJavaOptions  null",
        "  supervise               false",
        "  queue                   default",
        "  numExecutors            null",
        "  files                   null",
        "  pyFiles                 null",
        "  archives                null",
        "  mainClass               null",
        "  primaryResource         hdfs://localhost:54310/pi.py",
        "  name                    pipy",
        "  childArgs               [10]",
        "  jars                    null",
        "  packages                null",
        "  packagesExclusions      null",
        "  repositories            null",
        "  verbose                 false",
        "",
        "Spark properties used, including those specified through",
        " --conf and those from the properties file /home/tarun/spark/conf/spark-defaults.conf:",
        "  spark.driver.memory -> 512m",
        "  spark.executor.memory -> 512m",
        "  spark.driver.cores -> 1",
        "  spark.master -> local",
        "  spark.executor.cores -> 1",
        "",
        "    .primaryResource",
        "Run with --help for usage help or --verbose for debug output"
    ],
    "total": 38
}
$ curl tarun-ubuntu:8998/batches/20 | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   482  100   482    0     0   105k      0 --:--:-- --:--:-- --:--:--  117k
{
    "appId": null,
    "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
    },
    "id": 20,
    "log": [
        "Spark properties used, including those specified through",
        " --conf and those from the properties file /home/tarun/spark/conf/spark-defaults.conf:",
        "  spark.driver.memory -> 512m",
        "  spark.executor.memory -> 512m",
        "  spark.driver.cores -> 1",
        "  spark.master -> local",
        "  spark.executor.cores -> 1",
        "",
        "    .primaryResource",
        "Run with --help for usage help or --verbose for debug output"
    ],
    "state": "dead"
}

The Only local python files are supported error is most likely thrown by Spark, because Livy prepends the HDFS prefix to your file path by default. You can see this in the log above: the /pi.py you passed shows up as primaryResource hdfs://localhost:54310/pi.py.

There are two things you should try:

  1. Add the directory your py file is read from to the livy.file.local-dir-whitelist setting in livy.conf. According to the comment in the config file, applications can otherwise "only reference remote URIs when starting a session". That is most likely why Livy defaults to HDFS when submitting the job (see the config sketch after this list).

  2. When you pass the file parameter to the REST API, use a single slash after file:, for example: {"file": "file:/home/tarun/spark/examples/src/main/python/pi.py"}. I believe that is the correct syntax (see the curl example after this list).
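
For point 1, a minimal livy.conf sketch — the directory is taken from your question, so adjust it to wherever your py files actually live, and restart the Livy server after editing so it picks up the change:

# livy.conf — allow files under this directory to be added to sessions/batches
livy.file.local-dir-whitelist = /home/tarun/spark/examples/src/main/python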
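
For point 2, here is your own submission rewritten with the file: scheme — a sketch with everything else unchanged:

curl -X POST -H "Content-Type: application/json" tarun-ubuntu:8998/batches --data '{"file": "file:/home/tarun/spark/examples/src/main/python/pi.py", "name": "pipy", "executorCores":1, "executorMemory":"512m", "driverCores":1, "driverMemory":"512m", "queue":"default", "args":["10"]}'

If the POST is accepted, you can poll the result the same way as above, e.g. curl tarun-ubuntu:8998/batches/<id>/log.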

One thing to be aware of when running in cluster mode:

Note that the URL should be accessible by the Spark driver process. If you run the driver in cluster mode, it may reside on a different host, which means the "file:" URL must exist on that node (and not on the client machine).

In other words, you may need a copy of the py file on every node in the cluster to make sure the driver can read it, as sketched below.
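
For example, one way to copy the file out to each worker — a sketch only: node1 and node2 are hypothetical hostnames, and it assumes ssh access as hduser and that the target directory already exists on each node:

# hypothetical worker hostnames — replace with your own
for host in node1 node2; do
  scp /home/tarun/spark/examples/src/main/python/pi.py hduser@$host:/home/tarun/spark/examples/src/main/python/
done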

Hope this helps.
