ErrorCode 6002 in an Azure Synapse Analytics pipeline



When a notebook that transforms data and saves it runs inside a pipeline, the error below occurs. If the line that writes the data to CSV is commented out, the pipeline works. In a normal notebook run the CSV write also works fine; it only breaks when run from the pipeline. Platform: Azure Synapse Analytics workspace pipeline, Python/PySpark.

{
  "errorCode": "6002",
  "message": "Py4JJavaError: An error occurred while calling o666.csv.
: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1185)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:504)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1696)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.exists(AzureBlobFileSystem.java:1013)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:119)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:218)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:256)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:253)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:214)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:148)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:147)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:995)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:181)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:94)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:995)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:444)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:416)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:294)
	at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:985)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:207)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:570)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:802)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:502)
	... 35 more

Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1372, in csv
    self._jwrite.csv(path)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o666.csv.
: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv?upn=false&action=getStatus&timeout=90
[the same Java stack trace as above, repeated]
",
  "failureType": "UserError",
  "target": "Product_Data_pipeline3",
  "details": []
}

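For context, the failing cell is an ordinary `DataFrameWriter.csv` call. A minimal sketch of what such a cell might look like (the function name, DataFrame variable, and output path are assumptions, not taken from the original notebook):

```python
def write_output_csv(df, output_path):
    """Write a Spark DataFrame as a single, headered CSV file.

    `df` is expected to be a pyspark.sql.DataFrame and `output_path` an
    abfss:// URI such as
    abfss://test@bcnpricing.dfs.core.windows.net/test/data/output/test_df3.csv
    (container and path here are guesses based on the URL in the error).
    """
    (df.coalesce(1)              # one output file instead of many part files
       .write.mode("overwrite")  # replace any previous output
       .option("header", True)
       .csv(output_path))
```

The write itself is fine; the same call succeeds in an interactive notebook session, which is why the investigation below focuses on how the pipeline authenticates rather than on the code.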
The error indicates that the operation was attempted without the required authorization: the write is trying to access the storage account path https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv.

An AccessDeniedException when accessing the storage account means the Synapse workspace lacks permission on that account. When a pipeline runs the notebook, Spark authenticates as the workspace's managed identity (MSI) rather than as your user account, so to make the write succeed you need to grant the Storage Blob Data Contributor role to the Synapse workspace MSI on the storage account.
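To see exactly which account and container the 403 refers to, the `dfs.core.windows.net` URL from the stack trace can be mapped to the abfss:// form Spark uses. This small helper is illustrative only and not part of the original answer:

```python
from urllib.parse import urlparse

def dfs_url_to_abfss(url):
    """Convert an https://<account>.dfs.core.windows.net/<container>/<path>
    URL (as seen in the ABFS stack trace) into the abfss:// URI Spark uses.
    """
    parsed = urlparse(url)
    account_host = parsed.netloc  # e.g. bcnpricing.dfs.core.windows.net
    # The first path segment is the container (filesystem); the rest is the blob path.
    container, _, path = parsed.path.lstrip("/").partition("/")
    return f"abfss://{container}@{account_host}/{path}"

print(dfs_url_to_abfss(
    "https://bcnpricing.dfs.core.windows.net/test/test/data/output/test_df3.csv"
))
# → abfss://test@bcnpricing.dfs.core.windows.net/test/data/output/test_df3.csv
```

So the role assignment needs to cover the `bcnpricing` storage account (or at least its `test` container).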

Here is a link to a similar issue that was resolved.

To add the Storage Blob Data Contributor role for your Synapse workspace, you can refer to this Microsoft documentation.
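As a sketch, the role assignment could also be scripted with the Azure CLI. Here the `az role assignment create` command is only assembled and printed, not executed, since running it requires an authenticated `az` session; every `<...>` value is a placeholder you must replace with your own IDs:

```python
# Assemble (but do not run) an `az role assignment create` invocation that
# grants the Synapse workspace MSI data access on the storage account.
# All <...> values are placeholders, not real identifiers.
msi_object_id = "<synapse-workspace-managed-identity-object-id>"
scope = (
    "/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/bcnpricing"
)
cmd = [
    "az", "role", "assignment", "create",
    "--assignee-object-id", msi_object_id,
    "--assignee-principal-type", "ServicePrincipal",
    "--role", "Storage Blob Data Contributor",
    "--scope", scope,
]
print(" ".join(cmd))
```

The workspace MSI's object ID is shown on the Synapse workspace overview page in the Azure portal; scoping the assignment to the storage account (as above) or a single container both work.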

Note: ask the owner of the resource to change the role assignment. If you are the owner, you can grant the Storage Blob Data Contributor role yourself.
