将 EMR 配置为使用 s3a 而不是 s3 进行 Spark.sql 呼叫



我对 spark.sql(") 的所有调用都失败,并在下面的堆栈跟踪 (1) 中出现错误

更新 - 2我已经将问题归零,它是 访问被拒绝 对于 sts:假设规则,任何线索都值得赞赏

User: arn:aws:sts::00000000000:assumed-role/EMR_EC2_XXXXX_XXXXXX_POLICY/i-3232131232131232 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::00000000000:role/EMR_XXXXXX_XXXXXX_POLICY

当使用 访问同一位置时

spark.read.parquet("s3a://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")

我能够阅读记录。

但是当使用 s3: 而不是 s3a: 方案访问时,相同的堆栈跟踪 (1) 会重新出现

spark.read.parquet("s3://xxx.xxx-xxx-xx.xxxxx-xxxxx/xxx/")

那么我如何在 EMR 上配置 Spark 以使用 s3a: 或让 s3: 在没有拒绝访问的情况下运行,这是假定的,因为它可能没有使用适当的凭证链

(1)

Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Access denied (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxx-xxxx-xxxx-xxxx-xxxxxxxx)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1658)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1322)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1072)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:745)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:719)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:701)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:669)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:651)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:515)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1369)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1338)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1327)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:488)
  at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:460)

更新 - 1尝试设置机密和访问密钥不起作用

spark.sparkContext.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "")
spark.sparkContext.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "")

此堆栈跟踪显示"Amazon EMR S3 客户端";而不是 Apache ASF 客户端,因此设置和错误消息不同。

有关"代入角色"

的错误消息暗示您正在 EC2 虚拟机中运行(是吗?),并且"代入角色"实际上是部署 EC2 虚拟机的 IAM 角色。在这种情况下,(a) 不会选取其他凭据,并且 (b) 该 VM 无权访问该角色。修复:确定用于获取凭证、增加 EC2 IAM 角色权限或创建具有不同角色的虚拟机的设置

相关内容

  • 没有找到相关文章