Timeout when triggering an AWS EMR job flow from AWS Lambda



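I am trying to run an AWS Lambda application in JavaScript, but I cannot get it to work. I have no trouble with the JS configuration and triggering (I successfully ran a Hello World application), but I am having problems with the aws-sdk library. Honestly, I don't know whether this is a network configuration or an IAM configuration problem, but I am fairly sure it is not a script problem, because the script runs locally on my computer without any issue. The main problem is that when the Lambda app calls the AWS EMR API, I get a timeout error. It is as if Lambda cannot communicate with EMR.

Here you can see the EMR client (console.log(emr_client)):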

  emr: Service {
    config: 
     Config {
       credentials: 
        EnvironmentCredentials {
          expired: false,
          expireTime: null,
          accessKeyId: 'XXXXXXXXXXXXXXXX',
          sessionToken: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
          envPrefix: 'AWS' },
       credentialProvider: CredentialProviderChain { providers: [Array] },
       region: 'us-west-2',
       logger: null,
       apiVersions: {},
       apiVersion: '2009-03-31',
       endpoint: 'elasticmapreduce.us-west-2.amazonaws.com',
       httpOptions: { timeout: 120000 },
       maxRetries: undefined,
       maxRedirects: 10,
       paramValidation: true,
       sslEnabled: true,
       s3ForcePathStyle: false,
       s3BucketEndpoint: false,
       s3DisableBodySigning: true,
       computeChecksums: true,
       convertResponseTypes: true,
       correctClockSkew: false,
       customUserAgent: null,
       dynamoDbCrc32: true,
       systemClockOffset: 0,
       signatureVersion: 'v4',
       signatureCache: true,
       retryDelayOptions: {},
       useAccelerateEndpoint: false,
       accesKeyId: 'XXXXXXXXXXXXXXXX' },
    isGlobalEndpoint: false,
    endpoint: 
     Endpoint {
       protocol: 'https:',
       host: 'elasticmapreduce.us-west-2.amazonaws.com',
       port: 443,
       hostname: 'elasticmapreduce.us-west-2.amazonaws.com',
       pathname: '/',
       path: '/',
       href: 'https://elasticmapreduce.us-west-2.amazonaws.com/' },
    _clientId: 1 
    }

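Some AWS configuration details:

  1. I created a VPC containing the EMR cluster in the us-west-2 region, which is also where I trigger the Lambda function (I confirmed this by logging process.env.AWS_REGION to the console).

  2. I attached a subnet previously created in the same VPC. The EMR cluster is in it, and the Lambda function has access to it.

  3. I attached a security group in the same VPC that allows all inbound/outbound traffic (from 0.0.0.0/0 and to 0.0.0.0/0), to rule out a configuration problem.

  4. I set up an execution role with the following policies and associated it with my Lambda function: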

AWSLambdaFullAccess

AmazonElasticMapReduceFullAccess

AWSLambdaExecute

AWSLambdaVPCAccessExecutionRole

AWSLambdaRole

AWSLambdaENIManagementAccess

Finally, my code:

const AWS = require('aws-sdk');
exports.handler = (event, context, callback) => {
  const emr = new AWS.EMR({
    apiVersion:'2009-03-31',
    region: process.env.AWS_REGION,
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY
  });
  const flowSteps = {
    JobFlowId: process.env['JOB_FLOW_ID'],
    Steps: [{
      Name: "my_beautiful_step",
      ActionOnFailure: "CANCEL_AND_WAIT",
      HadoopJarStep: {
        Jar: "command-runner.jar",
        Args: [
          "spark-submit",
          "--master", "yarn",
          ...
          ...
          ...
        ]
      }
    }]
  };
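  // Submit the step and report the outcome back to Lambda through the callback.
  emr.addJobFlowSteps(flowSteps, (err, data) => {
    if (err) {
      console.log('ERROR', err, err.stack);
      callback(err);
    } else {
      console.log('NO ERROR', data);
      callback(null, data);
    }
  });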
};

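Edit: I tried talking to S3 (getting a bucket location) just to test whether the problem is specific to EMR, but the function timed out as well.

OK, I solved the problem. Basically, you cannot call AWS API endpoints from inside a VPC if you have no Internet access, because most AWS services are exposed through public URLs such as https://elasticmapreduce.us-west-2.amazonaws.com. You can see this clearly when you instantiate the EMR client object (the same applies to the other client objects I checked, such as S3):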

Service {
  config: 
   Config {
     ...
     ...
     region: 'us-west-2',
     logger: null,
     apiVersions: {},
     apiVersion: null,
     endpoint: 'elasticmapreduce.us-west-2.amazonaws.com',
     httpOptions: { timeout: 120000 },
     maxRetries: undefined,
   },
  endpoint: 
   Endpoint {
     protocol: 'https:',
     host: 'elasticmapreduce.us-west-2.amazonaws.com',
     port: 443,
     hostname: 'elasticmapreduce.us-west-2.amazonaws.com',
     pathname: '/',
     path: '/',
     href: 'https://elasticmapreduce.us-west-2.amazonaws.com/' 
    },
  ...
}
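To check whether the function can reach that public hostname at all, a plain TCP probe can be run from inside the Lambda before involving the SDK. The following is only a minimal sketch using Node's built-in net module; the helper names regionalEndpoint and probe, the 3-second deadline, and the assumption that the hostname follows the service.region.amazonaws.com pattern seen in the dump are mine, not part of the SDK.

```javascript
'use strict';
const net = require('net');

// Hypothetical helper: builds the regional hostname in the same form
// that appears in the client dump (service.region.amazonaws.com).
function regionalEndpoint(service, region) {
  return `${service}.${region}.amazonaws.com`;
}

// Hypothetical helper: tries to open a TCP connection and resolves with
// a small report instead of rejecting, so the handler can simply log it.
function probe(host, port = 443, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    let settled = false;

    const finish = (reachable, reason) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve({ host, port, reachable, reason });
    };

    // Our own deadline: a blocked route usually hangs rather than fails.
    const timer = setTimeout(() => finish(false, 'timeout'), timeoutMs);

    socket.once('connect', () => finish(true, 'connected'));
    socket.once('error', (err) => finish(false, err.code || err.message));
  });
}

// Example use inside a handler (the timeout value is an assumption):
// exports.handler = async () =>
//   probe(regionalEndpoint('elasticmapreduce', process.env.AWS_REGION));
```

If the probe reports a timeout from inside the VPC but connects from your own machine, the problem is most likely the network path rather than IAM or the script.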

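Anyway, AWS provides local endpoints for some services inside a VPC, called VPC endpoints, so you can reach those service endpoints from inside the VPC without Internet access. Otherwise, you have to set up a NAT gateway plus an Internet gateway (about US$30/month) to reach other services such as EMR.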
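While the network path is being sorted out, it can help to make the client give up quickly: the dump shows an httpOptions timeout of 120000 ms, so the Lambda invocation may well hit its own timeout before the SDK reports anything. Below is a rough sketch, assuming the aws-sdk v2 client options maxRetries, httpOptions.connectTimeout and httpOptions.timeout; the helper names, the chosen values, and the list of error codes treated as network failures are my assumptions, not an official list.

```javascript
'use strict';

// Hypothetical helper: client options that fail fast instead of waiting
// for the default 120-second socket timeout. Values are illustrative.
function failFastOptions(region) {
  return {
    region,
    maxRetries: 1,
    httpOptions: {
      connectTimeout: 3000, // ms allowed for establishing the connection
      timeout: 10000,       // ms of socket inactivity before giving up
    },
  };
}

// Hypothetical helper: guesses whether an SDK error is a connectivity
// problem rather than an API-level rejection (for example, an IAM denial).
// The code list is an assumption based on commonly seen error codes.
function looksLikeNetworkFailure(err) {
  const codes = ['TimeoutError', 'NetworkingError', 'ETIMEDOUT', 'ENOTFOUND', 'ECONNREFUSED'];
  return Boolean(err) && codes.includes(err.code);
}

// Example use inside the handler (requires aws-sdk v2):
// const AWS = require('aws-sdk');
// const emr = new AWS.EMR(failFastOptions(process.env.AWS_REGION));
// emr.addJobFlowSteps(flowSteps, (err, data) => {
//   if (looksLikeNetworkFailure(err)) {
//     console.log('Cannot reach the endpoint; check NAT or VPC endpoint setup');
//   }
// });
```

With settings like these, a missing route to the public endpoint shows up in the logs as an SDK error within a few seconds rather than as a bare Lambda timeout.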
