Options for enabling the Glue Data Catalog for Presto/Spark on EMR with Terraform

I'd like to know whether there is support for enabling the AWS Glue Data Catalog when running Spark/Presto on EMR.

Using the links provided in the answer above, I was able to model the Terraform code as follows:

Create a configuration.json.tpl with the following content:

[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]

Render the template above in the Terraform code:

data "template_file" "cluster_1_configuration" {
  template = "${file("${path.module}/templates/configuration.json.tpl")}"
}
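On Terraform 0.12 and later, the template_file data source (from the now-deprecated template provider) can be replaced by the built-in templatefile() function. A minimal sketch, assuming the same template path; the local name cluster_1_configuration is hypothetical:

```hcl
# Terraform 0.12+ sketch: render the same template without the template provider.
# templatefile(path, vars) takes a map of template variables; this template has
# no placeholders, so an empty map is passed.
locals {
  cluster_1_configuration = templatefile("${path.module}/templates/configuration.json.tpl", {})
}
```

The cluster would then reference local.cluster_1_configuration in place of data.template_file.cluster_1_configuration.rendered.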

Then configure the cluster with settings like these:

resource "aws_emr_cluster" "cluster_1" {
  name          = "${var.cluster_name}-1"
  release_label = "emr-5.21.0"
  applications  = ["Spark", "Zeppelin", "Hadoop","Sqoop"]
  log_uri       = "s3n://${var.cluster_name}/logs/"
  configurations = "${data.template_file.cluster_1_configuration.rendered}"
  ...
}
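Since the question also asks about Presto, here is a sketch of a cluster that points Spark, Hive and Presto at the Glue Data Catalog in one place. It assumes Terraform 0.12+ and an AWS provider version that supports the configurations_json argument; the resource name cluster_glue is hypothetical. The hive-site and presto-connector-hive classifications follow the AWS EMR Release Guide pages linked below (Presto support requires EMR 5.10.0 or later):

```hcl
# Sketch: Spark, Hive and Presto on one EMR cluster, all using the Glue Data Catalog.
resource "aws_emr_cluster" "cluster_glue" {
  name          = "${var.cluster_name}-glue"
  release_label = "emr-5.21.0"
  applications  = ["Spark", "Hive", "Presto", "Hadoop"]

  # jsonencode builds the same JSON list that configuration.json.tpl holds,
  # extended with the Hive and Presto classifications.
  configurations_json = jsonencode([
    {
      # Spark SQL: use the Glue client factory as the Hive metastore client
      Classification = "spark-hive-site"
      Properties = {
        "hive.metastore.client.factory.class" = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    },
    {
      # Hive: same factory class, under the hive-site classification
      Classification = "hive-site"
      Properties = {
        "hive.metastore.client.factory.class" = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    },
    {
      # Presto: tell the Hive connector to use Glue as its metastore
      Classification = "presto-connector-hive"
      Properties = {
        "hive.metastore.glue.datacatalog.enabled" = "true"
      }
    },
  ])

  # service_role, ec2_attributes, instance settings, etc. are still required;
  # they are omitted here just as in the "..." of the cluster above.
}
```

Note that the cluster's EC2 instance profile also needs IAM permissions for the relevant AWS Glue actions, otherwise the metastore calls will fail at query time.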

Spark should now use Glue as its catalog. You can verify this by calling spark.catalog.listDatabases().show() in the Scala spark-shell.

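The following AWS documentation covers using Apache Spark and Hive on Amazon EMR with the AWS Glue Data Catalog, as well as using the AWS Glue Data Catalog as the default Hive metastore for Presto (Amazon EMR release 5.10.0 and later). Hopefully this is what you're looking for: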

https://docs.aws.amazon.com/emr/latest/releaseguide/emr-presto-glue.html

and

https://aws.amazon.com/about-aws/whats-new/2017/08/use-apache-spark-and-hive-on-amazon-emr-with-the-aws-glue-data-catalog/

Also, check this link for some Glue Data Catalog configuration on EMR:

AWS Glue Data Catalog as the Metastore for Spark SQL on EMR
