使用terraform在bigquery表上创建分区



描述

我有使用terraform创建的bigquery表的列表,但我只需要特定表的分区。

这是前

locals {
path = "../../../../../../../../db"
gcp_bq_tables = [
"my_table1",
"my_table1_daily",
"my_table2",
"my_table2_daily"
]
}

以及,创建表格的地形脚本:

resource "google_bigquery_table" "gcp_bq_tables" {
for_each    = toset(local.gcp_bq_tables)
dataset_id  = google_bigquery_dataset.gcp_bq_db.dataset_id
table_id    = each.value
schema      = file("${local.path}/schema/${each.value}.json")
labels = {
env = var.env
app = var.app
}
}

我需要在时间戳上创建分区,键入DAY,但列不同。比方说对于my_table1,

  1. 对于table1,分区列将是my_ts_column_table1
  2. 表2的分区列将是my_last_modified_column_table2

如何在此场景中编写地形脚本。

我的探索

我在terraform_document中找到了一种方法,但不确定多个表以及如何为两个表指定分区列。

在这种情况下,最好使用dynamic[1]和for_each元参数[2]来实现您想要的。代码必须更改为:

resource "google_bigquery_table" "gcp_bq_tables" {
for_each    = toset(local.gcp_bq_tables)
dataset_id  = google_bigquery_dataset.gcp_bq_db.dataset_id
table_id    = each.value
schema      = file("${local.path}/schema/${each.value}.json")
dynamic "time_partitioning" {
for_each = each.value == "table1" || each.value == "table2" ? [1] : []
content {
type  = "DAY"
field = each.value == "table1" ? "my_ts_column_table1" : "my_last_modified_column_table2"
}
}
labels = {
env = var.env
app = var.app
}
}

[1]https://developer.hashicorp.com/terraform/language/expressions/dynamic-blocks

[2]https://developer.hashicorp.com/terraform/language/meta-arguments/for_each

我希望这个解决方案能有所帮助。

您可以配置一个json文件来动态创建带有分区的表。

tables.json文件

{
"tables": {
"my_table1": {
"dataset_id": "my_dataset",
"table_id": "my_table",
"schema_path": "folder/myschema.json",
"partition_type": "DAY",
"partition_field": "partitionField",
"clustering": [
"field",
"field2"
]
},
"my_table2": {
"dataset_id": "my_dataset",
"table_id": "my_table2",
"schema_path": "folder/myschema2.json",
"partition_type": "DAY",
"partition_field": "partitionField2",
"clustering": [
"field",
"field2"
]
}
}

然后从Terraform本地文件中检索您的表。

locals.tf文件:

locals {
tables = jsondecode(file("${path.module}/resource/tables.json"))["tables"]
}

我在myDefaultDate字段的variables.json文件中放置了一个默认分区:

variable "time_partitioning" {
description = "Configures time-based partitioning for this table. cf https://www.terraform.io/docs/providers/google/r/bigquery_table.html#field"
type = map(string)
default = {
type = "DAY"
field = "myDefaultDate"
}
}

resource.tf文件中,我使用了dynamic块:

  • 如果分区存在于Json元数据配置文件tables.json的当前表中,则我接受它。

  • 否则,我将使用variables.tf文件提供的默认分区。

resource.tf文件:

resource "google_bigquery_table" "tables" {
for_each = local.tables
project = var.project_id
dataset_id = each.value["dataset_id"]
table_id = each.value["table_id"]
clustering = try(each.value["clustering"], [])
dynamic "time_partitioning" {
for_each = [
var.time_partitioning
]
content {
type = try(each.value["partition_type"], time_partitioning.value["type"])
field = try(each.value["partition_field"], time_partitioning.value["field"])
expiration_ms = try(time_partitioning.value["expiration_ms"], null)
require_partition_filter = try(time_partitioning.value["require_partition_filter"], null)
}
}
schema = file("${path.module}/resource/schema/${each.value["schema_path"]}")
}

最新更新