Pulumi and GCP uptime check deployments fail intermittently



We recently added GCP uptime checks to our Pulumi stack. We create an uptime check like this:

ucc, err := monitoring.NewUptimeCheckConfig(ctx, name, &monitoring.UptimeCheckConfigArgs{
    DisplayName: pulumi.String("uptime check example"),
    HttpCheck: &monitoring.UptimeCheckConfigHttpCheckArgs{
        Path:          pulumi.String(fmt.Sprintf("/%s/status", "github")),
        Port:          pulumi.Int(443),
        RequestMethod: pulumi.String("GET"),
        UseSsl:        pulumi.Bool(true),
        ValidateSsl:   pulumi.Bool(true),
    },
    MonitoredResource: &monitoring.UptimeCheckConfigMonitoredResourceArgs{
        Labels: pulumi.StringMap{
            "host": pulumi.String(targetUrl),
        },
        Type: pulumi.String("uptime_url"),
    },
    Period:  pulumi.String("60s"),
    Timeout: pulumi.String("10s"),
})

Then I decided to add an alert policy for this uptime check.

Note: here we pass in the uptime check created earlier.

args := monitoring.AlertPolicyArgs{
    DisplayName: pulumi.String(name),
    Combiner:    pulumi.String("AND"),
    Conditions: monitoring.AlertPolicyConditionArray{
        monitoring.AlertPolicyConditionArgs{
            // pulumi.String takes a single string, so format with fmt.Sprintf first.
            DisplayName: pulumi.String(fmt.Sprintf("Health check alerts for github %s", service.ShortName)),
            ConditionThreshold: monitoring.AlertPolicyConditionConditionThresholdArgs{
                // Inner double quotes are escaped so this is a valid Go string literal.
                Filter:   pulumi.Sprintf("metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"%s\" AND resource.type=\"uptime_url\"", uptimeCheck.UptimeCheckId),
                Duration: pulumi.String("60s"),
                Trigger: monitoring.AlertPolicyConditionConditionThresholdTriggerArgs{
                    Count: pulumi.IntPtr(1),
                },
                ThresholdValue: pulumi.Float64Ptr(1),
                Comparison:     pulumi.String("COMPARISON_LT"),
                Aggregations: monitoring.AlertPolicyConditionConditionThresholdAggregationArray{
                    monitoring.AlertPolicyConditionConditionThresholdAggregationArgs{
                        AlignmentPeriod:  pulumi.String("60s"),
                        PerSeriesAligner: pulumi.String("ALIGN_COUNT_TRUE"),
                    },
                },
            },
        },
    },
    // "alerts" is the placeholder used in this post; the real value is a notification channel name.
    NotificationChannels: pulumi.StringArray{pulumi.String("alerts")},
}
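
The filter in the alert policy embeds double-quoted values inside a Go string, which is easy to get wrong when escaping by hand. As a minimal sketch (the helper name uptimeFilter and the sample check ID are hypothetical, not part of the original program), the quoting can be left to the %q verb, which wraps each value in double quotes:

```go
package main

import "fmt"

// uptimeFilter builds a Cloud Monitoring filter for a single uptime check.
// %q wraps each value in double quotes and escapes any embedded quotes,
// so no manual \" escaping is needed in the format string.
func uptimeFilter(checkID string) string {
	return fmt.Sprintf(
		"metric.type=%q AND metric.label.check_id=%q AND resource.type=%q",
		"monitoring.googleapis.com/uptime_check/check_passed",
		checkID,
		"uptime_url",
	)
}

func main() {
	// "uptime-check-github" is a placeholder check ID for illustration.
	fmt.Println(uptimeFilter("uptime-check-github"))
}
```

In the Pulumi program the check ID is an output rather than a plain string, so the same format string would presumably be passed to pulumi.Sprintf instead of fmt.Sprintf; that part is an assumption and is not exercised by this sketch.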

This worked fine on the first deployment, but subsequent deployments started to fail.

error: deleting urn:pulumi:env::company::gcp:monitoring/uptimeCheckConfig:UptimeCheckConfig::uptime-check-github: 1 error occurred:
Error when reading or editing UptimeCheckConfig: googleapi: Error 400: Request contains an invalid argument.

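What I noticed is that the new uptime check gets created in our account, but GCP ends up in some strange state where it cannot delete the previous uptime check. The only way I managed to fix the stack was to delete the old uptime check manually.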

Has anyone experienced this?

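Just ran into the same thing. However, I didn't need to delete anything; slightly modifying the check and saving it also worked. Everything was fine afterwards.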

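You have not specified the project in monitoredResource.labels. The resource monitored by an uptime check needs all the labels that its type expects. You have used uptime_url, so… https://cloud.google.com/monitoring/api/resources#tag_uptime_url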
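
To make the point about labels concrete, here is a small stdlib-only sketch that checks a label map against what a monitored resource type expects. The uptime_url type expects project_id and host; the helper name missingLabels and the sample values "example.com" and "my-project" are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// requiredLabels lists the labels each monitored resource type expects.
// Only uptime_url is covered here; it expects project_id and host.
var requiredLabels = map[string][]string{
	"uptime_url": {"project_id", "host"},
}

// missingLabels returns the required labels that are absent from labels, sorted.
func missingLabels(resourceType string, labels map[string]string) []string {
	var missing []string
	for _, name := range requiredLabels[resourceType] {
		if _, ok := labels[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Labels as posted in the question: only host is set.
	asPosted := map[string]string{"host": "example.com"}
	fmt.Println(missingLabels("uptime_url", asPosted)) // [project_id]

	// With project_id added, nothing is missing.
	fixed := map[string]string{"host": "example.com", "project_id": "my-project"}
	fmt.Println(missingLabels("uptime_url", fixed)) // []
}
```

In the Pulumi code from the question, this would correspond to adding a "project_id" entry next to "host" in the Labels map. Whether that alone resolves the failed deletes is what this answer suggests, not something the sketch verifies.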
