Populating a materialized view in ClickHouse exceeds the memory limit



I am trying to create a materialized view with the ReplicatedAggregatingMergeTree engine on top of a table that uses the ReplicatedMergeTree engine.

After a few million rows I get DB::Exception: Memory limit (for query) exceeded. Is there a way to work around this?

CREATE MATERIALIZED VIEW IF NOT EXISTS shared.aggregated_calls_1h
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/shared/aggregated_calls_1h', '{replica}')
PARTITION BY toRelativeDayNum(retained_until_date)
ORDER BY (
client_id,
t,
is_synthetic,
source_application_ids,
source_service_id,
source_endpoint_id,
destination_application_ids,
destination_service_id,
destination_endpoint_id,
boundary_application_ids,
process_snapshot_id,
docker_snapshot_id,
host_snapshot_id,
cluster_snapshot_id,
http_status
)
SETTINGS index_granularity = 8192
POPULATE
AS
SELECT
client_id,
toUInt64(floor(t / (60000 * 60)) * (60000 *60)) AS t,
date,
toDate(retained_until_timestamp / 1000) retained_until_date,
is_synthetic,
source_application_ids,
source_service_id,
source_endpoint_id,
destination_application_ids,
destination_service_id,
destination_endpoint_id,
boundary_application_ids,
http_status,
process_snapshot_id,
docker_snapshot_id,
host_snapshot_id,
cluster_snapshot_id,
any(destination_endpoint) AS destination_endpoint,
any(destination_endpoint_type) AS destination_endpoint_type,
groupUniqArrayArrayState(destination_technologies) AS destination_technologies_state,
minState(ingestion_time) AS min_ingestion_time_state,
sumState(batchCount) AS sum_call_count_state,
sumState(errorCount) AS sum_error_count_state,
sumState(duration) AS sum_duration_state,
minState(toUInt64(ceil(duration/batchCount))) AS min_duration_state,
maxState(toUInt64(ceil(duration/batchCount))) AS max_duration_state,
quantileTimingWeightedState(0.25)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p25_state,
quantileTimingWeightedState(0.50)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p50_state,
quantileTimingWeightedState(0.75)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p75_state,
quantileTimingWeightedState(0.90)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p90_state,
quantileTimingWeightedState(0.95)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p95_state,
quantileTimingWeightedState(0.98)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p98_state,
quantileTimingWeightedState(0.99)(toUInt64(ceil(duration/batchCount)), batchCount) AS latency_p99_state,
quantileTimingWeightedState(0.25)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p25_large_state,
quantileTimingWeightedState(0.50)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p50_large_state,
quantileTimingWeightedState(0.75)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p75_large_state,
quantileTimingWeightedState(0.90)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p90_large_state,
quantileTimingWeightedState(0.95)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p95_large_state,
quantileTimingWeightedState(0.98)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p98_large_state,
quantileTimingWeightedState(0.99)(toUInt64(ceil(duration/batchCount)/100), batchCount) AS latency_p99_large_state,
sumState(minSelfTime) AS sum_min_self_time_state
FROM shared.calls_v2
WHERE sample_type != 'user_selected'
GROUP BY
client_id,
t,
date,
retained_until_date,
is_synthetic,
source_application_ids,
source_service_id,
source_endpoint_id,
destination_application_ids,
destination_service_id,
destination_endpoint_id,
boundary_application_ids,
process_snapshot_id,
docker_snapshot_id,
host_snapshot_id,
cluster_snapshot_id,
http_status
HAVING destination_endpoint_type != 'INTERNAL'

You can try raising the limit with the clickhouse-client option --max_memory_usage.

--max_memory_usage arg    "Maximum memory usage for processing of single query. Zero means unlimited."

https://clickhouse.yandex/docs/en/operations/settings/query_complexity/#settings_max_memory_usage
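As a minimal sketch, the same limit can also be raised for the current session before running the CREATE MATERIALIZED VIEW ... POPULATE statement. The byte values below are arbitrary illustrations, not recommendations. The second setting, max_bytes_before_external_group_by, is not part of the advice above; it is a separate ClickHouse setting that lets GROUP BY spill to disk, included here as an assumption about what may help with a large aggregation.

```sql
-- Raise the per-query memory limit for this session (in bytes; 0 means unlimited).
SET max_memory_usage = 20000000000;

-- Optional assumption: allow GROUP BY to spill intermediate data to disk
-- once it has used roughly half of the limit above.
SET max_bytes_before_external_group_by = 10000000000;
```

The command-line form of the first setting would be clickhouse-client --max_memory_usage=20000000000.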

Alternatively, instead of using POPULATE, copy the data into the view's inner table manually:

INSERT INTO shared.`.inner.aggregated_calls_1h`
SELECT 
client_id,
toUInt64(floor(t / (60000 * 60)) * (60000 *60)) AS t,
...
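A manual copy only helps with memory if each INSERT aggregates a bounded slice of the source table, for example one day at a time. The sketch below illustrates that idea on a deliberately reduced, hypothetical schema: the database demo, the tables demo.calls and demo.calls_1h, the alias t_hour, and the date literal are all invented for illustration and are not the tables from the question. It inserts through the view name, which ClickHouse forwards to the view's storage table.

```sql
-- Hypothetical, reduced schema for illustration only.
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.calls
(
    client_id UInt32,
    t         UInt64,   -- epoch milliseconds
    date      Date,
    duration  UInt64
)
ENGINE = MergeTree
PARTITION BY date
ORDER BY (client_id, t);

-- Created WITHOUT POPULATE: the view only aggregates rows inserted from now on.
CREATE MATERIALIZED VIEW demo.calls_1h
ENGINE = AggregatingMergeTree
PARTITION BY date
ORDER BY (client_id, t_hour)
AS SELECT
    client_id,
    toUInt64(floor(t / 3600000) * 3600000) AS t_hour,  -- truncate to the hour
    date,
    sumState(duration) AS sum_duration_state
FROM demo.calls
GROUP BY client_id, t_hour, date;

-- Backfill one day per statement so each GROUP BY stays small.
-- Repeat with the next date until all historical data is copied.
INSERT INTO demo.calls_1h
SELECT
    client_id,
    toUInt64(floor(t / 3600000) * 3600000) AS t_hour,
    date,
    sumState(duration) AS sum_duration_state
FROM demo.calls
WHERE date = '2019-01-01'
GROUP BY client_id, t_hour, date;
```

Only backfill dates that were fully written before the view was created; rows inserted afterwards are already aggregated by the view, and copying them again would count them twice.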
