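My goal is to ingest data sorted by a specific column, so that the micro-partitions are laid out in that order and pruning on that column is efficient.
I want to minimize the cost of sorting, and I would like some guidance on how often I should recluster.
For example: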
CREATE TABLE test_order(n NUMBER, s STRING);
INSERT INTO test_order
VALUES
(12, 'a'),
(11, 'b'),
(10, 'c'),
(9, 'd'),
(8, 'e'),
(7, 'f'),
(6, 'g'),
(5, 'h'),
(6, 'i'),
(5, 'j'),
(4, 'k'),
(3, 'l'),
(2, 'm'),
(1, 'n');
SELECT * FROM test_order
ORDER BY n ASC;
ALTER TABLE test_order CLUSTER BY (n, s);
ALTER TABLE test_order RECLUSTER;
SELECT n, s FROM test_order;
SELECT SYSTEM$CLUSTERING_INFORMATION('test_order', '(n,s)');
Here is the clustering information after the first insert:
{
"cluster_by_keys" : "LINEAR(N, S)",
"total_partition_count" : 1,
"total_constant_partition_count" : 0,
"average_overlaps" : 0.0,
"average_depth" : 1.0,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 1,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0
}
}
Here is the second insert, followed by the clustering information after it:
INSERT INTO test_order
VALUES
(12, 'p'),
(11, 'f'),
(10, 'z'),
(9, 'y'),
(8, 'x'),
(7, 'w'),
(6, 'v'),
(5, 'u'),
(6, 't'),
(5, 's'),
(4, 'r'),
(3, 'q'),
(2, 'p'),
(1, 'o');
{
"cluster_by_keys" : "LINEAR(N, S)",
"total_partition_count" : 2,
"total_constant_partition_count" : 0,
"average_overlaps" : 1.0,
"average_depth" : 2.0,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 2,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0
}
}
And then after the second recluster:
{
"cluster_by_keys" : "LINEAR(N, S)",
"total_partition_count" : 2,
"total_constant_partition_count" : 0,
"average_overlaps" : 1.0,
"average_depth" : 2.0,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 2,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0
}
}
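Sorry, I'm new to formatting. After inserting in a specific order, the clustering ratio did not change much. Is that because my example dataset is too small, or does insertion order not matter for clustering performance?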
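If you are ingesting sorted data, I don't think you need to cluster the table at all. Your data will be naturally clustered, and you will get the pruning you want.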
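A minimal sketch of what sorting on load could look like, assuming each new batch first lands in a staging table. The name test_order_staging is hypothetical and does not come from the question. Note that in the example above each 14-row insert fits in a single micro-partition and both batches span the same range of n, so the two partitions will most likely overlap no matter how the rows inside each batch are ordered; the effect of ordering only becomes visible once a load is large enough to span several micro-partitions.

```sql
-- Assumption: test_order_staging is a hypothetical staging table for the incoming batch.
CREATE TABLE IF NOT EXISTS test_order_staging (n NUMBER, s STRING);

-- Sort on load so the rows are written in order of n.
INSERT INTO test_order (n, s)
SELECT n, s
FROM test_order_staging
ORDER BY n;

-- Inspect overlap on n alone; the column list can be passed explicitly,
-- so this does not depend on a clustering key being defined on the table.
SELECT SYSTEM$CLUSTERING_INFORMATION('test_order', '(n)');
```

If average_depth stays close to 1 as batches accumulate, the natural load order is probably enough and an explicit clustering key may not be worth its maintenance cost.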