给定下表:
CREATE TABLE event_partitioned (
customer_id varchar(50) NOT NULL,
user_id varchar(50) NOT NULL,
event_id varchar(50) NOT NULL,
comment varchar(50) NOT NULL,
event_timestamp timestamp with time zone DEFAULT NOW()
)
PARTITION BY RANGE (event_timestamp);
按日历周划分[一个例子]:
CREATE TABLE event_partitioned_2020_51 PARTITION OF event_partitioned
FOR VALUES FROM ('2020-12-14') TO ('2020-12-20');
以及唯一约束[event_timestamp,这是分区密钥以来必需的]:
ALTER TABLE event_partitioned
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
如果customer_id、user_id、event_id存在,我想更新,否则插入:
INSERT INTO event_partitioned (customer_id, user_id, event_id)
VALUES ('9', '99', '999')
ON CONFLICT (customer_id, user_id, event_id, event_timestamp) DO UPDATE
SET comment = 'I got updated';
但我不能只为customer_id、user_id、event_id添加唯一约束,因此也不能为event_timestamp添加唯一约束。
因此,这将插入customer_id、user_id和event_id的重复项。即使添加now((作为第四个值也是如此,除非现在((与event_timestamp中已经存在的内容精确匹配。
有没有一种方法可以让ON CONFLICT在这里不那么"精细",并在now((在分区的一周内更新,而不是精确地在"2020-12-14 09:13:04.543256"上更新?
基本上,我试图避免customer_id、user_id和event_id的重复,至少在一周内是这样,但仍然可以从按周分区中受益(这样数据检索可以缩小到一个日期范围,而不是扫描整个分区表(。
我认为在分区表中使用on conflict
无法做到这一点。然而,您可以用CTE来表达逻辑:
with
data as ( -- data
select '9' as customer_id, '99' as user_id, '999' as event_id
),
ins as ( -- insert if not exists
insert into event_partitioned (customer_id, user_id, event_id)
select * from data d
where not exists (
select 1
from event_partitioned ep
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
)
returning *
)
update event_partitioned ep -- update if insert did not happen
set comment = 'I got updated'
from data d
where
ep.customer_id = d.customer_id
and ep.user_id = d.user_id
and ep.event_id = d.event_id
and not exists (select 1 from ins)
@GMB的回答很好,效果很好。既然对按时间范围分区的分区表(父表(强制执行唯一约束通常没有那么有用,为什么现在只在分区本身上放置唯一约束/索引呢?
在您的情况下,event_partitioned_2020_51可能有一个唯一的约束:
ALTER TABLE event_partitioned_2020_51
ADD UNIQUE (customer_id, user_id, event_id, event_timestamp);
后续查询可以使用
INSERT ... INTO event_partitioned_2020_51 ON CONFLICT (customer_id, user_id, event_id, event_timestamp)
只要这是预期的分区,通常就是这样。