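I have a data feed that is updated daily, and I need to load it into a table. I want to update rows that have the same id but different data, add rows that don't exist yet, and delete rows that no longer exist. I'm stuck on the last task: I can't seem to delete rows from previous runs that are not present in the new feed.
Here is my "listings" table with some test data: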
INSERT INTO public.listings(
id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 100, '2021-05-03', 3, 'mf', '2021-01-01'),
(2, 200, '2021-05-03', 4, 'pmf', '2021-01-01'),
(3, 300, '2021-05-04', 5, 'jorge', '2021-01-01'),
(4, 400, '2021-05-05', 3, 'yuyu', '2021-01-01'),
(5, 500, '2021-05-06', 3, 'papo', '2021-01-01');
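The question doesn't show the table definitions. To try the statements out, a minimal sketch could look like the following. The column types and the unique id on old_listings are assumptions. The unique constraint on listings.id is required, because ON CONFLICT (id) needs a unique index or constraint it can infer.

```sql
-- Assumed definitions: the question does not show them.
CREATE TABLE listings (
    id           integer PRIMARY KEY,  -- ON CONFLICT (id) needs a unique index on id
    price        numeric,
    pub_date     date,
    rooms        integer,
    publisher    text,
    updated_date date
);

-- Archive table with the same columns; a unique id is an assumption here
-- (the question mentions a duplicate-id error).
CREATE TABLE old_listings (
    id           integer PRIMARY KEY,
    price        numeric,
    pub_date     date,
    rooms        integer,
    publisher    text,
    updated_date date
);
```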
A new insert statement (an update of the feed):
INSERT INTO listings AS p (id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
ON CONFLICT (id) -- simple clause
DO UPDATE
SET pub_date = EXCLUDED.pub_date
, price = EXCLUDED.price
, rooms = EXCLUDED.rooms
, updated_date = EXCLUDED.updated_date
WHERE (p.pub_date, p.price, p.rooms, p.updated_date) IS DISTINCT FROM
(EXCLUDED.pub_date, EXCLUDED.price, EXCLUDED.rooms, EXCLUDED.updated_date);
The trigger:
CREATE OR REPLACE FUNCTION trg_prop_backup()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
INSERT INTO old_listings
VALUES (OLD.*);
DELETE FROM listings -- this statement raises a duplicate id error
WHERE listings.id = OLD.id;
RETURN NEW;
END
$func$;
CREATE TRIGGER prop_listings_upd_bef
BEFORE UPDATE OR DELETE ON listings
FOR EACH ROW EXECUTE FUNCTION trg_prop_backup();
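This already updates existing rows and inserts new ones, but it doesn't remove the rows that are no longer present in the updated feed. I'm doing something wrong (or missing something), but I can't figure out a solution.
The "old_listings" table is meant to hold the rows that are not present in the new feed, either because their data was updated or because they were deleted.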
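The duplicate-id error most likely comes from the DELETE inside the trigger. For a row that is being updated or deleted, that DELETE fires the same BEFORE DELETE trigger again for the same id, which tries to archive OLD a second time (presumably failing on a unique id in old_listings). Separately, NEW is null in a DELETE trigger, so RETURN NEW makes a BEFORE DELETE row trigger skip the delete. A sketch of a backup trigger function that only archives, assuming old_listings has the same six columns as listings:

```sql
CREATE OR REPLACE FUNCTION trg_prop_backup()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
  -- Archive the previous version of the row. No DELETE on listings here:
  -- it would fire this trigger again for the same row.
  INSERT INTO old_listings (id, price, pub_date, rooms, publisher, updated_date)
  VALUES (OLD.id, OLD.price, OLD.pub_date, OLD.rooms, OLD.publisher, OLD.updated_date);

  IF TG_OP = 'DELETE' THEN
    RETURN OLD;  -- a non-null return lets the DELETE proceed
  END IF;
  RETURN NEW;    -- UPDATE: keep the new row version
END
$func$;
```

With this function behind the existing prop_listings_upd_bef trigger, any DELETE of stale rows from listings archives them automatically. The trigger still doesn't know which ids are missing from the feed, so that DELETE has to come from elsewhere.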
After running both insert queries, the listings table must contain:
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
and the old_listings table:
(1, 100, '2021-05-03', 3, 'mf', '2021-01-01'),
(3, 300, '2021-05-04', 5, 'jorge', '2021-01-01'),
(4, 400, '2021-05-05', 3, 'yuyu', '2021-01-01'),
(5, 500, '2021-05-06', 3, 'papo', '2021-01-01');
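My suggestion:
- add an extra timestamp field series_ts to the listings table and set its value to current_timestamp on insert or update;
- remove the where clause of the insert on conflict update query, so that series_ts gets updated even for identical records;
- in the same transaction as the insert query, delete all listings records whose series_ts is not equal to current_timestamp.
First, create a 'before insert or update' row-level trigger that executes this function: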
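alter table listings add column series_ts timestamptz; -- the extra timestamp field
create function set_listings_series_ts() returns trigger language plpgsql as $$
begin
 new.series_ts := current_timestamp; -- transaction start time, same for every row
 return new;
end; $$;
create trigger listings_series_ts_trg -- trigger name is illustrative
before insert or update on listings
for each row execute function set_listings_series_ts();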
Once that is in place, run:
BEGIN TRANSACTION;
INSERT INTO listings AS p (id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30')
ON CONFLICT (id) -- simple clause
DO UPDATE
SET pub_date = EXCLUDED.pub_date
, price = EXCLUDED.price
, rooms = EXCLUDED.rooms
, updated_date = EXCLUDED.updated_date; -- your query w/o WHERE
WITH t as
(
delete from listings
where series_ts < current_timestamp
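returning id, price, pub_date, rooms, publisher, updated_date
)
insert into old_listings (id, price, pub_date, rooms, publisher, updated_date) -- explicit columns, in case old_listings has no series_ts
select id, price, pub_date, rooms, publisher, updated_date from t; -- delete old data, save in old_listings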
COMMIT TRANSACTION;
You can also do it without a trigger if you change the 'insert … on conflict … update' query.
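A standalone sketch of the trigger-free variant: the upsert writes current_timestamp into series_ts itself, and DO UPDATE copies it from EXCLUDED. The demo uses temporary tables, which shadow real tables of the same name for the current session only. The column types and the "previous run" timestamps are assumptions. The upsert and the cleanup must share one transaction, because current_timestamp is fixed at transaction start. If they ran in separate autocommitted statements, the DELETE would see a later timestamp and remove every row.

```sql
-- Standalone demo on temporary tables; column types are assumptions.
CREATE TEMP TABLE listings (
    id integer PRIMARY KEY, price numeric, pub_date date, rooms integer,
    publisher text, updated_date date,
    series_ts timestamptz            -- set explicitly below, no trigger involved
);
CREATE TEMP TABLE old_listings (
    id integer, price numeric, pub_date date, rooms integer,
    publisher text, updated_date date
);

-- Rows from a previous run, stamped with an earlier time.
INSERT INTO listings VALUES
    (1, 100, '2021-05-03', 3, 'mf',    '2021-01-01', now() - interval '1 day'),
    (2, 200, '2021-05-03', 4, 'pmf',   '2021-01-01', now() - interval '1 day'),
    (3, 300, '2021-05-04', 5, 'jorge', '2021-01-01', now() - interval '1 day'),
    (4, 400, '2021-05-05', 3, 'yuyu',  '2021-01-01', now() - interval '1 day'),
    (5, 500, '2021-05-06', 3, 'papo',  '2021-01-01', now() - interval '1 day');

BEGIN;  -- one transaction, so one value of current_timestamp

INSERT INTO listings (id, price, pub_date, rooms, publisher, updated_date, series_ts)
VALUES
    (1, 125, '2021-04-30', 4, 'mf',  '2021-10-30', current_timestamp)
  , (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01', current_timestamp)
  , (6, 250, '2021-04-30', 3, 'mf',  '2021-10-30', current_timestamp)
ON CONFLICT (id) DO UPDATE
SET pub_date     = EXCLUDED.pub_date
  , price        = EXCLUDED.price
  , rooms        = EXCLUDED.rooms
  , updated_date = EXCLUDED.updated_date
  , series_ts    = EXCLUDED.series_ts;  -- stamps every row present in the feed

-- Rows not stamped in this transaction were missing from the feed.
WITH t AS (
    DELETE FROM listings
    WHERE series_ts < current_timestamp
    RETURNING id, price, pub_date, rooms, publisher, updated_date
)
INSERT INTO old_listings (id, price, pub_date, rooms, publisher, updated_date)
SELECT id, price, pub_date, rooms, publisher, updated_date FROM t;

COMMIT;
```

This archives only the rows that disappeared from the feed (3, 4 and 5 here). The previous version of an updated row such as id 1 is not copied to old_listings unless something else, such as an update trigger, archives it.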
My suggestion:
First delete all rows from the listings table and create the corresponding rows in the old_listings table:
WITH list AS
( DELETE FROM listings
WHERE true
RETURNING id, price, pub_date, rooms, publisher, updated_date
)
INSERT INTO old_listings (id, price, pub_date, rooms, publisher, updated_date)
SELECT id, price, pub_date, rooms, publisher, updated_date
FROM list ;
Then insert the new rows into the listings table, and let a trigger delete the corresponding rows from the old_listings table:
-- create the trigger first, so that it fires for the insert below
CREATE OR REPLACE FUNCTION trg_del_backup()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
DELETE FROM old_listings
WHERE id = NEW.id;
RETURN NEW;
END
$func$;
CREATE TRIGGER trg_del_backup
AFTER INSERT ON listings
FOR EACH ROW EXECUTE FUNCTION trg_del_backup();
INSERT INTO listings AS p (id, price, pub_date, rooms, publisher, updated_date)
VALUES
(1, 125, '2021-04-30', 4, 'mf', '2021-10-30')
, (2, 200, '2021-05-03', 4, 'pmf', '2021-01-01')
, (6, 250, '2021-04-30', 3, 'mf', '2021-10-30');
PS: you also have to handle conflicts in the old_listings table when successive feeds try to insert rows with an id that already exists there.
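One way to handle those conflicts is to turn the archive insert into an upsert. This is a minimal sketch that assumes old_listings has a unique constraint on id and that the newest archived copy should win. It runs on temporary tables so it stays standalone. The sample values are illustrative only.

```sql
-- Standalone demo; column types and the unique id on old_listings are assumptions.
CREATE TEMP TABLE listings (
    id integer PRIMARY KEY, price numeric, pub_date date, rooms integer,
    publisher text, updated_date date
);
CREATE TEMP TABLE old_listings (
    id integer PRIMARY KEY, price numeric, pub_date date, rooms integer,
    publisher text, updated_date date
);

-- id 3 was already archived by an earlier run and is back in listings.
INSERT INTO old_listings VALUES (3, 300, '2021-05-04', 5, 'jorge', '2021-01-01');
INSERT INTO listings VALUES
    (3, 350, '2021-06-01', 5, 'jorge', '2021-06-01'),
    (4, 400, '2021-05-05', 3, 'yuyu',  '2021-01-01');

-- Archive step as an upsert: an id that is already archived gets its
-- archived copy overwritten instead of raising a duplicate key error.
WITH list AS (
    DELETE FROM listings
    RETURNING id, price, pub_date, rooms, publisher, updated_date
)
INSERT INTO old_listings (id, price, pub_date, rooms, publisher, updated_date)
SELECT id, price, pub_date, rooms, publisher, updated_date
FROM list
ON CONFLICT (id) DO UPDATE
SET price        = EXCLUDED.price
  , pub_date     = EXCLUDED.pub_date
  , rooms        = EXCLUDED.rooms
  , publisher    = EXCLUDED.publisher
  , updated_date = EXCLUDED.updated_date;
```

To keep the first archived version of each id instead, use ON CONFLICT (id) DO NOTHING.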