PostgreSQL:一些麻烦插入从select与on冲突



我有一些Postgres表:

CREATE TABLE source_redshift.staticprompts (
id              INT,
projectid       BIGINT,
scriptid        INT,
promptnum       INT,
prompttype      VARCHAR(20),
inputs          VARCHAR(2000),
attributes      VARCHAR(2000),
text            VARCHAR(2000),
corpuscode      VARCHAR(2000),
comment         VARCHAR(2000),
created         TIMESTAMP,
modified        TIMESTAMP

);
and 
CREATE TABLE target_redshift.user_input_conf (
collect_project_id      BIGINT NOT NULL UNIQUE,
prompt_type             VARCHAR(20),
prompt_input_desc       VARCHAR(300),
prompt_input_name       VARCHAR(100),
no_of_prompt_count      BIGINT,
prompt_input_value      VARCHAR(100) UNIQUE,
prompt_input_value_id   BIGSERIAL PRIMARY KEY,
script_id               BIGINT,
corpuscode              VARCHAR(20),
min_recordings          VARCHAR(2000),
max_recordings          VARCHAR(2000),
recordings_count        VARCHAR(2000),
lease_duration          VARCHAR(2000),
date_created            TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
date_updated            TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT must_be_different UNIQUE (prompt_input_value,collect_project_id)
);

我需要复制数据从静态提示到user_input_conf与以下规则:

Primary Key : prompt_input_value_id
Unique Values : collect_project_id, prompt_input_value
Data Load Logic :
Insert only when new prompt input value is found for given collect project from source. Inputs column stores the values in JSON format in staticprompts table.
Insert :
Generate unique sequence number for each of the new prompt input value for a collect project id from source and store in prompt_input_value_id.
Update :
If prompt value already exists for a collect project and if there are any value changes on prompt_input_desc or prompt input name or prompt input value then update only those columns.
prompt_input_value_id - Generate unique sequence number for the combination of each prompt_input_value and collect_project_id
prompt_input_value - Inputs.value is stored in the inputs column as JSON text. Create a unique record for each inputs.value. Look at the example below this table.

我尝试使用这个查询:

INSERT INTO target_redshift.user_input_conf AS t (
collect_project_id,
prompt_type,
prompt_input_desc,
prompt_input_name,
prompt_input_value,
script_id,
corpuscode)
SELECT
s.projectid,
s.prompttype,
s.inputs::jsonb#>>'{inputs,0,desc}' AS desc,
s.inputs::jsonb#>>'{inputs,0,name}' AS name,
s.inputs::jsonb#>>'{inputs,0,values}' AS values,
s.scriptid,
s.corpuscode
FROM source_redshift.staticprompts AS s
ON CONFLICT (collect_project_id, prompt_input_value)
DO UPDATE SET
(prompt_input_desc, prompt_input_name, prompt_input_value, date_updated) =
(EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, EXCLUDED.prompt_input_value, NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
OR t.prompt_input_name != EXCLUDED.prompt_input_name
OR t.prompt_input_value != EXCLUDED.prompt_input_value;
""")

但是我得到一个错误:

psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "user_input_conf_collect_project_id_key"
DETAIL:  Key (collect_project_id)=(1) already exists.

我认为这是一个误解。两列上的唯一约束并不意味着每列都是唯一的,而是两列的组合是唯一的。

所以你的must_be_differentprompt_input_valuecollect_project_id的唯一约束不同(并且更弱)。例如,如果您有三行

collect_project_id | prompt_input_value
--------------------+--------------------
1 | a
1 | b
2 | b

它们将与单列唯一约束产生冲突,但不会与must_be_different产生冲突。

我猜潜在的问题是你想使用INSERT ... ON CONFLICT与多个唯一的约束。这是不可能的;参考这个问题的讨论和可能的解决方案。

最新更新