Vertica数据库添加了具有相同主键的重复条目

  • 本文关键字:添加 数据库 Vertica vertica
  • 更新时间 :
  • 英文 :


我正在windows上运行Vertica的docker映像。我用这个模式在vertica中创建了一个表(student_idprimary key)

dbadmin@d1f942c8c1e0(*)=> d testschema.student;
List of Fields by Tables
Schema   |  Table  |   Column   |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
------------+---------+------------+-------------+------+---------+----------+-------------+-------------
testschema | student | student_id | int         |    8 |         | t        | t           |
testschema | student | name       | varchar(20) |   20 |         | f        | f           |
testschema | student | major      | varchar(20) |   20 |         | f        | f           |
(3 rows)

student_id是主键。我正在使用copy命令测试从csv文件加载数据。

首先我使用insert-insert into testschema.student values (1,'Jack','Biology');

然后在/home/dbadmin/vertica_test目录下创建了一个csv文件-

vi student.csv 
2,Kate,Sociology
3,Claire,English
4,Jack,Biology
5,Mike,Comp. Sci

然后我运行这个命令

copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;

I test the result

select * from testschema.student - shows 5 rows
select * from students_rejected; - no rows

然后我在/home/dbadmin/vertica_test directory创建了另一个csv文件,其中包含坏数据

vi student_bad.csv
bad_data_type_for_student_id,UnaddedStudent, UnaddedSubject
6,Cassey,Physical Education

我添加了坏csv文件中的数据

copy testschema.students from '/home/dbadmin/vertica_test/student.csv' delimiter ',' rejected data as table students_rejected;

然后测试输出

select * from testschema.student - shows 6 rows <-- only one row got added. all ok
select * from students_rejected; - shows 1 row  <-- bad row's entry is here. all ok

一切都很好

然后我再次添加坏数据,没有拒绝数据选项

copy testschema.students from '/home/dbadmin/vertica_test/student_bad.csv' delimiter ',' ;

但是现在学生id为6的条目又被添加了!!

student_id |  name  |       major
------------+--------+--------------------
1 | Jack   | Biology
2 | Kate   | Sociology
3 | Claire | English
4 | Jack   | Biology
5 | Mike   | Comp. Sci
6 | Cassey | Physical Education <--
6 | Cassey | Physical Education <--

这不应该被拒绝吗?

如果您使用以下命令创建students:

DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int 
, name       varchar(20)
, major      varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id)                                                                                                                                         
);

即,没有显式关键字ENABLED,则禁用主键约束。也就是说,您可以愉快地插入重复项,但是如果以后想通过主键列连接到students表,则会遇到错误。

启用主键约束后…

[...]
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED                                                                                                                                         
[...]

我认为你达到了预期的效果。

整个场景:

DROP TABLE IF EXISTS students;
CREATE TABLE students (
student_id int
, name       varchar(20)
, major      varchar(20)
, CONSTRAINT pk_students PRIMARY KEY(student_id) ENABLED
);
INSERT INTO students
SELECT 1,'Jack'  ,'Biology'
UNION ALL SELECT 2,'Kate'  ,'Sociology'
UNION ALL SELECT 3,'Claire','English'
UNION ALL SELECT 4,'Jack'  ,'Biology'
UNION ALL SELECT 5,'Mike'  ,'Comp. Sci'
UNION ALL SELECT 6,'Cassey','Physical Education'
;
-- out  OUTPUT 
-- out --------
-- out       6
COMMIT;
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
.
-- out vsql:/home/gessnerm/._vfv.sql:4: ERROR 6745:  
-- out  Duplicate key values: 'student_id=6' 
-- out  -- violates constraint 'dbadmin.students.pk_students'
SELECT * FROM students;
-- out  student_id |  name  |       major        
-- out ------------+--------+--------------------
-- out           1 | Jack   | Biology
-- out           2 | Kate   | Sociology
-- out           3 | Claire | English
-- out           4 | Jack   | Biology
-- out           5 | Mike   | Comp. Sci
-- out           6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out  node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason 
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)

唯一可靠的检查似乎是ANALYZE_CONSTRAINTS()调用…

ALTER TABLE students ALTER CONSTRAINT pk_students DISABLED;
-- out Time: First fetch (0 rows): 7.618 ms. All rows formatted: 7.632 ms
COPY students FROM STDIN DELIMITER ','
REJECTED DATA AS TABLE students_rejected;
6,Cassey,Physical Education
.
-- out Time: First fetch (0 rows): 31.790 ms. All rows formatted: 31.791 ms
SELECT * FROM students;
-- out  student_id |  name  |       major        
-- out ------------+--------+--------------------
-- out           1 | Jack   | Biology
-- out           2 | Kate   | Sociology
-- out           3 | Claire | English
-- out           4 | Jack   | Biology
-- out           5 | Mike   | Comp. Sci
-- out           6 | Cassey | Physical Education
-- out           6 | Cassey | Physical Education
SELECT * FROM students_rejected;
-- out  node_name | file_name | session_id | transaction_id | statement_id | batch_number | row_number | rejected_data | rejected_data_orig_length | rejected_reason 
-- out -----------+-----------+------------+----------------+--------------+--------------+------------+---------------+---------------------------+-----------------
-- out (0 rows)
SELECT ANALYZE_CONSTRAINTS('students');
-- out  Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values 
-- out -------------+------------+--------------+-----------------+-----------------+---------------
-- out  dbadmin     | students   | student_id   | pk_students     | PRIMARY         | ('6')
-- out (1 row)

相关内容

  • 没有找到相关文章

最新更新