即使从不同的文件格式加载,Vertica如何处理半结构化数据



我对Vertica中半结构化数据处理的理解是,如果数据是这样说的(在json中)

{
"f1":1,
"f2":"hello",
"f3":false,
"f4":2
}

则创建包含两列__identity____raw__flextable__identify__将有4个字段(我假设是整数1,2,3,4),__raw__将是数据(1,hello,false和2)的原始表示。

我也可以在同一个灵活的csv文件中加载数据,例如2, hello2, true, 3Vertica如何决定哪个字段映射到哪个列(例如:f1f4)都是int

好吧,没有什么比拥有Vertica SQL提示符(以及创建数据库对象的特权…)更好的了。

对于JSON,字段名在结构中:键值对。

对于CSV,数据文件的第一行需要有列名-我在下面添加…

-- connecting with VSQL, 
$ vsql -h localhost -d sbx -U dbadmin -w pwd
$ vsql -h localhost -d sbx -U dbadmin -w pwd
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type:  h or ? for help with vsql commands
g or terminate with semicolon to execute query
q to quit
sbx=> -- create the flex table
sbx=> CREATE FLEX TABLE flx();
CREATE TABLE
sbx=> -- load the flex table from stdin - data handed in-line - using your input
sbx=> COPY flx FROM stdin PARSER fjsonparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> {                                        
>> "f1":1,
>> "f2":"hello",
>> "f3":false,
>> "f4":2
>> }
>> .
-- test the load ...
sbx=> SELECT f1,f2,f3,f4 FROM flx;
f1 |  f2   |  f3   | f4 
----+-------+-------+----
1  | hello | false | 2
sbx=>-- load the CSV file - note that we need the title line, 
sbx=>-- which I add, to have same values in the same fields
sbx=> COPY flx FROM stdin PARSER fcsvparser();
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> f1,f2,f3,f4
>> 2, hello2, true, 3
>> .
sbx=>-- check the contents now
sbx=> SELECT f1,f2,f3,f4 FROM flx;
f1 |   f2   |  f3   | f4 
----+--------+-------+----
1  | hello  | false | 2
2  | hello2 | true  | 3
sbx=>-- resulting table definition in catalog ...
sbx=> d flx
List of Fields by Tables
Schema  | Table |    Column    |          Type          |  Size  | Default | Not Null | Primary Key | Foreign Key 
---------+-------+--------------+------------------------+--------+---------+----------+-------------+-------------
dbadmin | flx   | __identity__ | int                    |      8 |         | t        | f           | 
dbadmin | flx   | __raw__      | long varbinary(130000) | 130000 |         | t        | f           | 
(2 rows)
sbx=> -- check the contents of __identity__ and (after visualising) __raw__
sbx=> SELECT __identity__,REPLACE(MAPTOSTRING(__raw__),CHR(10),' ') FROM flx;
__identity__ |                                REPLACE                                 
--------------+------------------------------------------------------------------------
1 | {     "f1": "1",     "f2": "hello",     "f3": "false",     "f4": "2" }
2 | {     "f1": "2",     "f2": "hello2",     "f3": "true",     "f4": "3" }

最新更新