我的avro文件包含以下列:
{"name":"my_column","type":["null",{"type":"array","items":{"type":"record","name":"my_column","namespace":"v11","fields":[{"name":"my_column","type":["null","int"],"default":null}]}}],"default":null}
我将数据加载到Vertica中并存储为VARBINARY
。例子:
db=> select MapToString(my_column) from tab limit 1;
MapToString
------------------------------------------------------------------------------------------------------------------------------------------------------
{
"0.__name__": "my_column",
"0.my_column": "5",
"1.__name__": "my_column",
"1.my_column": "9"
}
(1 row)
数据实际上可以简化为ARRAY[INT]
。(即ARRAY[5,9]
).
执行这个转换的正确方法是什么?通过UDTF或UDParser扩展Vertica ?通过SQL执行此转换?别的吗?
编辑:我要检查标量UDF是否可以嵌入到AVROPARSER旁边的COPY命令中,或者它是否需要额外的ETL。
谢谢!
尝试物化列-好吧,您需要知道在flex表中期望什么…
我有这个表:
SELECT
__id__
, REGEXP_REPLACE(MAPTOSTRING(__raw__),'s+',' ') AS rawasstring
FROM a;
-- out __id__ | rawasstring
-- out --------+--------------------------------------------------------------------------------------------------
-- out | { "0.__name__": "my_column", "0.my_column": "5", "1.__name__": "my_column", "1.my_column": "9" }
-- out (1 row)
然后,我只是添加了一个物化列,像这样:
ALTER TABLE a ADD int_array ARRAY[int,10]
DEFAULT ARRAY[
MAPLOOKUP(__raw__, '0.my_column')
, MAPLOOKUP(__raw__, '1.my_column')
, MAPLOOKUP(__raw__, '2.my_column')
, MAPLOOKUP(__raw__, '3.my_column')
, MAPLOOKUP(__raw__, '4.my_column')
, MAPLOOKUP(__raw__, '5.my_column')
, MAPLOOKUP(__raw__, '6.my_column')
, MAPLOOKUP(__raw__, '7.my_column')
, MAPLOOKUP(__raw__, '8.my_column')
, MAPLOOKUP(__raw__, '9.my_column')
]::ARRAY[INT,10];
Map中不存在的键为NULL,并且没有真正添加到数组中。现在我有了数组:
SELECT
__id__::VARCHAR
, REGEXP_REPLACE(MAPTOSTRING(__raw__),'s+',' ') AS rawasstring
, int_array
FROM a ;
-- out __id__ | rawasstring | int_array
-- out --------+--------------------------------------------------------------------------------------------------+-----------
-- out | { "0.__name__": "my_column", "0.my_column": "5", "1.__name__": "my_column", "1.my_column": "9" } | [5,9]
-- out (1 row)