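I have ORC files containing columns of type double. In AWS Athena these columns are queryable as numeric(18,0). This is the best information I could find on the byte lengths of the target Redshift data types: https://docs.aws.amazon.com/redshift/latest/dg/r_Numeric_types201.html. I tried float4 and float8, but neither worked.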
ERROR: Spectrum Scan Error Detail:
-----------------------------------------------
error: Spectrum Scan Error code: 15007 context:
In file https://s3.us-east-1.amazonaws.com/....zlib.orc declared column type DECIMAL for column <test_column> incompatible with
ORC file column type double query: 40933 location: dory_util.cpp:1167 process: worker_thread [pid=1299]
-----------------------------------------------
[ErrorId: 1-6233d72e-4401a9ae4a9f92432ebc9fcf]
Table schema:
CREATE TABLE "schema"."table" (
col1 float,
col2 decimal(18,0) encode az64, -- FAILS: source ORC column is double
col3 float4, -- FAILS: source ORC column is double
-- col4 numeric(18,0) encode az64, -- AWS Glue representation; source ORC column is double
col5 character varying(256) encode lzo
);
Failing code:
COPY "schema"."table"
FROM 's3://.../database/table/' IAM_ROLE 'arn:aws:iam::123456789:role/TestIAM'
FORMAT AS ORC;
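Don't declare these columns as DECIMAL; use DOUBLE PRECISION instead. The error says the declared type DECIMAL is incompatible with the ORC file's column type double, and DOUBLE PRECISION (also accepted as FLOAT8 or FLOAT) is Redshift's 8-byte floating-point type, so it matches what the file actually stores.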
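As a minimal sketch of that change, assuming the same columns as in the question, the table could be declared as below. The `encode az64` clause is dropped because AZ64 is not available for floating-point columns. The final SELECT with its cast is an illustrative assumption for cases where a DECIMAL(18,0) value is still wanted downstream; it is not part of the original answer, and the alias `col2_decimal` is hypothetical.

```sql
-- Sketch: declare the columns backed by ORC double as DOUBLE PRECISION
CREATE TABLE "schema"."table" (
    col1 float,                              -- FLOAT is an alias for DOUBLE PRECISION in Redshift
    col2 double precision,                   -- source ORC column is double
    col3 double precision,                   -- source ORC column is double
    col5 character varying(256) encode lzo
);

-- Same COPY as in the question; the S3 path is left elided as in the original
COPY "schema"."table"
FROM 's3://.../database/table/' IAM_ROLE 'arn:aws:iam::123456789:role/TestIAM'
FORMAT AS ORC;

-- Assumption: cast at query time if a DECIMAL(18,0) value is needed
SELECT col2::decimal(18,0) AS col2_decimal
FROM "schema"."table";
```

Casting after the load keeps the COPY type-compatible with the file, while still giving consumers the numeric(18,0) shape that Athena and Glue report.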