Load quoted numbers from CSV into a Snowflake table with COPY INTO <table>



I am having trouble loading CSV data into a Snowflake table. The fields are enclosed in double quotes, which is causing problems when importing them into the table.

I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it doesn't seem to work at all.

Here are the table definition and the COPY command (abridged):

CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);

COPY INTO ...
FROM ...csv.gz'
FILE_FORMAT = (TYPE = CSV 
STRIP_NULL_VALUES = TRUE 
FIELD_DELIMITER = ',' 
SKIP_HEADER = 1  
error_on_column_count_mismatch=false 
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "ABORT_STATEMENT"
;

The CSV file looks like this:

"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"

I get the error

Numeric value '"3922000"' is not recognized

I'm pretty sure this happens because the NUMBER values are being interpreted as strings when Snowflake reads the " characters, but since I am using

FIELD_OPTIONALLY_ENCLOSED_BY = '"' 

they shouldn't even be there... Does anyone have a solution for this?

Maybe your file is not correct? I can run the following with no problems at all.

1. create the test table:
CREATE OR REPLACE TABLE 
dbNameHere.schemaNameHere.stacko_58322339 (
num1    NUMBER,  
num2    NUMBER, 
num3    NUMBER);
2. create test file, contents as follows 
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create stage and put file in stage 
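For example (the stage name and local file path below are placeholders, not from the original; PUT has to be run from SnowSQL or another client that supports it):
CREATE OR REPLACE STAGE stageNameHere;
-- local path is an assumption; point it at wherever you saved the test file
-- AUTO_COMPRESS=TRUE (the default) produces the .csv.gz referenced in the COPY
PUT file:///tmp/stacko_58322339.csv @stageNameHere AUTO_COMPRESS=TRUE;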
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV 
STRIP_NULL_VALUES = TRUE 
FIELD_DELIMITER = ',' 
SKIP_HEADER = 0  
ERROR_ON_COLUMN_COUNT_MISMATCH=FALSE 
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
)
ON_ERROR = "CONTINUE";
5. results
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file                                                | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED |           4 |           4 |           4 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+-----------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+                                                  
|    NUM1 |     NUM2 |  NUM3 |
|---------+----------+-------|
|       1 |        2 |     3 |
| 3922000 | 14733370 | 57256 |
|       3 |        2 |     1 |
|       4 |        5 |     6 |
+---------+----------+-------+

Can you try a similar test?

EDIT: A quick look at your data shows that many of your numeric fields appear to start with a comma, so there is definitely something wrong with the data.

Assuming your numbers are in European format, with ',' as the decimal separator and '.' as the thousands separator, reading the numeric-format documentation it seems Snowflake does not support that as input. I would open a feature request.

But if you read that column in as text, you can then use REPLACE, like:

SELECT '100,1234'::text AS A
      ,REPLACE(A, ',', '.') AS B
      ,TRY_TO_DECIMAL(B, 20, 10) AS C;

gives:

A         B         C
100,1234  100.1234  100.1234000000

It is safer to strip the thousands separators first, e.g.:

SELECT '1.100,1234'::text AS A
      ,REPLACE(A, '.', '') AS B
      ,REPLACE(B, ',', '.') AS C
      ,TRY_TO_DECIMAL(C, 20, 10) AS D;
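
If you want the converted values loaded straight into NUMBER columns, the same expression can be applied during the load by selecting from the stage inside the COPY. This is only a sketch: the target table, stage and file names are placeholders, and I am assuming the European-formatted columns start at position $9 (adjust the $n positions to your file):

COPY INTO dbNameHere.schemaNameHere.yourTableHere
FROM (
  SELECT $1::NUMBER, $2::NUMBER, $3::NUMBER, $4::NUMBER,
         $5::NUMBER, $6::NUMBER, $7::NUMBER,
         $8::TIMESTAMP_NTZ,
         -- strip '.' thousands separators, swap ',' for '.', then convert
         TRY_TO_DECIMAL(REPLACE(REPLACE($9, '.', ''), ',', '.'), 20, 10)
         -- ...repeat the expression for the remaining European-formatted columns
  FROM @stageNameHere/yourfile.csv.gz
)
FILE_FORMAT = (TYPE = CSV
               FIELD_DELIMITER = ','
               SKIP_HEADER = 1
               FIELD_OPTIONALLY_ENCLOSED_BY = '"')
ON_ERROR = 'CONTINUE';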
