将数据卸载到 S3 时分配唯一的文件名



我想在使用PowerShell脚本从SnowFlake卸载文件到S3时为文件分配唯一的名称:

#get table names and assign them to $table_names
#using datdev connection to get them from datdev_db.dev_load
$current_date = Get-Date
$current_date = $current_date.ToString("yyyy-MM-dd")
$table_names = snowsql -c dwhdev_db -q "select 'COPY INTO s3://<bucket_name>/' 
|| '$current_date' || '/e1 FROM ' || table_schema || '.' || table_name 
|| ' storage_integration = s3_int file_format = (format_name = 
public.backup_to_s3_csv_format);' from 
(select table_schema, table_name from information_schema.tables where 
table_schema = '<schema_name>')"
#remove header/footer
$table_names = $table_names[5..($table_names.length-4)]
#trim pipes and spaces
$table_names = $table_names.trim("|")
$table_names = $table_names.trim()
#iterate through the array
foreach($item in $table_names){
snowsql -c dwhdev_db -q "select '$item' from dual" -o friendly=false -o header=false -o quiet=true -o remove_comments=true -o output_format=plain -o output_file=copy_<schema_name>_tables.ps1
snowsql -c dwhdev_db -f copy_<schema_name>_tables.ps1
}

运行脚本会产生以下错误:

PS C:UserssamstDocuments> ./snowsql_copy_tables.ps1
* SnowSQL * v1.2.2
Type SQL statements or !help
+---------------+-------------+--------------+
| rows_unloaded | input_bytes | output_bytes |
|---------------+-------------+--------------|
|         20002 |    19233082 |      1554279 |
+---------------+-------------+--------------+
1 Row(s) produced. Time Elapsed: 2.143s
Goodbye!
* SnowSQL * v1.2.2
Type SQL statements or !help
001030 (22000): SQL compilation error:
Files already existing at the unload destination: s3://<bucket_name>/2019-12-26/. Use overwrite option to force unloading.
001030 (22000): SQL compilation error:
Files already existing at the unload destination: s3://<bucket_name>/2019-12-26/. Use overwrite option to force unloading.
Goodbye!
* SnowSQL * v1.2.2
Type SQL statements or !help
001030 (22000): SQL compilation error:
Files already existing at the unload destination: s3://<bucket_name>/2019-12-26/. Use overwrite option to force unloading.
001030 (22000): SQL compilation error:
Files already existing at the unload destination: s3://<bucket_name>/2019-12-26/. Use overwrite option to force unloading.
001030 (22000): SQL compilation error:
Files already existing at the unload destination: s3://<bucket_name>/2019-12-26/. Use overwrite option to force unloading.
Goodbye!
* SnowSQL * v1.2.2
Type SQL statements or !help

如何修改该 PowerShell 脚本以在每次运行时输出唯一的文件名?

这完全取决于您希望如何区分文件。如果希望每次运行都使用不同的文件名,可以使用以下代码片段(参考(为要上传的每个文件生成 GUID:

# Create a new GUID
$Id = [GUID]::NewGuid()

这取决于您希望如何将其与文件名约定集成...但它可能看起来像这样:


#iterate through the array
foreach($item in $table_names){
$unique_name = [GUID]::NewGuid()
snowsql -c dwhdev_db -q "select '$item' from dual" -o friendly=false -o header=false -o quiet=true -o remove_comments=true -o output_format=plain -o output_file=${unique_name}.ps1
snowsql -c dwhdev_db -f ${unique_name}.ps1
}

或者,如果您使用更长的时间戳,精确到您不会预见脚本在增量中运行两次,则可以使用它来区分文件夹/文件名。

最新更新