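I have an ETL script that changes a table's column types to the required types and updates the data. The script works fine, but I would like to know whether there is a better and faster way to do this. I am working in Redshift.
Current procedure: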
CREATE or REPLACE PROCEDURE p_alter_staging_tbls() AS $$
DECLARE
row RECORD;
BEGIN
FOR row IN select * from
(
select distinct table_name, column_name,data_type from staging.staging_col_info_v a order by a.table_name asc
)
loop
EXECUTE 'ALTER TABLE staging.' || row.table_name || ' ' || 'ADD COLUMN ' || concat('new_',row.column_name) || ' ' || row.data_type ;
EXECUTE 'UPDATE staging.' || row.table_name || ' ' || 'SET ' || concat('new_',row.column_name) || ' ' || '=' || ' ' || row.column_name || '::' || row.data_type ;
EXECUTE 'ALTER TABLE staging.' || row.table_name || ' ' || 'DROP COLUMN ' || row.column_name ;
execute 'ALTER TABLE staging.' || row.table_name || ' ' || 'RENAME COLUMN '|| concat('new_',row.column_name) || ' ' || 'TO ' || row.column_name;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
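As an optimization, you can first check whether the data type actually needs to change, or whether the column already has the correct type and size.
For that you can use the following query: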
SELECT table_schema, table_name, column_name, data_type,
column_default, character_maximum_length, numeric_precision
FROM information_schema.columns
WHERE table_schema = 'staging'
AND table_name = row.table_name
AND column_name = row.column_name
Then check whether "data_type" and "numeric_precision" differ from the required values; only in that case do you need to alter the column.
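The check described above could be folded into the loop roughly as follows. This is a minimal, untested sketch, not a drop-in replacement: the procedure name p_alter_staging_tbls_checked and the variable current_type are made up for illustration, and it assumes that staging.staging_col_info_v.data_type spells type names the same way information_schema.columns.data_type does (for example 'integer' or 'character varying'). If the view stores something like 'varchar(50)', the comparison would also need character_maximum_length or numeric_precision.

```sql
-- Hypothetical variant of the original procedure that skips columns
-- whose current type already matches the target type.
CREATE OR REPLACE PROCEDURE p_alter_staging_tbls_checked() AS $$
DECLARE
  row RECORD;
  current_type VARCHAR(128);  -- type currently reported by the catalog
BEGIN
  FOR row IN
    SELECT DISTINCT table_name, column_name, data_type
    FROM staging.staging_col_info_v
    ORDER BY table_name
  LOOP
    -- Look up the column's present type (same query as in the answer).
    SELECT c.data_type INTO current_type
    FROM information_schema.columns c
    WHERE c.table_schema = 'staging'
      AND c.table_name = row.table_name
      AND c.column_name = row.column_name;

    -- Assumption: both sides use the same type spelling.
    IF current_type IS NULL OR current_type <> row.data_type THEN
      -- The conversion statements from the original loop would go here;
      -- this sketch only reports which columns would be converted.
      RAISE INFO 'would convert %.% to %', row.table_name, row.column_name, row.data_type;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Calling it with CALL p_alter_staging_tbls_checked(); should list only the columns that still need converting, so the expensive ADD/UPDATE/DROP/RENAME sequence is not run for columns that are already correct.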
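You could also consider not creating a new column at all and changing the type in place. Be aware that in Redshift this form is limited to changing the size of a VARCHAR column, so other type changes still need the add, update, drop and rename approach: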
ALTER TABLE table_name
ALTER COLUMN column_name [SET DATA] TYPE new_data_type;
So instead of:
EXECUTE 'ALTER TABLE staging.' || row.table_name || ' ' || 'ADD COLUMN ' || concat('new_',row.column_name) || ' ' || row.data_type ;
EXECUTE 'UPDATE staging.' || row.table_name || ' ' || 'SET ' || concat('new_',row.column_name) || ' ' || '=' || ' ' || row.column_name || '::' || row.data_type ;
EXECUTE 'ALTER TABLE staging.' || row.table_name || ' ' || 'DROP COLUMN ' || row.column_name ;
execute 'ALTER TABLE staging.' || row.table_name || ' ' || 'RENAME COLUMN '|| concat('new_',row.column_name) || ' ' || 'TO ' || row.column_name;
you can use this:
EXECUTE 'ALTER TABLE staging.' || row.table_name || ' ' || 'ALTER COLUMN '|| row.column_name || ' ' || 'TYPE ' || row.data_type;