How to UNLOAD a table from AWS Redshift and save it to an S3 bucket using Python (with example)



I am trying to extract data from an AWS Redshift table and save it to an S3 bucket using Python. I have done the same thing in R, but I want to replicate it in Python. This is the code I am using:

R

library(RPostgreSQL)  # provides the PostgreSQL driver (loads DBI as well)

drv <- dbDriver("PostgreSQL")
connection <- dbConnect(drv,
host = "xyz.amazonaws.com",
port = "abcd",
user = "a",
password = "b",
dbname = "DB")

dbGetQuery(connection, "UNLOAD ('select COL1,COL2,COL3
from xyz 
where user_name in (''ythm'')
and customer=''RANDOM'' 
and utc_date between ''2021-10-01'' and ''2022-01-21'' 

')

TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF;")


dbDisconnect(connection)

I have been able to connect to the AWS Redshift DB with the script below:

Python

import psycopg2      # PostgreSQL driver; Redshift is wire-compatible with it
import pandas as pd  # not needed for the UNLOAD itself

connection = psycopg2.connect(
    host="xyz.amazonaws.com",
    port="abcd",
    database="DB",
    user="a",
    password="b")
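As a quick sanity check that the connection is live, a trivial query can be run first (a minimal sketch using the connection object above):

cur = connection.cursor()
cur.execute("SELECT 1")  # any trivial statement confirms the session works
print(cur.fetchone())    # prints (1,)
cur.close()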

I am trying to create a table and save it to the S3 bucket. Any suggestions on how to achieve this in Python?

Once the connection is created, you can simply run the same UNLOAD query.

The way to execute a SQL statement is to create a cursor and call its execute method (https://www.psycopg.org/docs/cursor.html?highlight=execute#cursor.execute):

sql = """UNLOAD ('select COL1,COL2,COL3
from xyz 
where user_name in (''ythm'')
and customer=''RANDOM'' 
and utc_date between ''2021-10-01'' and ''2022-01-21'' 

')

TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""
cur = connection.cursor()  # open a cursor on the connection created above
cur.execute(sql)           # run the UNLOAD; Redshift writes the files to S3
connection.commit()        # UNLOAD runs inside a transaction, so commit it
connection.close()
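For completeness, here is a minimal end-to-end sketch of the same flow. The IAM role ARN below is a hypothetical placeholder: Redshift's UNLOAD also accepts an IAM_ROLE clause, which avoids embedding access keys in the SQL. The boto3 check at the end assumes AWS credentials are configured in the environment.

import psycopg2
import boto3

# The IAM role ARN here is a hypothetical placeholder; substitute a role
# that has permission to write to the target bucket.
unload_sql = """UNLOAD ('select COL1,COL2,COL3
from xyz
where user_name in (''ythm'')
and customer=''RANDOM''
and utc_date between ''2021-10-01'' and ''2022-01-21''')
TO 's3://MYBUCKET/Industry_Raw_Data_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""

# Placeholder connection values copied from the question.
connection = psycopg2.connect(
    host="xyz.amazonaws.com",
    port="abcd",
    database="DB",
    user="a",
    password="b")

try:
    # As a context manager, the connection wraps a transaction:
    # it commits on success and rolls back on error (it does not close).
    with connection:
        with connection.cursor() as cur:
            cur.execute(unload_sql)
finally:
    connection.close()

# Optionally confirm the unloaded files landed in S3.
s3 = boto3.client("s3")
listing = s3.list_objects_v2(Bucket="MYBUCKET", Prefix="Industry_Raw_Data_")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])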
