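I'm trying to extract data from an AWS Redshift table and save it to an S3 bucket using Python. I've already done this in R, but I want to replicate it in Python. This is the code I'm using: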
R
drv <- dbDriver("PostgreSQL")
connection <- dbConnect(drv,
host = "xyz.amazonaws.com",
port = "abcd",
user = "a",
password = "b",
dbname = "DB")
dbGetQuery(connection, "UNLOAD ('select COL1,COL2,COL3
from xyz
where user_name in (''ythm'')
and customer=''RANDOM''
and utc_date between ''2021-10-01'' and ''2022-01-21''
')
TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF;")
dbDisconnect(connection)
I've been able to connect to the AWS Redshift DB with the script below:
Python
import psycopg2
import pandas as pd
connection=psycopg2.connect(
host="xyz.amazonaws.com",
port = "abcd",
database="DB",
user="a",
password="b")
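I'm trying to create a table and save it to an S3 bucket. Any suggestions on how to do this in Python?
Once the connection is created, you just run the same UNLOAD query.
To execute a SQL statement, create a cursor and call its "execute" method (https://www.psycopg.org/docs/cursor.html?highlight=execute#cursor.execute):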
sql = """UNLOAD ('select COL1,COL2,COL3
from xyz
where user_name in (''ythm'')
and customer=''RANDOM''
and utc_date between ''2021-10-01'' and ''2022-01-21''
')
TO 's3://MYBUCKET/Industry_Raw_Data_'
CREDENTIALS
'aws_access_key_id=ABC;aws_secret_access_key=HYU'
DELIMITER '|'
ALLOWOVERWRITE
PARALLEL OFF"""
# "connection" is the psycopg2 connection object created in the question
cur = connection.cursor()
cur.execute(sql)
connection.commit()
connection.close()
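If you run more than one export, the steps above can be wrapped in two small helpers. This is only a sketch: the names build_unload_sql and run_unload are made up for illustration, and it assumes "connection" is an open psycopg2 connection like the one in the question. The first helper doubles the single quotes in the inner SELECT for you, since UNLOAD takes the query as a quoted string; the second runs the statement and commits.

```python
def build_unload_sql(select_sql, s3_prefix, access_key_id, secret_access_key):
    """Wrap a plain SELECT in a Redshift UNLOAD statement (hypothetical helper)."""
    # UNLOAD receives the query as a quoted string, so every single quote
    # inside the SELECT has to be doubled.
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        "CREDENTIALS\n"
        f"'aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}'\n"
        "DELIMITER '|'\n"
        "ALLOWOVERWRITE\n"
        "PARALLEL OFF"
    )


def run_unload(connection, unload_sql):
    """Execute the UNLOAD on an open DB-API connection and commit (hypothetical helper)."""
    cur = connection.cursor()
    try:
        cur.execute(unload_sql)
        connection.commit()
    finally:
        # Close the cursor even if execute() raises; the caller closes the connection.
        cur.close()


# Usage with the connection from the question (placeholder values as in the post):
# sql = build_unload_sql(
#     "select COL1,COL2,COL3 from xyz where customer='RANDOM'",
#     "s3://MYBUCKET/Industry_Raw_Data_", "ABC", "HYU")
# run_unload(connection, sql)
# connection.close()
```

Keeping the SELECT as ordinary SQL and escaping it in one place avoids hand-writing the doubled quotes, which are easy to get wrong.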