使用' TABLESAMPLE '查询AWS红移时出现键错误



我试图在jupyter笔记本中使用Python查询AWS Redshift集群上的表。我想从结果集中随机抽样行,并使用TABLESAMPLE。然而,我一直运行到Key Error

import pandas as pd
import redshift_connector

conn = redshift_connector.connect(

host=os.environ['host'],
database=os.environ['database'],
port=int(os.environ['port']),
user=os.environ['user'],
password=os.environ['password']

)

print(conn)

cursor = conn.cursor()
print(cursor)
cursor.execute(
"""
SELECT * 
FROM tbl1 
WHERE type = 'xxx'
AND created_at >= '2022-01-01 00:00:00' 
AND created_at <= '2022-12-31 00:00:00'
TABLESAMPLE BERNOULLI (10);
""" 
)
df = DataFrame(cursor.fetchall())
field_names = [i[0] for i in cursor.description]
df.columns = field_names`
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/anaconda3/lib/python3.10/site-packages/redshift_connector/core.py:1631, in Connection.execute(self, cursor, operation, vals)
1630 try:
-> 1631     ps = cache["ps"][key]
1632     cursor.ps = ps
KeyError: ("n               SELECT * n               FROM pmf  n               WHERE event_type = 'load'n               AND created_at >= '2022-01-01 00:00:00' n               AND created_at <= '2022-12-31 00:00:00'n               TABLESAMPLE BERNOULLI (10);n               ", ())
File ~/anaconda3/lib/python3.10/site-packages/redshift_connector/core.py:1701, in Connection.execute(self, cursor, operation, vals)
1698     else:
1699         raise e
-> 1701 self.handle_messages(cursor)
1703 # We've got row_desc that allows us to identify what we're
1704 # going to get back from this statement.
1705 output_fc = tuple(self.redshift_types[f["type_oid"]][0] for f in ps["row_desc"])
File ~/anaconda3/lib/python3.10/site-packages/redshift_connector/core.py:1969, in Connection.handle_messages(self, cursor)
1966     self.message_types[code](self._read(data_len - 4), cursor)
1968 if self.error is not None:
-> 1969     raise self.error
ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "TABLESAMPLE"', 'P': '220', 'F': '/home/ec2-user/padb/src/pg/src/backend/parser/parser_scan.l', 'L': '732', 'R': 'yyerror'}
cursor.execute(
"""
SELECT * 
FROM tbl1 
WHERE type = 'xxx'
AND created_at >= '2022-01-01 00:00:00' 
AND created_at <= '2022-12-31 00:00:00'
TABLESAMPLE BERNOULLI (10);
""" 
)

问题似乎与TABLESAMPLE的工作方式有关,当使用关键字时,WHERE条件需要在TABLESAMPLE子句之后。

SELECT * 
FROM tbl1 
TABLESAMPLE BERNOULLI (10)
WHERE type = 'xxx'
AND created_at >= '2022-01-01 00:00:00' 
AND created_at <= '2022-12-31 00:00:00';

奇怪的是红移文档页面上没有提到这一点。从Postgres文档中更容易理解,因为Redshift是Postgres的一种风格。

似乎红移目前不支持TABLESAMPLE,你可以尝试下面的替代

select * from (
select *, random() as sample 
from tbl1
WHERE type = 'xxx'
AND created_at >= '2022-01-01 00:00:00' 
AND created_at <= '2022-12-31 00:00:00') as samp
where sample < .01; -- return 1% of rows

相关内容

  • 没有找到相关文章

最新更新