AWS CLI运行SQL查询



我用Python编写了以下代码。我希望将它迁移到Bash,或者只使用普通的旧AWS CLI。任务是使用S3 Select在S3存储桶上运行SQL查询。注意:S3中的文件都是gziped。

现有Python代码(正在工作(

ACCESS_KEY = 'Key1'
SECRET_KEY = 'Key2'
s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
r = s3.select_object_content(
Bucket='bkt1',
Key=file2search,
ExpressionType='SQL',
Expression="SELECT * FROM s3object s where Lower(s._1) = (%r)" % ("SEARCH_STRING"),
InputSerialization = {'CompressionType': 'GZIP', 'CSV': {
'AllowQuotedRecordDelimiter': True,
'QuoteEscapeCharacter': '"',
'RecordDelimiter': 'n',
'FieldDelimiter': ':',
}},
OutputSerialization = {'CSV': {
'QuoteEscapeCharacter': '"',
'RecordDelimiter': 'n',
'FieldDelimiter': ':',
}}
)

Bash代码(无效(

SEARCH_STRING="hello@world.com"
aws s3api select-object-content 
--bucket projectbucket2 
--key abc.gz 
--expression "SELECT * FROM s3object s where Lower(s._1) = '$SEARCH_STRING'" 
--expression-type 'SQL' 
--input-serialization '{"CSV": {}, "CompressionType": "GZIP"}' 
--output-serialization '{"CSV": {}}' "output.csv"

代码引发错误:An error occurred (LexerInvalidChar) when calling the SelectObjectContent operation: Invalid character at line 1, column 46.请帮忙。

它在Ubuntu上运行得非常好:

ubuntu@ip-172-31-8-201:~$ aws --version
aws-cli/2.0.38 Python/3.7.3 Linux/5.3.0-1023-aws exe/x86_64.ubuntu.18
ubuntu@ip-172-31-8-201:~$ echo $SEARCH_STRING
taipei 101
ubuntu@ip-172-31-8-201:~$ aws s3api select-object-content --bucket my-bucket --key towers.csv --expression "SELECT * FROM s3object s where Lower(s._2) = '$SEARCH_STRING'" --expression-type 'SQL' --input-serialization '{"CSV": {}, "CompressionType": "NONE"}'     --output-serialization '{"CSV": {}}' "output.csv"
ubuntu@ip-172-31-8-201:~$ cat output.csv 
5,Taipei 101,Taipei,Taiwan,509,1670,101,2004

我使用的是AWS CLI v2。

您已结束使用单引号('(字符。

例如:--input-serialization '{'CompressionType':第一个引号打开JSON,但紧接在花括号后面的引号关闭引号。

相反,尝试交替使用单引号和双引号。例如:

--input-serialization '--input-serialization '{"CompressionType": "GZIP", "CSV": {

这允许单引号包含整个JSON,而双引号在JSON内部使用

结果看起来像这样(我没有测试它(:

aws s3api select-object-content 
--bucket bkt1 
--key filename.gz 
--expression "SELECT * FROM s3object s where Lower(s._1) = '$SEARCH_STRING'" 
--expression-type 'SQL' 
--input-serialization '{"CompressionType": "GZIP", "CSV": {
"AllowQuotedRecordDelimiter": True,
"QuoteEscapeCharacter": """,
"RecordDelimiter": "n",
"FieldDelimiter": ":",
}}' 
--output-serialization '{"CSV": {
"QuoteEscapeCharacter": """,
"RecordDelimiter": "n",
"FieldDelimiter": ":",
}}'  

最新更新