How to connect to and query a MySQL database from an AWS Glue Python shell job



I use sqlalchemy to create a connection and query a MySQL database, but Glue doesn't seem to support "sqlalchemy" or even "pymysql". Is there a way to do this in a Glue Python shell job?

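I think you need to install sqlalchemy and pymysql. If you are using the Spark runtime, Glue makes it easy to install additional Python libraries, but the Python shell runtime seems to work a bit differently.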

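The only way I got it to work was to download (or build) whl files. Fortunately, you can download both sqlalchemy and pymysql from PyPI. Note: SQLAlchemy publishes many whl variants, so pick the one you need if you require a specific version.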
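If you would rather script the download than click through PyPI, here is a minimal sketch that builds a `pip download` command for the two wheels. The helper name `pip_download_command`, the `wheels` directory, and the optional `python_version` / `platform` arguments are my own illustration, not something from the Glue docs; pass whatever matches your job's runtime.

```python
import sys


def pip_download_command(requirements, dest="wheels", python_version=None, platform=None):
    """Build a `pip download` command that fetches wheels only (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "pip", "download",
        "--no-deps",            # fetch only the named packages, not their dependencies
        "--only-binary=:all:",  # refuse source distributions so only .whl files are saved
        "--dest", dest,
    ]
    if python_version:
        # Assumption: you know which Python version your Glue job runs
        cmd += ["--python-version", python_version]
    if platform:
        # Example value: "manylinux2014_x86_64"
        cmd += ["--platform", platform]
    return cmd + list(requirements)


cmd = pip_download_command(["SQLAlchemy==1.4.36", "PyMySQL==1.0.2"])
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The command is only built and printed here; nothing is downloaded until you pass it to `subprocess.run`.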

Put both whl files in an S3 bucket that your Glue job can access. Then add the two paths, separated by a comma, to the job's Python library path. For example:

s3://my-bucket/SQLAlchemy-1.4.36-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl,s3://my-bucket/PyMySQL-1.0.2-py3-none-any.whl
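To avoid typos in that long comma-separated value, a small sketch like the one below can assemble it. `python_library_path` is a hypothetical helper of mine, and it only accepts .whl paths because that is the case described here. As far as I know, the same string is what the `--extra-py-files` job argument takes if you configure the job outside the console, but treat that as an assumption and check it for your Glue version.

```python
def python_library_path(*s3_paths):
    """Join S3 wheel paths into one comma-separated value (hypothetical helper)."""
    for path in s3_paths:
        # Only the .whl-on-S3 case from this answer is handled here
        if not path.startswith("s3://") or not path.endswith(".whl"):
            raise ValueError(f"expected an s3://...whl path, got: {path}")
    # No spaces around the comma, matching the example value above
    return ",".join(s3_paths)


value = python_library_path(
    "s3://my-bucket/SQLAlchemy-1.4.36-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "s3://my-bucket/PyMySQL-1.0.2-py3-none-any.whl",
)
print(value)
```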

Then you should be able to import them like this:

import sqlalchemy
import pymysql

print('sqlalchemy', sqlalchemy.__version__)
print('pymysql', pymysql.__version__)

Job log output:
May 7, 2022, 9:38:03 AM Pending execution
Processing ./glue-python-libs-ox4yhv_1/SQLAlchemy-1.4.36-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Collecting greenlet!=0.4.17; python_version >= "3" and (platform_machine == "aarch64" or (platform_machine == "ppc64le" or (platform_machine == "x86_64" or (platform_machine == "amd64" or (platform_machine == "AMD64" or (platform_machine == "win32" or platform_machine == "WIN32"))))))
Downloading greenlet-1.1.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (147 kB)
Collecting importlib-metadata; python_version < "3.8"
Downloading importlib_metadata-4.8.3-py3-none-any.whl (17 kB)
Collecting zipp>=0.5
Downloading zipp-3.6.0-py3-none-any.whl (5.3 kB)
Collecting typing-extensions>=3.6.4; python_version < "3.8"
Downloading typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Installing collected packages: greenlet, zipp, typing-extensions, importlib-metadata, SQLAlchemy
Successfully installed SQLAlchemy-1.4.36 greenlet-1.1.2 importlib-metadata-4.8.3 typing-extensions-4.1.1 zipp-3.6.0
Processing ./glue-python-libs-ox4yhv_1/PyMySQL-1.0.2-py3-none-any.whl
Installing collected packages: PyMySQL
Successfully installed PyMySQL-1.0.2
sqlalchemy 1.4.36
pymysql 1.0.2
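Once both imports succeed, connecting and querying works the same as anywhere else. Below is a minimal sketch, not a tested Glue job: `mysql_url` and `fetch_rows` are hypothetical helper names, and the host, user, password, and database are placeholders. It also assumes the job can reach the MySQL host over the network.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host, database, port=3306):
    """Build a SQLAlchemy URL that uses the PyMySQL driver."""
    # quote_plus escapes characters such as '@' or '/' in the credentials
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def fetch_rows(url, sql, params=None):
    """Run one query and return all rows as plain tuples."""
    # Imported here so the URL helper still works where the wheels are not installed
    import sqlalchemy

    engine = sqlalchemy.create_engine(url)
    try:
        with engine.connect() as conn:
            result = conn.execute(sqlalchemy.text(sql), params or {})
            return [tuple(row) for row in result]
    finally:
        engine.dispose()  # release pooled connections before the job exits


# Placeholder credentials; replace them with your own
url = mysql_url("app_user", "p@ss/word", "db.example.com", "sales")
print(url)
# In the Glue job: rows = fetch_rows(url, "SELECT 1")
```

Percent-encoding the credentials keeps the URL parseable when the password contains special characters. In a real job you would probably read the credentials from job parameters or a secrets store rather than hard-coding them.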
