AWS Glue - An error occurred while calling getDynamicFrame



I am using AWS Glue to transfer data from a table in the Glue Data Catalog to another table in an RDS instance. Below is the code snippet used to connect to the Glue Catalog table.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev", table_name = "tbl", transformation_ctx = "datasource0")
............
job.commit()
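
The elided part of the script writes the frame out to the RDS table over a JDBC connection. For context only, a minimal sketch of what that step typically looks like, assuming a hypothetical Glue connection named "rds-connection", a hypothetical target table "tbl_target", and hypothetical column mappings (none of these are taken from the actual job), would be:

# Minimal sketch only -- connection name, table name, and mappings are hypothetical.
applymapping1 = ApplyMapping.apply(
    frame = datasource0,
    mappings = [("id", "long", "id", "long"), ("name", "string", "name", "string")],
    transformation_ctx = "applymapping1")
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = applymapping1,
    catalog_connection = "rds-connection",
    connection_options = {"dbtable": "tbl_target", "database": "dev"},
    transformation_ctx = "datasink1")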

Note that the Glue Catalog table does contain data, which I have even verified from Athena. But I repeatedly get the error below.

File "script_2019-05-16-16-17-26.py", line 20, in <module>
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev", table_name = "tbl", transformation_ctx = "datasource0")
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 570, in from_catalog
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/PyGlue.zip/awsglue/context.py", line 138, in create_dynamic_frame_from_catalog
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1558022970835_0001/container_1558022970835_0001_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.getDynamicFrame.
: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:540)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:374)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:187)
at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:68)
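
The trace fails inside the Hive ORC reader while parsing a file footer (ReaderImpl.extractMetaInfoFromFooter), which is usually caused by an empty or corrupt ORC file under the table's S3 location rather than by permissions. A minimal sketch for isolating such a file outside of the catalog, assuming a hypothetical location s3://my-bucket/tbl/ (not the job's actual path), would be:

import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-bucket", "tbl/"  # hypothetical location, replace with the table's actual one
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key, size = obj["Key"], obj["Size"]
        if size == 0:
            # Zero-byte objects have no ORC footer and break the reader.
            print("zero-byte object:", key)
            continue
        try:
            # Reading each file on its own narrows down which one fails.
            spark.read.orc("s3://{}/{}".format(bucket, key)).take(1)
        except Exception as e:
            print("failed on", key, ":", e)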

The IAM role has the S3FullAccess, GlueFullAccess, and CloudWatchLogsFullAccess policies attached.

Attached policies

I also had a similar problem when connecting to RDS, and the reference is here: "https://aws.amazon.com/premiumsupport/knowledge-center/connection-timeout-glue-redshift-rds/". AWS Glue supports one connection per job or development endpoint. If you specify more than one connection in a job, AWS Glue uses only the first connection.
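
Since only the first connection is used, one way to keep this unambiguous is to attach exactly one connection to the job definition. A minimal sketch using boto3, with a hypothetical job name, role, script location, and connection name:

import boto3

glue = boto3.client("glue")
# All names below are hypothetical placeholders.
glue.create_job(
    Name = "catalog-to-rds",
    Role = "MyGlueServiceRole",
    Command = {"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/job.py"},
    Connections = {"Connections": ["rds-connection"]})  # only the first entry is used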

Latest update