AWS Glue, PySpark || Error reading a DynamicFrame from RDS



I am using the code below to create a dynamic frame from an RDS table. It works for other tables, but one table fails with a strange error: "java.sql.SQLException: DAY_OF_MONTH". The error trace is also below. Please help.

dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={"url": "jdbc:mysql://endpoint:port/database",
                        "dbtable": "table_name",
                        "user": userDestination,
                        "password": passwordDestination,
                        "customJdbcDriverClassName": jarDriver,
                        "customJdbcDriverS3Path": jarPath},
    additional_options={"jobBookmarkKeys": ["PK_ID"],
                        "jobBookmarksKeysSortOrder": "asc"},
    transformation_ctx="dyf")

Error trace

An error occurred while calling o1031.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 194.0 failed 4 times, most recent failure: Lost task 0.3 in stage 194.0 (, executor 28): java.sql.SQLException: DAY_OF_MONTH
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:85)
	at com.mysql.cj.jdbc.result.ResultSetImpl.getDate(ResultSetImpl.java:755)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$2.apply(JdbcUtils.scala:389)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$2.apply(JdbcUtils.scala:387)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:356)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:338)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:295)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:266)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.cj.exceptions.WrongArgumentException: DAY_OF_MONTH
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.result.SqlDateValueFactory.localCreateFromDate(SqlDateValueFactory.java:85)
	at com.mysql.cj.result.SqlDateValueFactory.localCreateFromDate(SqlDateValueFactory.java:50)
	at com.mysql.cj.result.AbstractDateTimeValueFactory.createFromDate(AbstractDateTimeValueFactory.java:67)
	at com.mysql.cj.protocol.a.MysqlBinaryValueDecoder.decodeDate(MysqlBinaryValueDecoder.java:129)
	at com.mysql.cj.protocol.result.AbstractResultsetRow.decodeAndCreateReturnValue(AbstractResultsetRow.java:90)
	at com.mysql.cj.protocol.result.AbstractResultsetRow.getValueFromBytes(AbstractResultsetRow.java:241)
	at com.mysql.cj.protocol.a.result.ByteArrayRow.getValue(ByteArrayRow.java:91)
	... 29 more
Caused by: java.lang.IllegalArgumentException: DAY_OF_MONTH
	at java.util.GregorianCalendar.computeTime(GregorianCalendar.java:2648)
	at java.util.Calendar.updateTime(Calendar.java:3393)
	at java.util.Calendar.getTimeInMillis(Calendar.java:1782)
	at com.mysql.cj.result.SqlDateValueFactory.localCreateFromDate(SqlDateValueFactory.java:82)
	... 35 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
	at com.amazonaws.services.glue.DynamicFrame.count(DynamicFrame.scala:1145)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: DAY_OF_MONTH
	... (executor-side frames identical to those above)
	... 1 more
Caused by: com.mysql.cj.exceptions.WrongArgumentException: DAY_OF_MONTH
	... 29 more
Caused by: java.lang.IllegalArgumentException: DAY_OF_MONTH
	... 35 more

Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1583217406561_0001/container_1583217406561_0001_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 294, in count
    return self._jdf.count()
  File "/mnt/yarn/usercache/livy/appcache/application_1583217406561_0001/container_1583217406561_0001_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt/yarn/usercache/livy/appcache/application_1583217406561_0001/container_1583217406561_0001_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/mnt/yarn/usercache/livy/appcache/application_1583217406561_0001/container_1583217406561_0001_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o1031.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 194.0 failed 4 times, most recent failure: Lost task 0.3 in stage 194.0 (, executor 28): java.sql.SQLException: DAY_OF_MONTH
	... (stack trace and driver stacktrace identical to the error trace above)

I ran into this kind of error recently.

Whenever I read one particular table, my Spark script threw:

java.sql.SQLException: DAY_OF_MONTH

After some investigation, I found that the problem was an invalid value in a date column; in my case it was something like 2020-10-00.
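Why this blows up is easy to demonstrate outside MySQL: a day-of-month of 00 is not a valid calendar date, so a strict date parser rejects it (the Connector/J trace above fails inside the same kind of calendar validation). A minimal Python illustration:

```python
# MySQL will happily store a zero day such as "2020-10-00",
# but strict calendar-based parsers reject it.
from datetime import datetime

try:
    datetime.strptime("2020-10-00", "%Y-%m-%d")
except ValueError as exc:
    print(exc)  # day is out of range for month
```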

Solution

  • Find the rows with invalid dates and fix them

  • Use customSchema to read the date column as a string, then process it
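If you take the read-as-string route, the follow-up processing can be plain Python. Here is a minimal sketch; the helper name and its error handling are mine, not from the original answer:

```python
from datetime import date

def parse_mysql_date(s):
    """Parse 'YYYY-MM-DD'; return None for invalid values such as '2020-10-00'."""
    try:
        y, m, d = (int(part) for part in s.split("-"))
        return date(y, m, d)
    except (ValueError, TypeError, AttributeError):
        return None

print(parse_mysql_date("2020-10-15"))  # 2020-10-15
print(parse_mysql_date("2020-10-00"))  # None
```

In Spark, a helper like this could be wrapped in a UDF applied to the string column.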

How to find the invalid dates

1. Query the database for the invalid dates

Depending on your database engine, you can probably build a query using a function such as ISDATE to find the invalid dates.

2. Query with your Spark script

You can also find them from your Spark script (I had to do it this way, since I only had SELECT privileges on the DB and could not run routines).

To query the database for invalid dates from your script, I suggest first retrieving the date column as a string:

df =
...
"dbtable": "(select id, CAST(your_date_col AS CHAR) from your_table) as r"
...
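Filled out against the question's own from_options call, the fragment above might look like the sketch below. Everything here is a placeholder name, and aliasing the CAST back to your_date_col is my assumption (it keeps the column name stable for the filter step that follows):

```python
# Hypothetical completion of the fragment above: the JDBC "dbtable" option
# accepts a subquery, so the date column can be cast to a string at read time.
connection_options = {
    "url": "jdbc:mysql://endpoint:port/database",
    "dbtable": ("(SELECT id, CAST(your_date_col AS CHAR) AS your_date_col "
                "FROM your_table) AS r"),
    "user": "user",
    "password": "password",
}
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="mysql", connection_options=connection_options)
print(connection_options["dbtable"])
```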

Then look for the invalid dates:

from pyspark.sql.functions import col, to_date

df.filter(col("your_date_col") != '0000-00-00') \
  .withColumn("dt", to_date(col("your_date_col"), "yyyy-MM-dd")) \
  .filter(col("dt").isNull()) \
  .show()

You will get something like this:

+------+-------------+----+
|    id|your_date_col|  dt|
+------+-------------+----+
|333300|   2020-10-00|null|
+------+-------------+----+

Hope it helps, cheers.
