Azure Data Factory pipeline fails when loading ORC files through a self-hosted integration runtime: OutOfMemory exception, heap size



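I am currently facing a problem when trying to load ORC files from Azure Data Factory. When a file is too large, the ADF pipeline reports that our self-hosted integration runtime failed with an OutOfMemory exception, because the Java maximum heap size is too small to complete the load.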

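I have already tried different solutions, such as increasing the heap size through environment variables and even through registry keys (somewhat hacky). The virtual machine that hosts the self-hosted integration runtime has more than 100 GB of RAM.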

It still fails, though, because these values seem to be overridden by "default" values whenever the Integration Runtime is invoked from ADF. Any ideas?
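For anyone who wants to double-check which heap values are actually set, here is a minimal sketch. It assumes the variable in question is `_JAVA_OPTIONS` (the post above does not name the variable; this is the one the JVM picks up and the one I believe the ADF documentation suggests for ORC and Parquet on a self-hosted IR). The helper `parse_heap_options` is a hypothetical name made up for this example. Note that it only shows what the current process sees, which may differ from what the integration runtime service sees.

```python
import os
import re

# Multipliers for the size suffixes the JVM accepts on -Xms / -Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_heap_options(java_options: str) -> dict:
    """Return the -Xms/-Xmx values (in bytes) found in a JVM option string.

    Hypothetical helper for illustration; if a flag appears more than once,
    the last occurrence is the one kept.
    """
    found = {}
    for flag, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgGtT]?)", java_options):
        found[flag] = int(number) * _UNITS[unit.lower()]
    return found


if __name__ == "__main__":
    # Assumption: the heap size was configured through _JAVA_OPTIONS.
    raw = os.environ.get("_JAVA_OPTIONS", "")
    print("raw value:", raw or "(not set)")
    for flag, size in parse_heap_options(raw).items():
        print(f"{flag}: {size / 1024**3:.2f} GiB")
```

Running it in a session started by the same account as the integration runtime service should give a closer picture of what that service inherits, although that is not guaranteed either.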

'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'
Job ID: daee1a1d-b880-ecb2-e56c-a59397547668
Log ID: Warning        
TraceComponentId: TransferClientLibrary
TraceMessageId: TasksCoordinatorFatalErrorCallback
@logId: Warning
jobId: daee1a1d-b880-ecb2-e56c-a59397547668
activityId: c643b611-8356-4f49-b6d6-e87ea50670e5
eventId: TasksCoordinatorFatalErrorCallback
message: 'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'

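This turned out to be a bug, identified together with Microsoft. It occurs when loading .orc files that contain char, varchar, or string types. Microsoft has acknowledged that the problem is on the Azure Data Factory side, and a fix is expected to take about six months from now.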
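To see whether a given file is likely to be affected, one option is to look for those types in its schema. The sketch below assumes you already have the schema as a `struct<...>` type string (the form ORC tooling usually prints); the function name `text_columns` and the sample schema are hypothetical and only for illustration. It only matches named fields, so a bare `array<string>` or `map<string,int>` element type would not be reported.

```python
import re


def text_columns(orc_schema: str) -> list:
    """List (column, type) pairs declared as char, varchar or string.

    Hypothetical helper: scans a struct<...> type string for name:type pairs.
    """
    return re.findall(r"(\w+):(char|varchar|string)\b", orc_schema)


if __name__ == "__main__":
    # Made-up schema string, not taken from the failing file.
    schema = "struct<id:int,name:varchar(50),note:string,code:char(3),ts:timestamp>"
    for column, type_name in text_columns(schema):
        print(f"{column}: {type_name}")
```

If the list is empty, the file presumably does not hit this particular bug and the heap size question is worth revisiting on its own.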
