java.lang.ArrayIndexOutOfBoundsException: 0 : If Directory Doesn't Contain Any Files

Please help me with the following situation. I am scanning the folders for the last two hours, picking the latest CSV file from each, and building a list. The code below works as expected when both hour folders contain files, but if either folder contains no files it fails with "ArrayIndexOutOfBoundsException: 0".

Code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import scala.language.postfixOps
import spark.implicits._ // needed for toDF and the $ column syntax (pre-imported in spark-shell)

val hdfsConf = new Configuration()
val path = "/user/hdfs/test/input"
var finalFiles = List[String]()
val currentTs = java.time.LocalDateTime.now
val hours = 2
// One partition path per hour, going back from now
val paths = (0 until hours).map(h => currentTs.minusHours(h))
  .map(ts => s"${path}/partition_date=${ts.toLocalDate}/hour=${ts.toString.substring(11, 13)}")
  .toList
// paths: List[String] = List(/user/hdfs/test/input/partition_date=2022-11-30/hour=19,
// /user/hdfs/test/input/partition_date=2022-11-30/hour=18)
for (eachfolder <- paths) {
  val New_Folder_Path: String = eachfolder
  val fs = org.apache.hadoop.fs.FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val pathstatus = fs.listStatus(new Path(New_Folder_Path))
  val currpathfiles = pathstatus.map(x => Row(x.getPath.toString, x.getModificationTime))
  // Keep only CSV files, sort newest first, and take the single latest one
  val latestFile = spark.sparkContext.parallelize(currpathfiles)
    .map(row => (row.getString(0), row.getLong(1)))
    .toDF("FilePath", "ModificationTime")
    .filter(col("FilePath").like("%.csv%"))
    .sort($"ModificationTime".desc)
    .select(col("FilePath")).limit(1)
    .map(row => row.getString(0)).collectAsList.get(0) // throws when the folder yields no CSV files
  finalFiles = latestFile :: finalFiles
}

Error:

java.lang.ArrayIndexOutOfBoundsException: 0 

The problem is that you are trying to get the 0th element of an empty list. You can avoid this by using headOption on the collected result and then foreach on the Option it returns. (Note: collect returns a Scala Array, which has headOption; the java.util.List returned by collectAsList does not.)

spark.sparkContext.parallelize(currpathfiles)
  .map(row => (row.getString(0), row.getLong(1)))
  ...
  .map(row => row.getString(0))
  .collect.headOption
  .foreach(latestFile => finalFiles = latestFile :: finalFiles)
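
This works because headOption returns None when the collected array is empty, and foreach on None does nothing, so an empty folder is simply skipped. A quick illustration in plain Scala (no Spark required):

Array("a.csv").headOption.foreach(println)        // prints "a.csv"
Array.empty[String].headOption.foreach(println)   // does nothing, no exception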

Also note that instead of assigning latestFile to a var, this implementation prepends to the finalFiles list inside the Option's foreach, since foreach only runs when an element actually exists after we call collect.
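
For completeness, here is a minimal sketch of the whole loop with the fix applied, assuming the same spark session and paths list as in the question (the FileSystem lookup is hoisted out of the loop, since it is the same for every folder):

import org.apache.hadoop.fs._
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import spark.implicits._

var finalFiles = List[String]()
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

for (eachfolder <- paths) {
  val currpathfiles = fs.listStatus(new Path(eachfolder))
    .map(x => Row(x.getPath.toString, x.getModificationTime))
  spark.sparkContext.parallelize(currpathfiles)
    .map(row => (row.getString(0), row.getLong(1)))
    .toDF("FilePath", "ModificationTime")
    .filter(col("FilePath").like("%.csv%"))
    .sort($"ModificationTime".desc)
    .select(col("FilePath")).limit(1)
    .map(row => row.getString(0))
    .collect        // Array[String], empty when the folder has no CSV files
    .headOption     // Option[String], None instead of an exception
    .foreach(latestFile => finalFiles = latestFile :: finalFiles)
}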