How to remove empty DataFrames from a sequence of DataFrames in Scala



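How do I remove empty DataFrames from a sequence of DataFrames? In the snippet below, twoColDF contains many empty DataFrames. Also, is there a way to make the for loop below more efficient? I tried rewriting it as the one-liner below, but it did not work.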

// finalDF2 = (1 until colCount).flatMap(j => groupCount(j).map(y => finalDF.map(a => a.filter(df(cols(j)) === y)))).toSeq.flatten
var twoColDF: Seq[Seq[DataFrame]] = null
if (colCount == 2) {
  val i = 0
  for (j <- i + 1 until colCount) {
    twoColDF = groupCount(j).map(y => {
      finalDF.map(x => x.filter(df(cols(j)) === y))
    })
  }
}
finalDF = twoColDF.flatten

Given a sequence of DataFrames, you can access each DataFrame's underlying RDD and use isEmpty to filter out the empty ones:

val input: Seq[DataFrame] = ???
val result = input.filter(!_.rdd.isEmpty())
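The same filter can also be written without going through the RDD. The sketch below is one possible variant, not part of the original answer: it uses `head(1)`, which fetches at most one row per DataFrame. The object name `DropEmptyFrames`, the helper `dropEmpty`, and the local `SparkSession` are assumptions made only for this demonstration. On Spark 2.4 and later, `Dataset.isEmpty` should serve the same purpose.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropEmptyFrames {
  // Hypothetical helper: keeps only the frames that contain at least one row.
  // head(1) returns an Array[Row] with at most one element.
  def dropEmpty(frames: Seq[DataFrame]): Seq[DataFrame] =
    frames.filter(df => df.head(1).nonEmpty)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used purely for demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-empty-frames")
      .getOrCreate()
    import spark.implicits._

    val full = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    val empty = full.filter($"value" > 100) // no row matches
    val single = full.filter($"key" === "a") // one row matches

    val kept = dropEmpty(Seq(full, empty, single))
    println(kept.size) // prints 2

    spark.stop()
  }
}
```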

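As for your other question - I can't quite understand what your code is trying to do, but I would start by making it more functional (removing the vars and the imperative conditional). If I'm guessing the meaning of your inputs correctly, this might be equivalent to what you're trying to do: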

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val input: Seq[DataFrame] = ???
// map of column index to column values -
// for each combination we'd want a new DF where that column has that value
// I'm assuming values are Strings, can be anything else
val groupCount: Map[Int, Seq[String]] = ???
// for each combination of DF + column + value - produce the filtered DF where this column has this value
val perValue: Seq[DataFrame] = for {
  df <- input
  index <- groupCount.keySet
  value <- groupCount(index)
} yield df.filter(col(df.columns(index)) === value)
// remove empty results:
val result: Seq[DataFrame] = perValue.filter(!_.rdd.isEmpty())
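The shape of this for comprehension can be tried without a Spark cluster. The following is a minimal sketch with plain Scala collections, assuming a hypothetical `Frame` case class as a stand-in for a DataFrame; `Frame`, `filterEq`, and `splitByValues` are illustrative names, not Spark API. It iterates the column indices in sorted order so the result order is predictable, which iterating a `Set` does not guarantee.

```scala
object ForComprehensionSketch {
  // Hypothetical stand-in for a DataFrame: column names plus rows of String cells.
  final case class Frame(columns: Vector[String], rows: Vector[Vector[String]]) {
    // Keeps only the rows whose cell at `index` equals `value`.
    def filterEq(index: Int, value: String): Frame =
      copy(rows = rows.filter(row => row(index) == value))
    def isEmpty: Boolean = rows.isEmpty
  }

  // One filtered frame per (frame, column index, value) combination,
  // with the empty results dropped at the end.
  def splitByValues(input: Seq[Frame], groupCount: Map[Int, Seq[String]]): Seq[Frame] = {
    val perValue = for {
      frame <- input
      index <- groupCount.keys.toSeq.sorted
      value <- groupCount(index)
    } yield frame.filterEq(index, value)
    perValue.filterNot(_.isEmpty)
  }

  def main(args: Array[String]): Unit = {
    val frame = Frame(
      Vector("country", "color"),
      Vector(Vector("US", "red"), Vector("FR", "red"), Vector("US", "blue"))
    )
    // "DE" matches no row, so its filtered frame is dropped.
    val groupCount = Map(0 -> Seq("US", "DE"), 1 -> Seq("red"))
    splitByValues(Seq(frame), groupCount).foreach(f => println(f.rows))
  }
}
```

With this input, two frames survive: the rows where country is "US" and the rows where color is "red".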
