Let's suppose I have data like this:
(Date, Most Active User)
(6/1/2014, "Bob")
(6/2/2014, "Joe")
(6/3/2014, "Jim")
(6/7/2014, "Jack")
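Note that rows are missing for the dates 6/4/2014, 6/5/2014, and 6/6/2014. I want to fill in those rows, defaulting the "Most Active User" value to that of the most recent row where it is defined. For example, the value for these rows should be "Jim".
One way to do this is to write a function that recursively walks your records alongside a stream of dates, compares the dates, and produces another list of records that includes the missing dates with a default user.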
I'll use scala.collection.immutable.Stream and nscala-time for my example. nscala-time is a simple wrapper around Java's Joda-Time. To use it, add
libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "1.8.0"
to your build.sbt file. The working code is:
import scala.collection.immutable.Stream
import com.github.nscala_time.time.Imports._
// Formatter used to compare the date strings in your records with DateTime values
val fmt = DateTimeFormat.forPattern("M/d/yyyy")
// The lazy Stream magic is here: we create an infinite stream of consecutive days
def days(start: DateTime): Stream[DateTime] = start #:: days(start + 1.day)
val records = List(("6/1/2014", "Bob"), ("6/2/2014", "Joe"),
                   ("6/3/2014", "Jim"), ("6/7/2014", "Jack"))
// The recursion happens here
def fillDefaultValues(records: List[(String, String)],
                      days: Stream[DateTime]): List[(String, String)] =
  // Let's be cautious and use headOption
  (records.headOption, days.headOption) match {
    // The record's date matches the current day: keep the record and advance both
    case (Some((date, name)), Some(day)) if date == day.toString(fmt) =>
      (date, name) :: fillDefaultValues(records.tail, days.tail)
    // No record for this day: emit the day with the placeholder default string
    case (Some(_), Some(day)) =>
      (day.toString(fmt), "Most Active User") ::
        fillDefaultValues(records, days.tail)
    // All records are consumed: we are done
    case (None, _) => Nil
    // We ran out of days while records still remain
    case (Some(_), None) => throw new Exception("Days are out")
  }
fillDefaultValues(records, days(new DateTime(2014, 6, 1, 0, 0)).slice(0, 100)).foreach(println)
And the output:
(6/1/2014,Bob)
(6/2/2014,Joe)
(6/3/2014,Jim)
(6/4/2014,Most Active User)
(6/5/2014,Most Active User)
(6/6/2014,Most Active User)
(6/7/2014,Jack)
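Note that the output above fills the gaps with the literal string "Most Active User". If you would rather have each missing day carry the most recent defined user (so that 6/4 through 6/6 become "Jim", as the question asks), one possible sketch follows. It is a variation rather than the code above: it swaps nscala-time for the JDK's java.time so that it runs without extra dependencies, and the names ForwardFill, fillForward, and loop are made up for illustration. It assumes the records are sorted by date with no duplicates.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object ForwardFill {
  // Same "M/d/yyyy" pattern as above, but via java.time (an assumption, not the original library)
  val fmt: DateTimeFormatter = DateTimeFormatter.ofPattern("M/d/yyyy")

  // Hypothetical helper: every missing day gets the user of the latest earlier record.
  // Assumes the records are sorted by date and contain no duplicate dates.
  def fillForward(records: List[(String, String)]): List[(String, String)] = {
    @annotation.tailrec
    def loop(rest: List[(LocalDate, String)], day: LocalDate, last: String,
             acc: List[(String, String)]): List[(String, String)] =
      rest match {
        // All records consumed: restore chronological order and stop
        case Nil => acc.reverse
        // A record exists for this day: keep it and remember its user
        case (date, name) :: tail if date == day =>
          loop(tail, day.plusDays(1), name, (day.format(fmt), name) :: acc)
        // A record dated before the current day means the input was not sorted
        case (date, _) :: _ if date.isBefore(day) =>
          throw new IllegalArgumentException(s"Records are not sorted: $date")
        // Gap: emit the day with the last user seen so far
        case _ =>
          loop(rest, day.plusDays(1), last, (day.format(fmt), last) :: acc)
      }

    records.map { case (d, n) => (LocalDate.parse(d, fmt), n) } match {
      case Nil => Nil
      case parsed @ ((first, name) :: _) => loop(parsed, first, name, Nil)
    }
  }

  def main(args: Array[String]): Unit = {
    val records = List(("6/1/2014", "Bob"), ("6/2/2014", "Joe"),
                       ("6/3/2014", "Jim"), ("6/7/2014", "Jack"))
    // Prints the same seven dates as above, with 6/4 through 6/6 attributed to Jim
    fillForward(records).foreach(println)
  }
}
```

Because the walk starts at the first record's date and stops after the last record, no start date or upper bound on the number of days has to be passed in. The loop is tail-recursive, so long date ranges do not grow the stack.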