Scalding: Filling discontinuities in rows



Let's assume I have data like this:

(Date, Most Active User)
(6/1/2014, "Bob")
(6/2/2014, "Joe")
(6/3/2014, "Jim")
(6/7/2014, "Jack")

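Note that rows are missing for the dates 6/4/2014, 6/5/2014, and 6/6/2014. I want to fill in those rows, defaulting their "Most Active User" value to that of the most recent row where it is defined. For example, the value for these rows should be "Jim".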

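One way to do this is to write a function that recursively walks your records alongside a stream of dates, compares the dates, and produces a new list of records in which the missing dates are filled in with a default user.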

I'll use scala.collection.immutable.Stream and nscala-time for my example. nscala-time is a thin wrapper around Java's Joda-Time. To use it, add libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "1.8.0" to your build.sbt file. The working code is:

import scala.collection.immutable.Stream
import com.github.nscala_time.time.Imports._

// Formatter used to compare the date strings in the records with DateTime values
val fmt = DateTimeFormat.forPattern("M/d/yyyy")

// Lazy Stream: an infinite sequence of consecutive days beginning at `start`
def days(start: DateTime): Stream[DateTime] = start #:: days(start + 1.day)

val records = List(("6/1/2014", "Bob"), ("6/2/2014", "Joe"),
  ("6/3/2014", "Jim"), ("6/7/2014", "Jack"))

// Recursively walk the records and the days side by side
def fillDefaultValues(records: List[(String, String)],
    days: Stream[DateTime]): List[(String, String)] =
  // Be cautious and use headOption so that empty inputs do not throw
  (records.headOption, days.headOption) match {
    // The current day has a record: keep it and advance both inputs
    case (Some((date, name)), Some(day)) if date == day.toString(fmt) =>
      (date, name) :: fillDefaultValues(records.tail, days.tail)
    // The current day has no record: emit a row with the placeholder default
    case (Some(_), Some(day)) =>
      (day.toString(fmt), "Most Active User") ::
        fillDefaultValues(records, days.tail)
    // All records consumed
    case (None, _) => Nil
    // Records remain but the bounded stream of days is exhausted
    case (Some(_), None) => throw new Exception("Days are out")
  }

fillDefaultValues(records, days(new DateTime(2014, 6, 1, 0, 0)).slice(0, 100)).foreach(println)

And the output:

(6/1/2014,Bob)
(6/2/2014,Joe)
(6/3/2014,Jim)
(6/4/2014,Most Active User)
(6/5/2014,Most Active User)
(6/6/2014,Most Active User)
(6/7/2014,Jack)
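
The code above fills each gap with the literal string "Most Active User", whereas the question asks for the value of the most recent defined row ("Jim" for these dates). A minimal sketch of that forward-fill variant follows. It is an illustration under stated assumptions, not the answer's original code: it swaps nscala-time for the standard java.time API so that it needs no extra dependency, it assumes the records are already sorted by date, and the names ForwardFill and fillForward are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object ForwardFill {
  // Same date pattern as the records above
  private val fmt = DateTimeFormatter.ofPattern("M/d/yyyy")

  // Hypothetical helper: fills every missing day between consecutive records
  // with the user from the most recent earlier record.
  // Assumes `records` is sorted by date in ascending order.
  def fillForward(records: List[(String, String)]): List[(String, String)] =
    records match {
      case Nil => Nil
      case first :: rest =>
        // The accumulator is built in reverse, so its head is the latest row
        val filled = rest.foldLeft(List(first)) { case (acc, (date, name)) =>
          val (prevDate, prevName) = acc.head
          val prev = LocalDate.parse(prevDate, fmt)
          val next = LocalDate.parse(date, fmt)
          // Days strictly between prev and next, each given the previous user
          val gap = Iterator.iterate(prev.plusDays(1))(_.plusDays(1))
            .takeWhile(_.isBefore(next))
            .map(d => (d.format(fmt), prevName))
            .toList
          (date, name) :: (gap.reverse ::: acc)
        }
        filled.reverse
    }

  def main(args: Array[String]): Unit = {
    val records = List(("6/1/2014", "Bob"), ("6/2/2014", "Joe"),
      ("6/3/2014", "Jim"), ("6/7/2014", "Jack"))
    fillForward(records).foreach(println)
  }
}
```

With the sample records, the rows for 6/4/2014 through 6/6/2014 come out with "Jim" as the user. A fold is used here instead of explicit recursion so that long inputs do not grow the call stack, and the dates to fill are derived from the records themselves rather than from a separate bounded stream of days.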
