I'd like to see the source code of the date_trunc function in Databricks. The PySpark source didn't answer my question. Basically, I want to know what happens at the core; for example, does it run a regexp pattern/method, or does it use its own algorithm?
Can anyone help? Thank you!
Spark's code is actually Scala running on the JVM, even though you can use it from Python, and it is available on GitHub: https://github.com/apache/spark
I believe the code you're looking for is here: https://github.com/apache/spark/blob/b6aea1a8d99b3d99e91f7f195b23169d3d61b6a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L971
def truncTimestamp(micros: Long, level: Int, zoneId: ZoneId): Long = {
  // Time zone offsets have a maximum precision of seconds (see `java.time.ZoneOffset`). Hence
  // truncation to microsecond, millisecond, and second can be done
  // without using time zone information. This results in a performance improvement.
  level match {
    case TRUNC_TO_MICROSECOND => micros
    case TRUNC_TO_MILLISECOND =>
      micros - Math.floorMod(micros, MICROS_PER_MILLIS)
    case TRUNC_TO_SECOND =>
      micros - Math.floorMod(micros, MICROS_PER_SECOND)
    case TRUNC_TO_MINUTE => truncToUnit(micros, zoneId, ChronoUnit.MINUTES)
    case TRUNC_TO_HOUR => truncToUnit(micros, zoneId, ChronoUnit.HOURS)
    case TRUNC_TO_DAY => truncToUnit(micros, zoneId, ChronoUnit.DAYS)
    case _ => // Try to truncate date levels
      val dDays = microsToDays(micros, zoneId)
      daysToMicros(truncDate(dDays, level), zoneId)
  }
}
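So, to answer your question directly: there is no regexp involved. Sub-second levels (millisecond, second) are truncated with plain modular arithmetic on the microsecond count, and coarser levels delegate to java.time calendar operations that are time-zone aware. Here is a minimal Python sketch of the sub-second branch; the constant names mirror Spark's MICROS_PER_MILLIS / MICROS_PER_SECOND, but this is just an illustration of the arithmetic, not Spark's actual implementation:

```python
# Illustrative sketch of Spark's sub-second truncation: subtract the
# remainder of the microsecond timestamp modulo the unit size, exactly
# what Math.floorMod does in the Scala code above.
MICROS_PER_MILLIS = 1_000
MICROS_PER_SECOND = 1_000_000

def trunc_micros(micros: int, unit_micros: int) -> int:
    # Python's % already behaves like Java's Math.floorMod for a positive
    # modulus, so this also works for pre-1970 (negative) timestamps.
    return micros - micros % unit_micros

# Example: 2021-01-01 00:00:01.234567 UTC as microseconds since the epoch
ts = 1_609_459_201_234_567
print(trunc_micros(ts, MICROS_PER_MILLIS))  # 1609459201234000 (millisecond)
print(trunc_micros(ts, MICROS_PER_SECOND))  # 1609459201000000 (second)
```

Note that this branch needs no time zone at all, which is why Spark special-cases it for performance; anything at minute granularity or coarser goes through `truncToUnit` with a `ZoneId`.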