R: How to safely store millisecond differences between timestamps

This is a hellish question about floating-point approximation and timestamps in R. Get ready :) Consider this simple example:

library(tibble)
library(lubridate)
library(dplyr)
tibble(timestamp_chr1 = c('2014-01-02 01:35:50.858'),
       timestamp_chr2 = c('2014-01-02 01:35:50.800')) %>%
  mutate(time1 = lubridate::ymd_hms(timestamp_chr1),
         time2 = lubridate::ymd_hms(timestamp_chr2),
         timediff = as.numeric(time1 - time2))

# A tibble: 1 x 5
timestamp_chr1          timestamp_chr2          time1                      time2                       timediff
<chr>                   <chr>                   <dttm>                     <dttm>                         <dbl>
1 2014-01-02 01:35:50.858 2014-01-02 01:35:50.800 2014-01-02 01:35:50.858000 2014-01-02 01:35:50.799999 0.0580001

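Here the difference between the two timestamps is obviously 58 milliseconds, yet R stores it with some floating-point approximation, so it shows up as 0.0580001 seconds.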

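What is the safest way to get exactly 58 milliseconds as the answer? I thought about using as.integer (instead of as.numeric), but I am worried about losing some information. What can be done here?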

Thanks!

Some considerations, some of which I think you already know:

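Floating point will rarely give you exactly 58 milliseconds (because of R FAQ 7.31 and IEEE-754);
The display of the data can be managed on the console with options(digits.secs=3) (and digits=3), and in reports with sprintf, format, or round;
The "goodness" of a calculation can be improved if you round before calculating; it is a little more work, but as long as we can safely assume the data is accurate to at least the millisecond, it holds up mathematically.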
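As a rough illustration of the rounding and display points above, here is a minimal base-R sketch. It assumes the timestamps are only meaningful to the millisecond, and the variable names (t1, t2, raw_secs, diff_ms) are made up for this example. Note that as.integer truncates toward zero, so a product such as 57.99999 would become 57; rounding first avoids that.

```r
# Parse with base R so the sketch has no package dependencies.
t1 <- as.POSIXct("2014-01-02 01:35:50.858", tz = "UTC")
t2 <- as.POSIXct("2014-01-02 01:35:50.800", tz = "UTC")

# Raw difference in seconds; this carries floating-point noise.
raw_secs <- as.numeric(difftime(t1, t2, units = "secs"))

# Round to whole milliseconds BEFORE converting to integer
# (assumption: the data is accurate to 1 ms and no finer).
diff_ms <- as.integer(round(raw_secs * 1000))
print(diff_ms)

# Display-only formatting; the stored double is unchanged.
shown <- sprintf("%.3f", raw_secs)
print(shown)
```

The rounded value can then be stored as an integer count of milliseconds, while sprintf only affects how the original double is printed.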

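If you are concerned about introducing errors into the data, an alternative is to encode the times as milliseconds (instead of seconds, the R norm). If you can pick an arbitrary, recent reference point (less than about 24 days away, since a 32-bit integer holds at most 2^31 - 1 milliseconds), you can use a plain integer. If that is not enough, or you would rather use epoch milliseconds, you need to jump to 64-bit integers, perhaps with bit64.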

now <- Sys.time()
as.integer(now)
# [1] 1583507603
as.integer(as.numeric(now) * 1000)
# Warning: NAs introduced by coercion to integer range
# [1] NA
bit64::as.integer64(as.numeric(now) * 1000)
# integer64
# [1] 1583507603439
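The reference-point option mentioned above might look something like the following sketch. The reference time ref is a hypothetical choice made for this example (midnight of the same day), and it only works while every timestamp stays within roughly 24 days of it.

```r
# Hypothetical reference point chosen close to the data (an assumption).
ref <- as.POSIXct("2014-01-02 00:00:00", tz = "UTC")
ts  <- as.POSIXct(c("2014-01-02 01:35:50.858",
                    "2014-01-02 01:35:50.800"), tz = "UTC")

# Milliseconds since the reference, rounded before the integer conversion
# so that floating-point noise cannot cause an off-by-one truncation.
secs_since_ref <- as.numeric(difftime(ts, ref, units = "secs"))
ms_since_ref   <- as.integer(round(secs_since_ref * 1000))

# Integer subtraction is exact, so the difference is a clean 58L.
ms_diff <- ms_since_ref[1] - ms_since_ref[2]
print(ms_diff)
```

Once the values are stored as integer milliseconds, every later difference is exact integer arithmetic; the only floating-point step is the one-time conversion.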
