R - What causes the "Corrupt `POSIXct` with unknown type logical" error in this dplyr cross join?



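I wrote a data pipeline a while ago and, for some reason (possibly a newer version of R or of some packages), part of the script seems to have broken. Specifically, I've found a particular scenario in which cross joining a 0-row data frame with another data frame raises an error. I expect the result to also be a 0-row data frame containing the columns from both data frames (with their classes preserved), but instead I get an error that suggests reporting it to the package authors.

If I'm missing something here that would resolve this error, I'd like to know.

For reference, I'm using dplyr 1.0.9 with R 4.1.2 on Ubuntu 22.04 LTS (more details at the bottom).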
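
To make the expected behaviour concrete, here is a minimal base R sketch using merge() with by = NULL, which requests the Cartesian product. It is an illustration only: the data frames laps and stops and their columns are made-up stand-ins, not the real data, and merge() is not the function used in the pipeline.

```r
# Toy stand-ins for the real data (hypothetical names and values).
laps <- data.frame(
  lap_id = 1L,
  ts = as.POSIXct("2017-10-29 12:59:00", tz = "America/Chicago")
)
stops <- data.frame(
  rest_time = numeric(0),
  stop_ts = structure(numeric(0), class = c("POSIXct", "POSIXt"),
                      tzone = "America/Chicago")
)

# by = NULL asks merge() for the Cartesian product of the two inputs.
crossed <- merge(laps, stops, by = NULL)

nrow(crossed)           # number of rows in the product
names(crossed)          # columns contributed by both inputs
class(crossed$stop_ts)  # class of a column that came from the empty input
```

The result has no rows but keeps the columns of both inputs along with their classes, which is what I would like the dplyr join to return as well.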

start_shifted <- structure(list(timestamp_start = structure(numeric(0), class = c("POSIXct", 
"POSIXt"), tzone = "America/Chicago"), timestamp_stop = structure(numeric(0), class = c("POSIXct", 
"POSIXt"), tzone = "America/Chicago"), time_since_last_stop = logical(0), 
rest_time = numeric(0), ssid = integer(0)), row.names = integer(0), class = "data.frame")

lap_data <- structure(list(timestamp = structure(1509301003, class = c("POSIXct", 
"POSIXt"), tzone = "America/Chicago"), start_time = "2017-10-29 12:59:00", 
start_position_lat = 41.8741511832923, start_position_long = -87.620877083391, 
end_position_lat = 41.8724051490426, end_position_long = -87.6218410022557, 
total_elapsed_time = 55.003, total_timer_time = 55.003, total_distance = 221.85, 
total_strides = 75L, total_calories = 11L, enhanced_avg_speed = 14.5188, 
avg_speed = 14.5188, enhanced_max_speed = 15.6528, max_speed = 15.6528, 
total_ascent = 0L, total_descent = 0L, event = "lap", event_type = "stop", 
avg_heart_rate = 137L, max_heart_rate = 154L, avg_running_cadence = 82L, 
max_running_cadence = 86L, lap_trigger = "session_end", sub_sport = "generic", 
avg_fractional_cadence = 0.265625, max_fractional_cadence = 0.5, 
total_fractional_cycles = NA, avg_vertical_oscillation = NA, 
avg_temperature = NA, max_temperature = NA, timestamp_utc = "2017-10-29 18:16:43", 
timezone = "America/Chicago", timestamp_previous = structure(NA, class = c("POSIXct", 
"POSIXt"), tzone = ""), lap_id = 1L), row.names = c(NA, -1L
), class = "data.frame")
library(dplyr)

lap_data %>%
  inner_join(
    start_shifted, by = character()
  )

Here is the error text:

Error in `datetime_validate()`:
! Corrupt `POSIXct` with unknown type logical.
ℹ In file type-date-time.c at line 387.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error in the rlang package, please report it to the package authors.
Backtrace:
▆
1. ├─lap_data %>% mutate(dummy = TRUE) %>% ...
2. ├─dplyr::inner_join(...)
3. ├─dplyr:::inner_join.data.frame(...)
4. │ └─dplyr:::join_mutate(...)
5. │   ├─tibble::as_tibble(x, .name_repair = "minimal")
6. │   └─tibble:::as_tibble.data.frame(x, .name_repair = "minimal")
7. │     └─tibble:::lst_to_tibble(unclass(x), .rows, .name_repair)
8. │       └─tibble:::check_valid_cols(x)
9. │         ├─base::which(!map_lgl(x, is_valid_col))
10. │         └─tibble:::map_lgl(x, is_valid_col)
11. │           └─tibble:::map_mold(.x, .f, logical(1), ...)
12. │             └─base::vapply(.x, .f, .mold, ..., USE.NAMES = FALSE)
13. │               └─tibble FUN(X[[i]], ...)
14. │                 └─vctrs::vec_is(x)
15. │                   └─vctrs:::vec_is_vector(x)
16. │                     └─vctrs `<fn>`()
17. │                       └─vctrs::vec_proxy(x = x)
18. │                         └─vctrs:::datetime_validate(x)
19. └─rlang:::stop_internal_c_lib(...)
20.   └─rlang::abort(message, call = call, .internal = TRUE)

More specific platform details:

> R.version
_                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          4                           
minor          1.2                         
year           2021                        
month          11                          
day            01                          
svn rev        81115                       
language       R                           
version.string R version 4.1.2 (2021-11-01)
nickname       Bird Hippie 

This is less a solution than a clear identification of the problem.

If we iterate over subsets of the columns, for example

inner_join(lap_data       , start_shifted, by = character()) # fails
inner_join(lap_data[,1:10], start_shifted, by = character()) # succeeds
inner_join(lap_data[,1:20], start_shifted, by = character()) # succeeds

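we eventually find that timestamp_previous (column 34) is the problem column. It is both NA and has an empty "tzone", so let's see which of the two is the problem.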

### refresh lap_data
lap_data[[34]] <- Sys.time()
inner_join(lap_data, start_shifted, by = character()) # succeeds
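
The manual column search can also be automated. The following is a rough sketch, not something I ran against the original data: failing_columns is a hypothetical helper name, and it assumes that the failure can be reproduced on a one-column data frame.

```r
# Hypothetical helper: apply `check` to each column on its own and
# return the names of the columns for which it raises an error.
failing_columns <- function(df, check) {
  ok <- vapply(seq_along(df), function(i) {
    tryCatch({
      check(df[i])   # df[i] keeps a one-column data frame
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  names(df)[!ok]
}

# Intended use with the data from the question (needs dplyr loaded):
# failing_columns(
#   lap_data,
#   function(d) inner_join(d, start_shifted, by = character())
# )
```

Testing one column at a time keeps the search linear in the number of columns, at the cost of missing errors that only appear when several columns interact.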

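So POSIXt itself is not the problem. Let's look at different kinds of NA (starting from a fresh copy of lap_data for each assignment/test). Each assignment below is followed by the same inner_join(..) call.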

### refresh lap_data
# logical NA
lap_data[[34]] <- NA # inner_join(lap_data, start_shifted, by = character()) succeeds
lap_data[[34]] <- NA_real_
lap_data[[34]] <- NA_character_
lap_data[[34]] <- Sys.time()[NA]

All of them succeed. The last one strikes me as a bit odd, since the original timestamp_previous is also a POSIXt NA, so ...

dput(Sys.time()[NA])
# structure(NA_real_, class = c("POSIXct", "POSIXt"))
dput(lap_data[[34]])
# structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")
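
It took me a while to find the key here, but ... your timestamp_previous holds the logical variant of NA, which it should not. Whatever code fetches/converts/creates that field has a flaw.

To be explicit about what we have (first) and what we need (second):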

dput(lap_data[[34]])
# structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")
dput(as.POSIXct(as.numeric(lap_data[[34]])))
# structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")

Further, if we strip the class= from the structure:

class(unclass(lap_data[[34]]))
# [1] "logical"
class(unclass(Sys.time()))
# [1] "numeric"
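
The same check can be run over a whole data frame. This is a minimal sketch under the assumption that "corrupt" means "has the POSIXct class but is not stored as a number"; corrupt_posixct_cols and the demo data frame are hypothetical names used only for illustration.

```r
# Hypothetical helper: names of columns that carry the POSIXct class
# but are not stored as numbers (double or integer) underneath.
corrupt_posixct_cols <- function(df) {
  bad <- vapply(df, function(col) {
    inherits(col, "POSIXct") && !is.numeric(unclass(col))
  }, logical(1))
  names(df)[bad]
}

# Small reproduction of the situation in the question.
demo <- data.frame(id = 1L)
demo$good <- Sys.time()[NA]   # NA_real_ underneath
demo$bad  <- structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")

corrupt_posixct_cols(demo)
```

On this toy data only the column named bad is reported. Applied to lap_data, I would expect it to report timestamp_previous.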

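The takeaways from this journey:

  1. Fix the code. Somewhere there is code that is "breaking" your POSIXt column: something is taking a logical NA and applying class(..) <- c("POSIXct","POSIXt") to it, whether literally or behind the scenes.

  2. As a quick fix for this data, here is a repair for that one column: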

    lap_data[[34]] <- as.POSIXct(unclass(lap_data[[34]]))
    inner_join(lap_data, start_shifted, by = character()) # succeeds
    

    If you have multiple columns, here is a fix:

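    fixpsx <- function(x, origin = "1970-01-01", tzone = "") {
      # repair only POSIXt columns whose underlying storage is logical
      if (inherits(x, "POSIXt") && is.logical(unclass(x)))
        # as.POSIXct() takes the time zone as `tz`, not `tzone`
        as.POSIXct(as.numeric(unclass(x)), origin = origin, tz = tzone)
      else x
    }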
    lap_data[] <- lapply(lap_data, fixpsx)
    inner_join(lap_data, start_shifted, by = character()) # succeeds
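
Regarding the first point, here is a small sketch of how such a value can come about and how to build a missing timestamp safely. It is only a guess at the kind of code that might be responsible, since the code that created timestamp_previous is not shown. The variable names are illustrative.

```r
# One way to end up with the corrupt value: class a plain (logical) NA.
bad <- NA
class(bad) <- c("POSIXct", "POSIXt")
typeof(bad)   # storage type is "logical"

# Two ways to build a missing timestamp that is stored as a double.
ok1 <- as.POSIXct(NA)
ok2 <- structure(NA_real_, class = c("POSIXct", "POSIXt"))
typeof(ok1)   # "double"
typeof(ok2)   # "double"
```

If the upstream code initialises placeholder columns with a bare NA and sets the class afterwards, switching to NA_real_ or as.POSIXct(NA) should avoid the problem at the source.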
    
