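I wrote a data pipeline a while ago, and for some reason (possibly a newer version of R or of some packages) part of the script appears to have broken. Specifically, I found one scenario in which a cross join between a 0-row data frame and another data frame throws an error. I expected the result to also be a 0-row data frame containing the columns from both data frames (with their classes preserved), but instead I get an error that suggests reporting it to the package authors.
If I am missing something here that would resolve the error, I would like to know.
For reference, I am using dplyr 1.0.9, R 4.1.2, and Ubuntu 22.04 LTS (more details are shown at the bottom).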
start_shifted <- structure(list(timestamp_start = structure(numeric(0), class = c("POSIXct",
"POSIXt"), tzone = "America/Chicago"), timestamp_stop = structure(numeric(0), class = c("POSIXct",
"POSIXt"), tzone = "America/Chicago"), time_since_last_stop = logical(0),
rest_time = numeric(0), ssid = integer(0)), row.names = integer(0), class = "data.frame")
lap_data <- structure(list(timestamp = structure(1509301003, class = c("POSIXct",
"POSIXt"), tzone = "America/Chicago"), start_time = "2017-10-29 12:59:00",
start_position_lat = 41.8741511832923, start_position_long = -87.620877083391,
end_position_lat = 41.8724051490426, end_position_long = -87.6218410022557,
total_elapsed_time = 55.003, total_timer_time = 55.003, total_distance = 221.85,
total_strides = 75L, total_calories = 11L, enhanced_avg_speed = 14.5188,
avg_speed = 14.5188, enhanced_max_speed = 15.6528, max_speed = 15.6528,
total_ascent = 0L, total_descent = 0L, event = "lap", event_type = "stop",
avg_heart_rate = 137L, max_heart_rate = 154L, avg_running_cadence = 82L,
max_running_cadence = 86L, lap_trigger = "session_end", sub_sport = "generic",
avg_fractional_cadence = 0.265625, max_fractional_cadence = 0.5,
total_fractional_cycles = NA, avg_vertical_oscillation = NA,
avg_temperature = NA, max_temperature = NA, timestamp_utc = "2017-10-29 18:16:43",
timezone = "America/Chicago", timestamp_previous = structure(NA, class = c("POSIXct",
"POSIXt"), tzone = ""), lap_id = 1L), row.names = c(NA, -1L
), class = "data.frame")
library(dplyr)
# cross join: an empty `by` matches every row of x with every row of y
lap_data %>%
  inner_join(
    start_shifted, by = character()
  )
This is the error text:
Error in `datetime_validate()`:
! Corrupt `POSIXct` with unknown type logical.
ℹ In file type-date-time.c at line 387.
ℹ Install the winch package to get additional debugging info the next time you get this error.
ℹ This is an internal error in the rlang package, please report it to the package authors.
Backtrace:
▆
1. ├─lap_data %>% mutate(dummy = TRUE) %>% ...
2. ├─dplyr::inner_join(...)
3. ├─dplyr:::inner_join.data.frame(...)
4. │ └─dplyr:::join_mutate(...)
5. │ ├─tibble::as_tibble(x, .name_repair = "minimal")
6. │ └─tibble:::as_tibble.data.frame(x, .name_repair = "minimal")
7. │ └─tibble:::lst_to_tibble(unclass(x), .rows, .name_repair)
8. │ └─tibble:::check_valid_cols(x)
9. │ ├─base::which(!map_lgl(x, is_valid_col))
10. │ └─tibble:::map_lgl(x, is_valid_col)
11. │ └─tibble:::map_mold(.x, .f, logical(1), ...)
12. │ └─base::vapply(.x, .f, .mold, ..., USE.NAMES = FALSE)
13. │ └─tibble FUN(X[[i]], ...)
14. │ └─vctrs::vec_is(x)
15. │ └─vctrs:::vec_is_vector(x)
16. │ └─vctrs `<fn>`()
17. │ └─vctrs::vec_proxy(x = x)
18. │ └─vctrs:::datetime_validate(x)
19. └─rlang:::stop_internal_c_lib(...)
20. └─rlang::abort(message, call = call, .internal = TRUE)
More specific platform details:
> R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 1.2
year 2021
month 11
day 01
svn rev 81115
language R
version.string R version 4.1.2 (2021-11-01)
nickname Bird Hippie
This is less a solution than a clear identification of the problem.
If you iterate over the columns, for example
inner_join(lap_data , start_shifted, by = character()) # fails
inner_join(lap_data[,1:10], start_shifted, by = character()) # succeeds
inner_join(lap_data[,1:20], start_shifted, by = character()) # succeeds
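you will eventually determine that timestamp_previous (column 34) is the problem column. It is NA and has an empty "tzone" attribute, so let's see which of the two is the problem.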
### refresh lap_data
lap_data[[34]] <- Sys.time()
inner_join(lap_data, start_shifted, by = character()) # succeeds
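So it is not a problem with POSIXt per se. Let's look at different types of NA (starting from a fresh copy of lap_data for each assignment/test). To be clear, each assignment below is followed by the same inner_join(..) call.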
### refresh lap_data
# logical NA
lap_data[[34]] <- NA # inner_join(lap_data, start_shifted, by = character()) succeeds
lap_data[[34]] <- NA_real_
lap_data[[34]] <- NA_character_
lap_data[[34]] <- Sys.time()[NA]
All of them succeed. The last one strikes me as a bit odd, since the original timestamp_previous is also a POSIXt NA, so ...
dput(Sys.time()[NA])
# structure(NA_real_, class = c("POSIXct", "POSIXt"))
dput(lap_data[[34]])
# structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")
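It took me a while to find the key here, but ... your timestamp_previous holds the logical variant of NA, which it should not. Somewhere, the way you fetch, convert, or create that field has a flaw.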
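For illustration, here is one way a value like this can come about. This is only a sketch, assuming that somewhere upstream the class was assigned to a bare NA. The names `bad` and `good` are hypothetical and not taken from your pipeline.

```r
# Hypothetical illustration: assigning the POSIXct class to a bare NA
# keeps the underlying logical storage.
bad <- NA
class(bad) <- c("POSIXct", "POSIXt")

# Converting NA through as.POSIXct() yields double storage instead.
good <- as.POSIXct(NA)

typeof(bad)   # "logical"
typeof(good)  # "double"
```

Both values print as NA, which is why the difference is easy to miss.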
To be explicit about what we have (first) and what we need (second):
dput(lap_data[[34]])
# structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")
dput(as.POSIXct(as.numeric(lap_data[[34]])))
# structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")
Further, if we strip the class= from the structure,
class(unclass(lap_data[[34]]))
# [1] "logical"
class(unclass(Sys.time()))
# [1] "numeric"
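The same check can be run across a whole data frame rather than one column at a time. Below is a minimal base R sketch. The helper name `find_bad_posixct` and the `toy` data frame are hypothetical, and it assumes that any POSIXct column not stored as double is suspect.

```r
# Hypothetical helper: flag POSIXct columns whose underlying storage
# is not double (for example a logical NA carrying the POSIXct class).
find_bad_posixct <- function(df) {
  is_bad <- vapply(
    df,
    function(col) inherits(col, "POSIXct") && typeof(col) != "double",
    logical(1)
  )
  names(df)[is_bad]
}

# Toy data frame standing in for lap_data
toy <- data.frame(id = 1L)
toy$ok_time  <- as.POSIXct(NA)
toy$bad_time <- structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")

find_bad_posixct(toy)  # "bad_time"
```

Running this on lap_data should point at timestamp_previous without having to bisect the columns by hand.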
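The resolution of this journey:
Fix the code. Somewhere there is code that is "breaking" this field. Your POSIXt column is a logical NA with class(..) <- c("POSIXct","POSIXt") applied to it, whether literally or hidden inside some other call. As a quick fix for this data, here is a repair for that one column: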
lap_data[[34]] <- as.POSIXct(unclass(lap_data[[34]]))
inner_join(lap_data, start_shifted, by = character()) # succeeds
If you have multiple such columns, here is a fix:
fixpsx <- function(x, origin = "1970-01-01", tzone = "") {
  # repair POSIXt columns whose underlying storage is a logical NA;
  # note that as.POSIXct() names its time zone argument `tz`
  if (inherits(x, "POSIXt") && inherits(unclass(x), "logical")) {
    as.POSIXct(unclass(x), origin = origin, tz = tzone)
  } else x
}
lap_data[] <- lapply(lap_data, fixpsx)
inner_join(lap_data, start_shifted, by = character()) # succeeds
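One way to confirm that the repair took effect is to compare the storage type before and after. The snippet below is a self-contained sketch: it redefines a simplified `fixpsx` so it runs on its own, and the `toy` data frame is a hypothetical stand-in for lap_data.

```r
# Simplified version of the fixpsx idea, redefined so this snippet runs alone
fixpsx <- function(x, tzone = "") {
  if (inherits(x, "POSIXt") && is.logical(unclass(x))) {
    as.POSIXct(unclass(x), tz = tzone)
  } else {
    x
  }
}

# Toy data frame with one corrupt POSIXct column (logical NA underneath)
toy <- data.frame(id = 1L, label = "a")
toy$stamp <- structure(NA, class = c("POSIXct", "POSIXt"), tzone = "")

typeof(toy$stamp)             # "logical" before the repair
toy[] <- lapply(toy, fixpsx)  # toy[] keeps the data.frame structure
typeof(toy$stamp)             # "double" after the repair
class(toy$stamp)              # "POSIXct" "POSIXt"
```

The other columns pass through untouched, so the helper can be applied to the whole data frame.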