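I have a dataset containing training data for several athletes on different dates/times. One column contains the date and start time of each session. I only want to keep the start time in this column, i.e. I want to remove "2020/01/05" and "UTC". How can I remove everything before and after the time (there are 4 million rows, and the dates/times vary)?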
start.time
1 2020/01/05 21:30:04 UTC
2 2020/01/05 21:30:04 UTC
3 2020/01/05 21:30:04 UTC
4 2020/01/05 21:30:04 UTC
5 2020/01/05 21:30:04 UTC
6 2020/01/05 21:30:04 UTC
Sorry if this has already been answered somewhere.
Thanks
There are several ways to do this:
1) Using regex
df$time <- sub('.*\\s+(.*) UTC', '\\1', df$start.time)
df
# start.time time
#1 2020/01/05 21:30:04 UTC 21:30:04
#2 2020/01/05 21:30:04 UTC 21:30:04
#3 2020/01/05 21:30:04 UTC 21:30:04
#4 2020/01/05 21:30:04 UTC 21:30:04
#5 2020/01/05 21:30:04 UTC 21:30:04
#6 2020/01/05 21:30:04 UTC 21:30:04
Here, we capture everything between the whitespace and "UTC". '\\1' is used as a backreference to return the captured value.
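To see how the pattern behaves, it can be tried on a small character vector first. This is a minimal sketch; the vector x and its second timestamp are made-up illustration values, not part of the original data. Note that backslashes have to be doubled inside an R string literal.

```r
# Hypothetical sample values (the second one is invented for illustration)
x <- c("2020/01/05 21:30:04 UTC", "2021/11/23 06:05:59 UTC")

# The greedy '.*' plus '\\s+' consumes the date and the whitespace after it;
# the group '(.*)' then holds whatever sits in front of the trailing " UTC".
# '\\1' in the replacement inserts that captured group.
times <- sub(".*\\s+(.*) UTC", "\\1", x)

print(times)
# [1] "21:30:04" "06:05:59"
```

Strings that do not match the pattern are returned unchanged by sub(), so it may be worth checking a few results before overwriting a column with 4 million rows.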
2) Convert to POSIXct and then format
This can be done in base R:
format(as.POSIXct(df$start.time, format = "%Y/%m/%d %T"), "%T")
Or using lubridate
format(lubridate::ymd_hms(df$start.time), "%T")
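Since the strings are labelled UTC, it may be safer to state the time zone explicitly when parsing with base R, so that the result does not depend on the local time zone of the machine. The following is a sketch under that assumption; the vector x and its second value are hypothetical.

```r
# Hypothetical sample values (the second one is invented for illustration)
x <- c("2020/01/05 21:30:04 UTC", "2020/03/29 02:30:00 UTC")

# Parse as UTC; strptime ignores the trailing " UTC" text after the format
parsed <- as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# format() uses the time zone stored on the object, here UTC
times <- format(parsed, "%H:%M:%S")

print(times)
# [1] "21:30:04" "02:30:00"
```

The result is a character vector, so it can be assigned back with something like df$start.time <- times if the original column should be replaced.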
Data
df <- structure(list(start.time = structure(c(1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2020/01/05 21:30:04 UTC", class = "factor")),
class = "data.frame", row.names = c(NA,-6L))
We can use anytime from the anytime package
library(anytime)
format(anytime(df$start.time), "%T")
Or use as.ITime
library(data.table)
as.ITime(df$start.time)
#[1] "21:30:04" "21:30:04" "21:30:04" "21:30:04" "21:30:04" "21:30:04"