r - Convert multiple date columns in a data.table



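I am trying to convert multiple date columns (with different formats) in a data.table. There are a few existing approaches; one linked answer converts date columns in a data.table efficiently. I am trying to use mapply, but I get the following error: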

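"((,mapply(函数(x,:">
Error in `[.data.table`(df, , `:=`((paste0(dtVar, ">")), mapply(function(x,  : Supplied 12 items to be assigned to 6 items of column 'X1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.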

library(data.table)
# sample data
df <- data.table(
X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"), 
X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996"), 
stringsAsFactors = FALSE)

# convert date columns
dtVar <- c("X1", "X2")
inDtFmt <- c("%Y-%m-%d","%d/%m/%Y")
df[,(dtVar) := mapply(function(x,y){strptime(df[[x]], format = y)}, dtVar, inDtFmt)]
## Further investigation
mm <- mapply(function(x,y){strptime(df[[x]], format = y)}, dtVar, inDtFmt)
str(mm)
# List of 2
# $ X1: POSIXlt[1:6], format: "1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" ...
# $ X2: POSIXlt[1:6], format: "1996-01-02" "1996-01-03" "1996-01-04" "1996-01-05" ...

Can someone tell me why this error occurs?

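mapply generally tries to simplify its result to a vector or array; you should use Map instead, which always returns a list. Also, strptime returns objects of class POSIXlt, and here you only need dates, so use as.Date.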
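A minimal sketch of that fix, applied to the question's own df, dtVar and inDtFmt: mapply is swapped for Map and strptime for as.Date, and nothing else changes. It assumes the data.table package is installed. The two comparison lines at the end use made-up toy inputs only to show the simplification difference between mapply and Map.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"),
  X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996")
)
dtVar   <- c("X1", "X2")
inDtFmt <- c("%Y-%m-%d", "%d/%m/%Y")

# Map() never simplifies: it returns a plain list with one Date vector per
# column, which is the shape := expects on its right-hand side.
# as.Date() parses directly to Date, so no POSIXlt object is involved.
df[, (dtVar) := Map(function(x, y) as.Date(df[[x]], format = y), dtVar, inDtFmt)]
str(df)

# For comparison: with equal-length results, mapply() collapses them into a
# matrix, while Map() keeps them as separate list elements.
is.matrix(mapply(function(a, b) c(a, b), 1:2, 3:4))  # TRUE
is.list(Map(function(a, b) c(a, b), 1:2, 3:4))       # TRUE
```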

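Alternatively, lubridate::parse_date_time accepts a vector of candidate formats (its orders argument) and tries them against the data, so the same inDtFmt can be passed to every column and the whole job done with lapply: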

library(data.table)
df[, (dtVar) := lapply(.SD, lubridate::parse_date_time, inDtFmt), .SDcols = dtVar]
df
#           X1         X2
#1: 1996-01-04 1996-01-02
#2: 1996-01-05 1996-01-03
#3: 1996-01-08 1996-01-04
#4: 1996-01-09 1996-01-05
#5: 1996-01-10 1996-01-08
#6: 1996-01-11 1996-01-09

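You can use as.IDate (data.table's integer-based date class). Note that this builds a new data.table containing only the converted columns: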

df <- df[,as.list(Map(function(x,y){as.IDate(.SD[[x]], format = y)}, dtVar, inDtFmt))]
print(df)
#            X1         X2
# 1: 1996-01-04 1996-01-02
# 2: 1996-01-05 1996-01-03
# 3: 1996-01-08 1996-01-04
# 4: 1996-01-09 1996-01-05
# 5: 1996-01-10 1996-01-08
# 6: 1996-01-11 1996-01-09

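We can use anydate from the anytime package, which detects the format automatically and converts to Date. Note that it reads slash-separated strings such as "02/01/1996" month-first, so X2 below comes out as 1996-02-01 rather than the 1996-01-02 implied by the question's "%d/%m/%Y" format.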

library(data.table)
library(anytime)
df[, (dtVar) := lapply(.SD, anydate), .SDcols = dtVar]
str(df)
#Classes ‘data.table’ and 'data.frame': 6 obs. of  2 variables:
# $ X1: Date, format: "1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" ...
# $ X2: Date, format: "1996-02-01" "1996-03-01" "1996-04-01" "1996-05-01" ...
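A small base-R sketch (no extra packages) comparing the two possible readings of the first two X2 strings from the question. The month-first result matches what anydate returned above; passing the day-first format explicitly gives the dates the question intends.

```r
x <- c("02/01/1996", "03/01/1996")

# Day-first reading, matching the question's "%d/%m/%Y" format
day_first <- as.Date(x, format = "%d/%m/%Y")

# Month-first reading, the interpretation anydate() produced above
month_first <- as.Date(x, format = "%m/%d/%Y")

print(day_first)    # "1996-01-02" "1996-01-03"
print(month_first)  # "1996-02-01" "1996-03-01"
```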

Here is one approach:


library(data.table)
# sample data
df <- data.table(
X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"), 
X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996"), 
stringsAsFactors = FALSE)
str(df)
dtFmt <- list(X1 = "%Y-%m-%d", X2 = "%d/%m/%Y")
for (col in names(df)) {
  # set() replaces the column by reference instead of copying the table
  set(df, j = col, value = as.Date(df[[col]], format = dtFmt[[col]]))
}
str(df)
