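I have imported an Excel spreadsheet into R, and many columns in the data frame have "date" in their names. I can convert a single named column to a date like this: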
df$date <- as.Date(as.numeric(df$date), origin = "1899-12-30")
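How can I do this for all columns whose names contain "date"? Here is a sample data frame, although it has far fewer columns than the real one. Ideally the answer would use dplyr.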
df <- structure(list(source = c("Track", "Track", "Track", "Track",
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Here is an alternative. The janitor package has its own function for this, excel_numeric_to_date:
library(dplyr)
library(janitor)
df %>%
  mutate(across(contains("date"), ~excel_numeric_to_date(as.numeric(.))))
source sample_type collection_date collection_date2 received_date storage_date
<chr> <chr> <date> <date> <date> <date>
1 Track SQC 2008-07-17 2008-07-17 2008-07-22 2008-07-24
2 Track DNA 2008-07-25 2008-07-25 2008-07-31 2008-10-23
3 Track PBMC 2008-07-14 2008-07-14 2008-08-25 2008-08-25
4 Track PBMC 2008-07-15 2008-07-15 2008-08-25 2008-08-25
5 Track PBMC 2010-07-30 2010-07-30 2010-08-31 2010-08-31
We can use across and contains to select all columns whose names contain the string "date".
library(tidyverse)
df <- structure(list(source = c("Track", "Track", "Track", "Track",
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
df <- df %>%
  mutate(across(contains("date"), ~as.Date(as.numeric(.x), origin = "1899-12-30")))
head(df)
#> # A tibble: 5 × 6
#> source sample_type collection_date collection_date2 received_date storage_date
#> <chr> <chr> <date> <date> <date> <date>
#> 1 Track SQC 2008-07-17 2008-07-17 2008-07-22 2008-07-24
#> 2 Track DNA 2008-07-25 2008-07-25 2008-07-31 2008-10-23
#> 3 Track PBMC 2008-07-14 2008-07-14 2008-08-25 2008-08-25
#> 4 Track PBMC 2008-07-15 2008-07-15 2008-08-25 2008-08-25
#> 5 Track PBMC 2010-07-30 2010-07-30 2010-08-31 2010-08-31
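One thing to watch with this approach: running the mutate a second time would call as.numeric on columns that are already Date and shift them by the wrong origin. A minimal sketch of a guard, assuming the date columns arrive as character (as in the question), is to combine contains("date") with where(is.character). The names df_demo and convert_excel_dates are hypothetical, introduced only for illustration.

```r
library(dplyr)

# Small stand-in for the question's data (values copied from it)
df_demo <- tibble(
  source = c("Track", "Track"),
  collection_date = c("39646", "39654"),
  received_date = c("39651", "39660")
)

# Hypothetical helper: convert only columns that contain "date" in
# their name AND are still character, so already-converted Date
# columns are left alone.
convert_excel_dates <- function(data) {
  data %>%
    mutate(across(
      contains("date") & where(is.character),
      ~ as.Date(as.numeric(.x), origin = "1899-12-30")
    ))
}

once <- convert_excel_dates(df_demo)
twice <- convert_excel_dates(once)  # selects no columns the second time

once
```

If the columns come in as numeric rather than character, the where() predicate would need to change to where(is.numeric) or similar.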
I would use a loop for this:
for (col in grep('date', names(df))) {
  df[[col]] <- as.Date(as.numeric(df[[col]]), origin = "1899-12-30")
}
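If you prefer to avoid the explicit loop but stay in base R, the same idea can probably be written with lapply over the matching columns. This is only a sketch under the same assumptions as the loop; df_demo and date_cols are hypothetical names, and the values are copied from the question's sample data.

```r
# Stand-in data with values copied from the question
df_demo <- data.frame(
  source = c("Track", "Track"),
  collection_date = c("39646", "39654"),
  storage_date = c("39653", "39744"),
  stringsAsFactors = FALSE
)

# Names of all columns that contain "date"
date_cols <- grep("date", names(df_demo), value = TRUE)

# Convert each selected column from an Excel serial number to a Date
df_demo[date_cols] <- lapply(
  df_demo[date_cols],
  function(x) as.Date(as.numeric(x), origin = "1899-12-30")
)

str(df_demo)
```

Assigning back into df_demo[date_cols] keeps the other columns and the column order unchanged.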