r - How to format any column with 'date' in its name as a date



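I have imported an Excel spreadsheet into R, and many of the columns in the resulting data frame have "date" in their names. I can format a single named column as a date like this: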

df$date <- as.Date(as.numeric(df$date), origin = "1899-12-30")

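How can I do this for every column whose name contains "date"? Here is an example data frame, although it has far fewer columns than the real one. Ideally the answer would use dplyr.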

df <- structure(list(source = c("Track", "Track", "Track", "Track", 
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

Here is an alternative. The janitor package has its own function for this, excel_numeric_to_date:

library(dplyr)
library(janitor)
df %>%
  mutate(across(contains("date"), ~excel_numeric_to_date(as.numeric(.))))
#>   source sample_type collection_date collection_date2 received_date storage_date
#>   <chr>  <chr>       <date>          <date>           <date>        <date>      
#> 1 Track  SQC         2008-07-17      2008-07-17       2008-07-22    2008-07-24  
#> 2 Track  DNA         2008-07-25      2008-07-25       2008-07-31    2008-10-23  
#> 3 Track  PBMC        2008-07-14      2008-07-14       2008-08-25    2008-08-25  
#> 4 Track  PBMC        2008-07-15      2008-07-15       2008-08-25    2008-08-25  
#> 5 Track  PBMC        2010-07-30      2010-07-30       2010-08-31    2010-08-31  

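We can use across together with contains to select every column whose name contains the string "date", and apply the conversion to all of them at once.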

library(tidyverse)
df <- structure(list(source = c("Track", "Track", "Track", "Track", 
"Track"), sample_type = c("SQC", "DNA", "PBMC", "PBMC", "PBMC"
), collection_date = c("39646", "39654", "39643", "39644", "40389"
), collection_date2 = c("39646", "39654", "39643", "39644", "40389"
), received_date = c("39651", "39660", "39685", "39685", "40421"
), storage_date = c("39653", "39744", "39685", "39685", "40421"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
df <- df %>%
  mutate(across(contains("date"), ~as.Date(as.numeric(.x), origin = "1899-12-30")))
head(df)
#> # A tibble: 5 × 6
#>   source sample_type collection_date collection_date2 received_date storage_date
#>   <chr>  <chr>       <date>          <date>           <date>        <date>      
#> 1 Track  SQC         2008-07-17      2008-07-17       2008-07-22    2008-07-24  
#> 2 Track  DNA         2008-07-25      2008-07-25       2008-07-31    2008-10-23  
#> 3 Track  PBMC        2008-07-14      2008-07-14       2008-08-25    2008-08-25  
#> 4 Track  PBMC        2008-07-15      2008-07-15       2008-08-25    2008-08-25  
#> 5 Track  PBMC        2010-07-30      2010-07-30       2010-08-31    2010-08-31
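
One detail that may matter with real spreadsheets: contains() ignores case by default (its ignore.case argument defaults to TRUE), so a column called End_Date is picked up as well. The sketch below illustrates this on a small made-up tibble; the toy data and the helper name excel_serial_to_date are assumptions for illustration and are not part of the original question.

```r
library(dplyr)

# Hypothetical toy data: one lower-case and one capitalised column name
toy <- tibble(
  id = 1:2,
  start_date = c("39646", "39654"),
  End_Date = c("39651", "39660")
)

# Hypothetical helper wrapping the same conversion used in the answer above
excel_serial_to_date <- function(x) {
  as.Date(as.numeric(x), origin = "1899-12-30")
}

# contains("date") matches both start_date and End_Date (case-insensitive)
converted <- toy %>%
  mutate(across(contains("date"), excel_serial_to_date))

# Inspect the first class of each column to confirm which ones were converted
sapply(converted, function(col) class(col)[1])
```

If a case-sensitive match is wanted instead, contains("date", ignore.case = FALSE) should restrict the selection to lower-case names.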

I would use a loop for this:

for (col in grep('date', names(df))) {
  df[[col]] <- as.Date(as.numeric(df[[col]]), origin = "1899-12-30")
}
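
The same base R idea can probably be written without an explicit loop by assigning all matching columns in one step with lapply. This is only a sketch under stated assumptions: df2 and date_cols are made-up names, and the small data frame stands in for the real one. Note that grep is case-sensitive by default, unlike dplyr's contains.

```r
# Hypothetical plain data.frame with Excel serial numbers stored as text
df2 <- data.frame(
  source = c("Track", "Track"),
  collection_date = c("39646", "39654"),
  received_date = c("39651", "39660"),
  stringsAsFactors = FALSE
)

# Names of the columns whose header contains "date" (case-sensitive match)
date_cols <- grep("date", names(df2), value = TRUE)

# Convert every matching column in a single assignment
df2[date_cols] <- lapply(df2[date_cols], function(x) {
  as.Date(as.numeric(x), origin = "1899-12-30")
})

# Show the resulting column types
str(df2)
```

Passing ignore.case = TRUE to grep would also catch headers such as "Date" or "DATE".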
