R - Standardizing dates in different formats



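I have dates in mixed formats, and I want to standardize all of them to yyyy-mm-dd. For dates that only have a month and a year, I want to use an "average" day of the month (such as the 15th). For example, October 28, 2021 should become 2021-10-28, and November 2009 should become 2009-11-15. Also, how should I handle missing dates, like the one at position 979?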

Here is a reproducible example:

[909] "October 28, 2021"   "April 7, 2014"      "November 2009"      "January 17, 2018"  
[913] "January 2023"       "February 2012"      "December 2022"      "July 1999"         
[917] "November 2006"      "June 2011"          "July 2014"          "January 2015"      
[921] "July 1, 2020"       "October 15, 2018"   "September 27, 2019" "February 14, 2022" 
[925] "June 28, 2021"      "June 2016"          "March 2013"         "October 2014"      
[929] "January 2023"       "July 6, 2022"       "January 2014"       "March 22, 2001"    
[933] "October 10, 2019"   "May 1, 2008"        "December 2008"      "November 2023"     
[937] "August 2005"        "May 1, 2022"        "January 8, 2014"    "July 2011"         
[941] "August 15, 2022"    "May 2004"           "November 2012"      "October 1999"      
[945] "March 2010"         "May 2014"           "October 2006"       "March 1, 2017"     
[949] "June 25, 2019"      "October 2004"       "September 2016"     "June 10, 2019"     
[953] "April 4, 2017"      ""                   "August 30, 2018"    "July 1, 2017"      
[957] "November 14, 2019"  "November 2006"      "September 1, 2022"  "April 2007"        
[961] "July 12, 2013"      "August 14, 2015"    "March 2013"         "January 2014"      
[965] "March 2013"         "June 27, 2019"      "April 2008"         "July 2007"         
[969] "February 2007"      "May 2013"           "April 2011"         "December 2007"     
[973] "July 2007"          "December 2008"      "May 5, 2017"        "December 2007"     
[977] "February 27, 2007"  "February 13, 2018"  ""                   "August 2014"       
[981] "September 9, 2019"  "October 2010"       "January 30, 2013"   "January 2010"      
[985] "September 15, 2015" "March 2006"         "April 2016"         "March 2014"        
[989] "April 2010"         "February 20, 2017"  "October 2015"       "March 2012"        
[993] "December 2014"      "May 4, 2022"        "October 27, 2020"   "September 22, 2017"
[997] "November 2009"      "July 2003"          "August 2006"        "March 3, 2017"

Something like this:

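dates <- c("October 28, 2021", "April 7, 2014", "November 2009")
ddd <- function(d){
  # "November 2009" splits into 2 words: prepend the 15th as the default day
  if (lengths(strsplit(d, " ")) == 2) {
    d <- paste("15 ", d)
  }
  # anytime::anydate() guesses the input format and returns a Date
  d <- anytime::anydate(d)
  return(d)
}
lapply(dates, ddd)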
#> [[1]]
#> [1] "2021-10-28"
#> 
#> [[2]]
#> [1] "2014-04-07"
#> 
#> [[3]]
#> [1] "2009-11-15"

Created on 2022-11-18 with reprex v2.0.2
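The same idea (prepend the 15th when only a month and a year are given) can also be sketched in base R without `anytime`. This is a minimal sketch that assumes English month names written exactly as in the question; `standardize_dates` is a hypothetical helper name. Unlike `lapply()`, it returns a single `Date` vector.

```r
# Hypothetical helper: "Month D, YYYY" and "Month YYYY" -> Date,
# imputing day 15 for month-only values. Base R only.
standardize_dates <- function(x) {
  # "October 28, 2021" -> "October" "28" "2021"; "" -> no tokens
  parts <- strsplit(trimws(x), "[ ,]+")
  iso <- vapply(parts, function(p) {
    if (length(p) == 3) {          # full date: month, day, year
      day <- p[2]; year <- p[3]
    } else if (length(p) == 2) {   # month and year only: use the 15th
      day <- "15"; year <- p[2]
    } else {                       # empty or unexpected input
      return(NA_character_)
    }
    # month.name is base R's English month names, independent of locale
    month <- match(p[1], month.name)
    day <- suppressWarnings(as.integer(day))
    year <- suppressWarnings(as.integer(year))
    if (anyNA(c(month, day, year))) return(NA_character_)
    sprintf("%04d-%02d-%02d", year, month, day)
  }, character(1))
  # An explicit format returns NA (instead of an error) for impossible dates
  as.Date(iso, format = "%Y-%m-%d")
}

standardize_dates(c("October 28, 2021", "April 7, 2014", "November 2009", ""))
```

Matching against `month.name` rather than parsing with `%B` avoids depending on the session's locale for month names.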

Missing values become NA; what you do with them after that is up to you.
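As a minimal base-R sketch of the usual options once the missing entries are NA (the three-element vector below is made up for illustration), you can locate them, drop them, or keep them and flag them explicitly:

```r
# Made-up, already-standardized dates; the second entry was missing
parsed <- as.Date(c("2021-10-28", NA, "2009-11-15"), format = "%Y-%m-%d")

# Positions of the missing dates (like position 979 in the question)
missing_pos <- which(is.na(parsed))

# Option 1: drop them
kept <- parsed[!is.na(parsed)]

# Option 2: keep the full length but record missingness in its own column
flagged <- data.frame(date = parsed, missing = is.na(parsed))

missing_pos
kept
flagged
```

Dropping is simplest, while flagging keeps the rows aligned with the original data.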

Regards, Grzegorz

library(lubridate); library(stringr); library(dplyr)
x <- c("", "October 28, 2021", "April 7, 2014", "November 2009")

# "" -> NA; "Month D, YYYY" -> mdy(); "Month YYYY" -> my() (1st of month) + 14 days = the 15th
if_else(str_length(x) == 0, NA_Date_,
        if_else(str_detect(x, ","), mdy(x, quiet = TRUE), my(x, quiet = TRUE) + days(14)))
[1] NA           "2021-10-28" "2014-04-07"
[4] "2009-11-15"

…a tidyverse approach
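Because month-only dates receive an invented day, it may help to record which values were imputed. This is a hypothetical base-R sketch; the regular expression assumes that month-only inputs look exactly like "November 2009".

```r
# Same sample input as the if_else() answer
x <- c("", "October 28, 2021", "April 7, 2014", "November 2009")

# Month-only strings: a word, a space, a four-digit year, no day, no comma
# (an assumption about the input shapes shown in the question)
month_only <- grepl("^[A-Za-z]+ [0-9]{4}$", trimws(x))

# Keep the raw string next to the flag so imputed days stay traceable
audit <- data.frame(raw = x, day_imputed = month_only)
audit
```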
