我有这种混合类型的日期,我想将所有这些日期标准化为yyyy-mm-dd
格式,对于只有月份的日期,我想使用"平均值";月的几号(如15号)。例如,October 28, 2021
应该变成2021/10/28
,November 2009
应该变成2009/11/15
。另外,我该如何处理像979号职位那样缺少的日期?
这是一个可复制的例子-
[909] "October 28, 2021" "April 7, 2014" "November 2009" "January 17, 2018"
[913] "January 2023" "February 2012" "December 2022" "July 1999"
[917] "November 2006" "June 2011" "July 2014" "January 2015"
[921] "July 1, 2020" "October 15, 2018" "September 27, 2019" "February 14, 2022"
[925] "June 28, 2021" "June 2016" "March 2013" "October 2014"
[929] "January 2023" "July 6, 2022" "January 2014" "March 22, 2001"
[933] "October 10, 2019" "May 1, 2008" "December 2008" "November 2023"
[937] "August 2005" "May 1, 2022" "January 8, 2014" "July 2011"
[941] "August 15, 2022" "May 2004" "November 2012" "October 1999"
[945] "March 2010" "May 2014" "October 2006" "March 1, 2017"
[949] "June 25, 2019" "October 2004" "September 2016" "June 10, 2019"
[953] "April 4, 2017" "" "August 30, 2018" "July 1, 2017"
[957] "November 14, 2019" "November 2006" "September 1, 2022" "April 2007"
[961] "July 12, 2013" "August 14, 2015" "March 2013" "January 2014"
[965] "March 2013" "June 27, 2019" "April 2008" "July 2007"
[969] "February 2007" "May 2013" "April 2011" "December 2007"
[973] "July 2007" "December 2008" "May 5, 2017" "December 2007"
[977] "February 27, 2007" "February 13, 2018" "" "August 2014"
[981] "September 9, 2019" "October 2010" "January 30, 2013" "January 2010"
[985] "September 15, 2015" "March 2006" "April 2016" "March 2014"
[989] "April 2010" "February 20, 2017" "October 2015" "March 2012"
[993] "December 2014" "May 4, 2022" "October 27, 2020" "September 22, 2017"
[997] "November 2009" "July 2003" "August 2006" "March 3, 2017"
类似于
dates <- c("October 28, 2021", "April 7, 2014", "November 2009")
ddd <- function(d){
if (lengths(strsplit(d, " ")) == 2) {
d <- paste("15 ", d)
}
d <- anytime::anydate(d)
return(d)
}
lapply(dates, ddd)
#> [[1]]
#> [1] "2021-10-28"
#>
#> [[2]]
#> [1] "2014-04-07"
#>
#> [[3]]
#> [1] "2009-11-15"
创建于2022-11-18与reprex v2.0.2
缺失的值变成NA,取决于你如何处理它们。
问候,Grzegorz
library(lubridate); library(stringr); library(dplyr)
x <- c("", "October 28, 2021", "April 7, 2014", "November 2009")
if_else(str_length(x) == 0, NA_Date_,
if_else(str_detect(x, ","), mdy(x), my(x,quiet = TRUE)))
[1] NA "2021-10-28" "2014-04-07"
[4] "2009-11-01"
…一个整齐的方法