r-在给定年份、月份和月份的周数的情况下,找到每个观测的中间日期



我在三列中有日期数据,如下所示:

2019年4月2019年5月1日2019年5月2019年5月
Year Month Week
2019年4月 1
2019年4月 2
2019年4月 3
2019年5月4

创建一个函数来查找中点,如下所示(如果您的样本数据为my_df(:

library(tidyverse)
library(lubridate)
# <-- Function to find midpoint of week -->
midweek <- (year, month, month_week) {
# first day of the month
date <- paste(year, month, "1st") |> lubridate::ymd()

# return middle date of the month week
date + 7 * month_week - 4
}

my_df |> mutate(
Week = gsub("wk ", "", Week) |> as.integer(), 
midpoint = midweek(Year, Month, Week)
)
#> # A tibble: 20 x 4
#>    Month   Week Year  midpoint  
#>    <chr>  <int> <chr> <date>    
#>  1 May        4 2016  2016-05-25
#>  2 May        4 2007  2007-05-25
#>  3 August     2 2010  2010-08-11
#>  4 April      4 1991  1991-04-25
#>  5 May        4 2012  2012-05-25
#>  6 May        5 1990  1990-06-01
#>  7 July       4 2011  2011-07-25
#>  8 August     2 2014  2014-08-11
#>  9 July       3 2020  2020-07-18
#> 10 May        4 2011  2011-05-25
#> 11 July       1 2010  2010-07-04
#> 12 May        3 1992  1992-05-18
#> 13 July       3 2017  2017-07-18
#> 14 April      3 2020  2020-04-18
#> 15 May        2 2014  2014-05-11
#> 16 July       3 1996  1996-07-18
#> 17 May        5 2012  2012-06-01
#> 18 April      4 1995  1995-04-25
#> 19 August     3 2018  2018-08-18
#> 20 July       1 2019  2019-07-04

下面的方法应该可以做到。不过这很有技巧。

shift_to_middle_of_week <- function(Year, Month, Week) {

Week = str_extract(Week, "\d+") %>% as.numeric()
date_first_week = (paste(Year, Month, "1", sep = "-")) %>% ymd()
dow = wday(date_first_week)

week_start = if_else(
Week == 1, 
date_first_week, 
date_first_week + ((Week-1)*7)-dow+1
) 

shift = floor((7-dow)/2)

ceil = ceiling_date(date_first_week, unit = "month")-1
diff_ceil_week_start = ceil - week_start

shift = if_else(diff_ceil_week_start<6, 
floor(diff_ceil_week_start/2), 
shift)
return(week_start + shift)
}
df %>% 
mutate(
date_mid_week = shift_to_middle_of_week(Year, Month, Week)
)
#> # A tibble: 20 x 4
#>    Month  Week  Year  date_mid_week
#>    <chr>  <chr> <chr> <date>       
#>  1 May    wk 4  2016  2016-05-25   
#>  2 May    wk 4  2007  2007-05-22   
#>  3 August wk 2  2010  2010-08-11   
#>  4 April  wk 4  1991  1991-04-23   
#>  5 May    wk 4  2012  2012-05-22   
#>  6 May    wk 5  1990  1990-05-29   
#>  7 July   wk 4  2011  2011-07-17   
#>  8 August wk 2  2014  2014-08-03   
#>  9 July   wk 3  2020  2020-07-13   
#> 10 May    wk 4  2011  2011-05-25   
#> 11 July   wk 1  2010  2010-07-02   
#> 12 May    wk 3  1992  1992-05-10   
#> 13 July   wk 3  2017  2017-07-09   
#> 14 April  wk 3  2020  2020-04-13   
#> 15 May    wk 2  2014  2014-05-05   
#> 16 July   wk 3  1996  1996-07-16   
#> 17 May    wk 5  2012  2012-05-29   
#> 18 April  wk 4  1995  1995-04-16   
#> 19 August wk 3  2018  2018-08-13   
#> 20 July   wk 1  2019  2019-07-03

更多细节

将所有内容放入mutate()中以查看中间步骤:

df %>% mutate(
Week = str_extract(Week, "\d+") %>% as.numeric(),
date_first_week = (paste(Year, Month, "01", sep = "-")) %>% ymd(), 
dow = wday(date_first_week),
week_start = if_else(
Week == 1, 
date_first_week, 
date_first_week + ((Week-1)*7)-dow+1
), 
week_start_2 = date_first_week + ((Week-1)*7)-dow+1,
shift = floor((7-dow)/2),
ceil = ceiling_date(date_first_week, unit = "month")-1, 
diff_ceil_week_start = ceil - week_start,
shift = if_else(diff_ceil_week_start<6, 
floor(diff_ceil_week_start/2), 
shift), 
week_mid = week_start + shift
)
# A tibble: 20 x 11
Month   Week Year  date_first_week   dow week_start week_start_2 shift  ceil       diff_ceil_week_start week_mid  
<chr>  <dbl> <chr> <date>          <dbl> <date>     <date>       <drtn> <date>     <drtn>               <date>    
1 May        4 2016  2016-05-01          1 2016-05-22 2016-05-22   3 days 2016-05-31  9 days              2016-05-25
2 May        4 2007  2007-05-01          3 2007-05-20 2007-05-20   2 days 2007-05-31 11 days              2007-05-22
3 August     2 2010  2010-08-01          1 2010-08-08 2010-08-08   3 days 2010-08-31 23 days              2010-08-11
4 April      4 1991  1991-04-01          2 1991-04-21 1991-04-21   2 days 1991-04-30  9 days              1991-04-23
5 May        4 2012  2012-05-01          3 2012-05-20 2012-05-20   2 days 2012-05-31 11 days              2012-05-22
6 May        5 1990  1990-05-01          3 1990-05-27 1990-05-27   2 days 1990-05-31  4 days              1990-05-29
7 July       4 2011  2011-07-01          6 2011-07-17 2011-07-17   0 days 2011-07-31 14 days              2011-07-17
8 August     2 2014  2014-08-01          6 2014-08-03 2014-08-03   0 days 2014-08-31 28 days              2014-08-03
9 July       3 2020  2020-07-01          4 2020-07-12 2020-07-12   1 days 2020-07-31 19 days              2020-07-13
10 May        4 2011  2011-05-01          1 2011-05-22 2011-05-22   3 days 2011-05-31  9 days              2011-05-25
11 July       1 2010  2010-07-01          5 2010-07-01 2010-06-27   1 days 2010-07-31 30 days              2010-07-02
12 May        3 1992  1992-05-01          6 1992-05-10 1992-05-10   0 days 1992-05-31 21 days              1992-05-10
13 July       3 2017  2017-07-01          7 2017-07-09 2017-07-09   0 days 2017-07-31 22 days              2017-07-09
14 April      3 2020  2020-04-01          4 2020-04-12 2020-04-12   1 days 2020-04-30 18 days              2020-04-13
15 May        2 2014  2014-05-01          5 2014-05-04 2014-05-04   1 days 2014-05-31 27 days              2014-05-05
16 July       3 1996  1996-07-01          2 1996-07-14 1996-07-14   2 days 1996-07-31 17 days              1996-07-16
17 May        5 2012  2012-05-01          3 2012-05-27 2012-05-27   2 days 2012-05-31  4 days              2012-05-29
18 April      4 1995  1995-04-01          7 1995-04-16 1995-04-16   0 days 1995-04-30 14 days              1995-04-16
19 August     3 2018  2018-08-01          4 2018-08-12 2018-08-12   1 days 2018-08-31 19 days              2018-08-13
20 July       1 2019  2019-07-01          2 2019-07-01 2019-06-30   2 days 2019-07-31 30 days              2019-07-03

最新更新