我在三列中有日期数据,如下所示:
Year | Month | Week |
---|---|---|
2019年4月 | 1 | |
2019年4月 | 2 | |
2019年4月 | 3 | |
2019年5月 | 4 |
创建一个函数来查找中点,如下所示(如果您的样本数据为my_df
(:
library(tidyverse)
library(lubridate)
# <-- Function to find midpoint of week -->
midweek <- (year, month, month_week) {
# first day of the month
date <- paste(year, month, "1st") |> lubridate::ymd()
# return middle date of the month week
date + 7 * month_week - 4
}
my_df |> mutate(
Week = gsub("wk ", "", Week) |> as.integer(),
midpoint = midweek(Year, Month, Week)
)
#> # A tibble: 20 x 4
#> Month Week Year midpoint
#> <chr> <int> <chr> <date>
#> 1 May 4 2016 2016-05-25
#> 2 May 4 2007 2007-05-25
#> 3 August 2 2010 2010-08-11
#> 4 April 4 1991 1991-04-25
#> 5 May 4 2012 2012-05-25
#> 6 May 5 1990 1990-06-01
#> 7 July 4 2011 2011-07-25
#> 8 August 2 2014 2014-08-11
#> 9 July 3 2020 2020-07-18
#> 10 May 4 2011 2011-05-25
#> 11 July 1 2010 2010-07-04
#> 12 May 3 1992 1992-05-18
#> 13 July 3 2017 2017-07-18
#> 14 April 3 2020 2020-04-18
#> 15 May 2 2014 2014-05-11
#> 16 July 3 1996 1996-07-18
#> 17 May 5 2012 2012-06-01
#> 18 April 4 1995 1995-04-25
#> 19 August 3 2018 2018-08-18
#> 20 July 1 2019 2019-07-04
下面的方法应该可以做到。不过这很有技巧。
shift_to_middle_of_week <- function(Year, Month, Week) {
Week = str_extract(Week, "\d+") %>% as.numeric()
date_first_week = (paste(Year, Month, "1", sep = "-")) %>% ymd()
dow = wday(date_first_week)
week_start = if_else(
Week == 1,
date_first_week,
date_first_week + ((Week-1)*7)-dow+1
)
shift = floor((7-dow)/2)
ceil = ceiling_date(date_first_week, unit = "month")-1
diff_ceil_week_start = ceil - week_start
shift = if_else(diff_ceil_week_start<6,
floor(diff_ceil_week_start/2),
shift)
return(week_start + shift)
}
df %>%
mutate(
date_mid_week = shift_to_middle_of_week(Year, Month, Week)
)
#> # A tibble: 20 x 4
#> Month Week Year date_mid_week
#> <chr> <chr> <chr> <date>
#> 1 May wk 4 2016 2016-05-25
#> 2 May wk 4 2007 2007-05-22
#> 3 August wk 2 2010 2010-08-11
#> 4 April wk 4 1991 1991-04-23
#> 5 May wk 4 2012 2012-05-22
#> 6 May wk 5 1990 1990-05-29
#> 7 July wk 4 2011 2011-07-17
#> 8 August wk 2 2014 2014-08-03
#> 9 July wk 3 2020 2020-07-13
#> 10 May wk 4 2011 2011-05-25
#> 11 July wk 1 2010 2010-07-02
#> 12 May wk 3 1992 1992-05-10
#> 13 July wk 3 2017 2017-07-09
#> 14 April wk 3 2020 2020-04-13
#> 15 May wk 2 2014 2014-05-05
#> 16 July wk 3 1996 1996-07-16
#> 17 May wk 5 2012 2012-05-29
#> 18 April wk 4 1995 1995-04-16
#> 19 August wk 3 2018 2018-08-13
#> 20 July wk 1 2019 2019-07-03
更多细节
将所有内容放入mutate()
中以查看中间步骤:
df %>% mutate(
Week = str_extract(Week, "\d+") %>% as.numeric(),
date_first_week = (paste(Year, Month, "01", sep = "-")) %>% ymd(),
dow = wday(date_first_week),
week_start = if_else(
Week == 1,
date_first_week,
date_first_week + ((Week-1)*7)-dow+1
),
week_start_2 = date_first_week + ((Week-1)*7)-dow+1,
shift = floor((7-dow)/2),
ceil = ceiling_date(date_first_week, unit = "month")-1,
diff_ceil_week_start = ceil - week_start,
shift = if_else(diff_ceil_week_start<6,
floor(diff_ceil_week_start/2),
shift),
week_mid = week_start + shift
)
# A tibble: 20 x 11
Month Week Year date_first_week dow week_start week_start_2 shift ceil diff_ceil_week_start week_mid
<chr> <dbl> <chr> <date> <dbl> <date> <date> <drtn> <date> <drtn> <date>
1 May 4 2016 2016-05-01 1 2016-05-22 2016-05-22 3 days 2016-05-31 9 days 2016-05-25
2 May 4 2007 2007-05-01 3 2007-05-20 2007-05-20 2 days 2007-05-31 11 days 2007-05-22
3 August 2 2010 2010-08-01 1 2010-08-08 2010-08-08 3 days 2010-08-31 23 days 2010-08-11
4 April 4 1991 1991-04-01 2 1991-04-21 1991-04-21 2 days 1991-04-30 9 days 1991-04-23
5 May 4 2012 2012-05-01 3 2012-05-20 2012-05-20 2 days 2012-05-31 11 days 2012-05-22
6 May 5 1990 1990-05-01 3 1990-05-27 1990-05-27 2 days 1990-05-31 4 days 1990-05-29
7 July 4 2011 2011-07-01 6 2011-07-17 2011-07-17 0 days 2011-07-31 14 days 2011-07-17
8 August 2 2014 2014-08-01 6 2014-08-03 2014-08-03 0 days 2014-08-31 28 days 2014-08-03
9 July 3 2020 2020-07-01 4 2020-07-12 2020-07-12 1 days 2020-07-31 19 days 2020-07-13
10 May 4 2011 2011-05-01 1 2011-05-22 2011-05-22 3 days 2011-05-31 9 days 2011-05-25
11 July 1 2010 2010-07-01 5 2010-07-01 2010-06-27 1 days 2010-07-31 30 days 2010-07-02
12 May 3 1992 1992-05-01 6 1992-05-10 1992-05-10 0 days 1992-05-31 21 days 1992-05-10
13 July 3 2017 2017-07-01 7 2017-07-09 2017-07-09 0 days 2017-07-31 22 days 2017-07-09
14 April 3 2020 2020-04-01 4 2020-04-12 2020-04-12 1 days 2020-04-30 18 days 2020-04-13
15 May 2 2014 2014-05-01 5 2014-05-04 2014-05-04 1 days 2014-05-31 27 days 2014-05-05
16 July 3 1996 1996-07-01 2 1996-07-14 1996-07-14 2 days 1996-07-31 17 days 1996-07-16
17 May 5 2012 2012-05-01 3 2012-05-27 2012-05-27 2 days 2012-05-31 4 days 2012-05-29
18 April 4 1995 1995-04-01 7 1995-04-16 1995-04-16 0 days 1995-04-30 14 days 1995-04-16
19 August 3 2018 2018-08-01 4 2018-08-12 2018-08-12 1 days 2018-08-31 19 days 2018-08-13
20 July 1 2019 2019-07-01 2 2019-07-01 2019-06-30 2 days 2019-07-31 30 days 2019-07-03