r - Given a series of dates and a birth date, is there a way to get the age at each date entry, as well as the final age, using the lubridate package?



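I have a database of information on individuals observed over time. I would like to find a way to get the age of these individuals at every record. Assuming the row with status BIRTH has an age of 0, I would like to obtain the age at each subsequent visit (in days or months). It would also be helpful to get each individual's final age (in days or months) (*not included in the code). For example, the final age of ID (A) is 10 months. I would like to use lubridate functions, because its built-in date features make dates easier to work with. Any help would be appreciated.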

date<-c("2000-01-01","2000-01-14","2000-01-25","2000-02-12","2000-02-27","2000-06-05","2000-10-30",
"2001-02-04","2001-06-15","2001-12-26","2002-05-22","2002-06-04",
"2000-01-08","2000-07-11","2000-08-18","2000-11-27")
ID<-c("A","A","A","A","A","A","A",
"B","B","B","B","B",
"C","C","C","C")
status<-c("BIRTH","ETC","ETC","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC")
df1<-data.frame(date,ID,status)
print(df1)
         date ID status
1  2000-01-01  A  BIRTH
2  2000-01-14  A    ETC
3  2000-01-25  A    ETC
4  2000-02-12  A    ETC
5  2000-02-27  A    ETC
6  2000-06-05  A    ETC
7  2000-10-30  A    ETC
8  2001-02-04  B  BIRTH
9  2001-06-15  B    ETC
10 2001-12-26  B    ETC
11 2002-05-22  B    ETC
12 2002-06-04  B    ETC
13 2000-01-08  C  BIRTH
14 2000-07-11  C    ETC
15 2000-08-18  C    ETC
16 2000-11-27  C    ETC
date.new<-c("2000-01-01","2000-01-14","2000-01-25","2000-02-12","2000-02-27","2000-06-05","2000-10-30",
"2001-02-04","2001-06-15","2001-12-26","2002-05-22","2001-02-04",
"2000-01-08","2000-07-11","2000-08-18","2000-11-27")
ID.new<-c("A","A","A","A","A","A","A",
"B","B","B","B","B",
"C","C","C","C")
status.new<-c("BIRTH","ETC","ETC","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC")
age<-c(0,1,1,2,2,6,10,
0,4,10,15,16,
0,6,7,10)
df2<-data.frame(date.new,ID.new,status.new,age)
print(df2)
     date.new ID.new status.new age
1  2000-01-01      A      BIRTH   0
2  2000-01-14      A        ETC   1
3  2000-01-25      A        ETC   1
4  2000-02-12      A        ETC   2
5  2000-02-27      A        ETC   2
6  2000-06-05      A        ETC   6
7  2000-10-30      A        ETC  10
8  2001-02-04      B      BIRTH   0
9  2001-06-15      B        ETC   4
10 2001-12-26      B        ETC  10
11 2002-05-22      B        ETC  15
12 2001-02-04      B        ETC  16
13 2000-01-08      C      BIRTH   0
14 2000-07-11      C        ETC   6
15 2000-08-18      C        ETC   7
16 2000-11-27      C        ETC  10

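For age-related calculations in years or months, I would encourage you to try the clock package rather than lubridate. lubridate is a great package, but these calculations can produce some unexpected results if you aren't 100% sure what you're doing. In clock, the function for this is date_count_between(). Note that one of the results differs between clock and lubridate: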

library(clock)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
date = c("2000-01-01","2000-01-14",
"2000-01-25","2000-02-12","2000-02-27","2000-06-05",
"2000-10-30","2001-02-04","2001-06-15","2001-12-26",
"2002-05-22","2002-06-04","2000-01-08","2000-07-11",
"2000-08-18","2000-11-27"),
ID = c("A","A","A","A","A","A",
"A","B","B","B","B","B","C","C","C","C"),
status = c("BIRTH","ETC","ETC","ETC",
"ETC","ETC","ETC","BIRTH","ETC","ETC","ETC","ETC",
"BIRTH","ETC","ETC","ETC")
)
df %>% 
  mutate(date = date_parse(date)) %>% 
  group_by(ID) %>% 
  mutate(birth_date = date[status == "BIRTH"]) %>% 
  ungroup() %>%
  mutate(
    age_clock = date_count_between(birth_date, date, "month"),
    age_lubridate = as.period(date - birth_date) %/% months(1)
  )
#> # A tibble: 16 × 6
#>    date       ID    status birth_date age_clock age_lubridate
#>    <date>     <chr> <chr>  <date>         <int>         <dbl>
#>  1 2000-01-01 A     BIRTH  2000-01-01         0             0
#>  2 2000-01-14 A     ETC    2000-01-01         0             0
#>  3 2000-01-25 A     ETC    2000-01-01         0             0
#>  4 2000-02-12 A     ETC    2000-01-01         1             1
#>  5 2000-02-27 A     ETC    2000-01-01         1             1
#>  6 2000-06-05 A     ETC    2000-01-01         5             5
#>  7 2000-10-30 A     ETC    2000-01-01         9             9
#>  8 2001-02-04 B     BIRTH  2001-02-04         0             0
#>  9 2001-06-15 B     ETC    2001-02-04         4             4
#> 10 2001-12-26 B     ETC    2001-02-04        10            10
#> 11 2002-05-22 B     ETC    2001-02-04        15            15
#> 12 2002-06-04 B     ETC    2001-02-04        16            15
#> 13 2000-01-08 C     BIRTH  2000-01-08         0             0
#> 14 2000-07-11 C     ETC    2000-01-08         6             6
#> 15 2000-08-18 C     ETC    2000-01-08         7             7
#> 16 2000-11-27 C     ETC    2000-01-08        10            10

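clock says that 2001-02-04 to 2002-06-04 is 16 months, while the lubridate approach used here says it is only 15 months. This is because the lubridate calculation uses the length of an average month, which doesn't always accurately reflect how we think about months.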

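As a simple example, I think most people would agree that a child born on this date in February would be considered "1 month and 1 day" old. But lubridate shows 0 months!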

library(clock)
library(lubridate, warn.conflicts = FALSE)
# "1 month and 1 day apart"
feb <- as.Date("2020-02-28")
mar <- as.Date("2020-03-29")
# As expected when thinking about age in months
date_count_between(feb, mar, "month")
#> [1] 1
# Not expected
as.period(mar - feb) %/% months(1)
#> [1] 0
secs_in_day <- 86400
secs_in_month <- as.numeric(months(1))
secs_in_month / secs_in_day
#> [1] 30.4375
# Less than 30.4375 days, so not 1 month
mar - feb
#> Time difference of 30 days

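The issue is that lubridate uses the length of an average month in this calculation, which is 30.4375 days. But there are only 30 days between these two dates, so it isn't counted as a full month.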

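clock, on the other hand, uses the day-of-month component of the start date to determine whether a "full month" has passed. In other words, because we have passed the 28th of March, clock decides that 1 month has passed, which is consistent with how we usually think about age.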
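The day-of-month rule described above can be sketched in base R. This is only an illustration of the idea, not clock's actual implementation: `whole_months_between()` is a hypothetical helper, and it ignores the end-of-month edge cases (for example a start date on the 31st) that clock handles explicitly.

```r
# Hypothetical helper: counts whole calendar months from `start` to `end`.
# Assumes `end` is on or after `start` and skips end-of-month edge cases.
whole_months_between <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  # Raw difference in calendar months, ignoring the day of the month
  n_months <- (e$year - s$year) * 12 + (e$mon - s$mon)
  # Subtract one month if the end day-of-month has not yet reached the start day
  n_months - as.integer(e$mday < s$mday)
}

feb_to_mar <- whole_months_between(as.Date("2020-02-28"), as.Date("2020-03-29"))
id_b_final <- whole_months_between(as.Date("2001-02-04"), as.Date("2002-06-04"))
print(c(feb_to_mar, id_b_final))
```

Because the comparison is made on calendar components rather than on a number of days, the two cases above are counted the same way clock counts them.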

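Using dplyr and lubridate, we can do the following. We first convert the date column to dates. Then we group by ID, find the birth date, and compute the number of months since that date with a bit of lubridate magic (see "How do I use the lubridate package to calculate the number of months between two date vectors where one of the vectors has NA values?").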

library(dplyr)
library(lubridate)
df1 %>% 
  mutate(date = as_date(date)) %>% 
  group_by(ID) %>% 
  mutate(birth_date = date[status == "BIRTH"],
         age = as.period(date - birth_date) %/% months(1)) %>% 
  ungroup()

Which gives:

date       ID    status birth_date   age
<date>     <fct> <fct>  <date>     <dbl>
1 2000-01-01 A     BIRTH  2000-01-01     0
2 2000-01-14 A     ETC    2000-01-01     0
3 2000-01-25 A     ETC    2000-01-01     0
4 2000-02-12 A     ETC    2000-01-01     1
5 2000-02-27 A     ETC    2000-01-01     1
6 2000-06-05 A     ETC    2000-01-01     5
7 2000-10-30 A     ETC    2000-01-01     9
8 2001-02-04 B     BIRTH  2001-02-04     0
9 2001-06-15 B     ETC    2001-02-04     4
10 2001-12-26 B     ETC    2001-02-04    10
11 2002-05-22 B     ETC    2001-02-04    15
12 2002-06-04 B     ETC    2001-02-04    15
13 2000-01-08 C     BIRTH  2000-01-08     0
14 2000-07-11 C     ETC    2000-01-08     6
15 2000-08-18 C     ETC    2000-01-08     7
16 2000-11-27 C     ETC    2000-01-08    10

Apart from some rounding differences, this is your expected output. See my comment on your question.
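If you would rather stay within lubridate, one option is to divide an `interval()` by `months(1)` instead of converting a difference in days to a period, since interval division works on the actual calendar dates rather than on an average month length. The sketch below is an illustration under a few assumptions, not part of the answers above: it uses a small subset of the question's data, and the column names `age_days`, `age_months`, `final_age_days` and `final_age_months` are made up for the example. It also shows one way to get the age in days and each individual's final age, which the question asked about.

```r
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

# Subset of the question's data (IDs A and B only), to keep the example short
df1 <- data.frame(
  date = c("2000-01-01", "2000-02-27", "2000-10-30",
           "2001-02-04", "2002-06-04"),
  ID = c("A", "A", "A", "B", "B"),
  status = c("BIRTH", "ETC", "ETC", "BIRTH", "ETC")
)

ages <- df1 %>%
  mutate(date = as_date(date)) %>%
  group_by(ID) %>%
  mutate(
    # Each individual's birth date, recycled across that individual's rows
    birth_date = date[status == "BIRTH"],
    # Age in whole days
    age_days = as.integer(difftime(date, birth_date, units = "days")),
    # Age in whole calendar months, using interval division
    age_months = interval(birth_date, date) %/% months(1)
  ) %>%
  ungroup()

# Final age per individual: the age at the latest record
final_ages <- ages %>%
  group_by(ID) %>%
  summarise(
    final_age_days = max(age_days),
    final_age_months = max(age_months)
  )

print(final_ages)
```

With interval division, the last record for ID B should come out as 16 months rather than 15, in line with the clock result.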
