R-如何总结自ID和大型数据框架以来第一次日期的天数和`的天数



dataframe df1总结了整个时间(ID(的检测(Date(。作为一个简短的例子:

df1<- data.frame(ID= c(1,2,1,2,1,2,1,2,1,2),
                 Date= ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27","2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))
df1
   ID       Date
1   1 2016-08-21
2   2 2016-08-24
3   1 2016-08-23
4   2 2016-08-29
5   1 2016-08-27
6   2 2016-09-02
7   1 2016-09-01
8   2 2016-09-09
9   1 2016-09-01
10  2 2016-09-10

我想总结Number of days since the first detection of the individual(Ndays(和Number of days that the individual has been detected since the first time it was detected(Ndifdays(。

此外,我想在此摘要表中包含一个称为Prop的变量,该变量仅将Ndifdays划分为Ndays

我期望的摘要表是:

> Result
  ID Ndays Ndifdays  Prop
1  1    11        4 0.360 # Between 21st Aug and 01st Sept there is 11 days.
2  2    17        5 0.294 # Between 24th Aug and 10st Sept there is 17 days.

有人知道该怎么做吗?

您可以使用dplyr

中的各种汇总功能实现
library(dplyr)
df1 %>%
   group_by(ID) %>%
   summarise(Ndays =  as.integer(max(Date) - min(Date)), 
             Ndifdays = n_distinct(Date), 
             Prop = Ndifdays/Ndays)
#     ID Ndays Ndifdays  Prop
#   <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294

data.table版本将是

library(data.table)
df12 <- setDT(df1)[, .(Ndays = as.integer(max(Date) - min(Date)), 
                       Ndifdays = uniqueN(Date)), by = ID]
df12$Prop <- df12$Ndifdays/df12$Ndays

aggregate

的基础r
df12 <- aggregate(Date~ID, df1, function(x) c(max(x) - min(x), length(unique(x))))
df12$Prop <- df1$Ndifdays/df1$Ndays

按'id'分组后,获取'date''的 diffrange创建'ndays',然后用 n_distinct获取唯一的'date'数字,除以数字nd Dicting获得" Prop"

library(dplyr)    
df1 %>%
   group_by(ID) %>%
   summarise(Ndays =  as.integer(diff(range(Date))), 
         Ndifdays = n_distinct(Date), 
         Prop = Ndifdays/Ndays)
# A tibble: 2 x 4
#     ID Ndays Ndifdays  Prop
#  <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294

最新更新