How to apply a rolling t.test in R on a single variable



I have a data.frame (df) with two columns (Date and Count), as shown below:

Date       Count
1/1/2022    5   
1/2/2022    13  
1/3/2022    21  
1/4/2022    29  
1/5/2022    37  
1/6/2022    45  
1/7/2022    53  
1/8/2022    61  
1/9/2022    69  
1/10/2022   77  
1/11/2022   85  
1/12/2022   93  
1/13/2022   101 
1/14/2022   109 
1/15/2022   117 

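Since I have only one variable (Count), my idea is to determine whether the mean changes every three days. I would therefore like to apply a rolling t.test over a 3-day window and store the resulting p-values in a column next to Count, so that I can plot them later. People usually run this kind of test with two variables, so I don't know how to do it with one.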

For example, I found this related answer:

ttestFun <- function(dat) {
  myTtest = t.test(x = dat[, 1], y = dat[, 2])
  return(myTtest$p.value)
}
rollapply(df_ts, 7, FUN = ttestFun, fill = NA, by.column = FALSE)

But again, that uses two columns. Any guidance?

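Leaving aside whether the method is useful: given a fixed number of measurements (3), you can shift count by 3 and run a t-test between the two columns, as in your example. For instance: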

library(data.table)
set.seed(123)
dates <- seq(as.POSIXct("2022-01-01"), as.POSIXct("2022-02-01"), by = "1 day")
dt <- data.table(Date=dates, count = sample(1:200, length(dates), replace=TRUE), key="Date")
dt[, nxt:=shift(count, 3, type = "lead")]
dt[, group:=rep(1:ceiling(length(dates)/3), each=3)[seq_along(dates)]]
dt[, p:= tryCatch(t.test(count, nxt)$p.value, error=function(e) NA), by="group"][]
#>           Date count nxt group         p
#>  1: 2022-01-01   159 195     1 0.7750944
#>  2: 2022-01-02   179 170     1 0.7750944
#>  3: 2022-01-03    14  50     1 0.7750944
#>  4: 2022-01-04   195 118     2 0.2240362
#>  5: 2022-01-05   170  43     2 0.2240362
#>  6: 2022-01-06    50  14     2 0.2240362
#>  7: 2022-01-07   118 118     3 0.1763296
#>  8: 2022-01-08    43 153     3 0.1763296
#>  9: 2022-01-09    14  90     3 0.1763296
#> 10: 2022-01-10   118  91     4 0.8896343
#> 11: 2022-01-11   153 197     4 0.8896343
#> 12: 2022-01-12    90  91     4 0.8896343
#> 13: 2022-01-13    91 185     5 0.8065021
#> 14: 2022-01-14   197  92     5 0.8065021
#> 15: 2022-01-15    91 137     5 0.8065021
#> 16: 2022-01-16   185  99     6 0.1060465
#> 17: 2022-01-17    92  72     6 0.1060465
#> 18: 2022-01-18   137  26     6 0.1060465
#> 19: 2022-01-19    99   7     7 0.5283156
#> 20: 2022-01-20    72 170     7 0.5283156
#> 21: 2022-01-21    26 137     7 0.5283156
#> 22: 2022-01-22     7 164     8 0.9612965
#> 23: 2022-01-23   170  78     8 0.9612965
#> 24: 2022-01-24   137  81     8 0.9612965
#> 25: 2022-01-25   164  43     9 0.6111337
#> 26: 2022-01-26    78 103     9 0.6111337
#> 27: 2022-01-27    81 117     9 0.6111337
#> 28: 2022-01-28    43  76    10 0.6453494
#> 29: 2022-01-29   103 143    10 0.6453494
#> 30: 2022-01-30   117  NA    10 0.6453494
#> 31: 2022-01-31    76  NA    11        NA
#> 32: 2022-02-01   143  NA    11        NA
#>           Date count nxt group         p

Created on 2022-04-07 by the reprex package (v2.0.1)

You can tidy this up further, for example by keeping only the first date of each group:

dt[, .(Date=Date[1], count=round(mean(count), 2), p=p[1]), by="group"]
#>     group       Date  count         p
#>  1:     1 2022-01-01 117.33 0.7750944
#>  2:     2 2022-01-04 138.33 0.2240362
#>  3:     3 2022-01-07  58.33 0.1763296
#>  4:     4 2022-01-10 120.33 0.8896343
#>  5:     5 2022-01-13 126.33 0.8065021
#>  6:     6 2022-01-16 138.00 0.1060465
#>  7:     7 2022-01-19  65.67 0.5283156
#>  8:     8 2022-01-22 104.67 0.9612965
#>  9:     9 2022-01-25 107.67 0.6111337
#> 10:    10 2022-01-28  87.67 0.6453494
#> 11:    11 2022-01-31 109.50        NA
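If one p-value per day is wanted rather than one per 3-day block, the same "compare 3 days with the previous 3 days" idea can be sketched with zoo::rollapply on the single Count column. This is only a sketch under assumptions: the helper name half_ttest and the 6-day window (two halves of 3) are illustrative choices, not something taken from the question.

```r
library(zoo)

# Example data from the question (Count rises by 8 per day)
df <- data.frame(
  Date  = seq(as.Date("2022-01-01"), by = "day", length.out = 15),
  Count = seq(5, 117, by = 8)
)

# Hypothetical helper: split a window into two halves and
# return the Welch t-test p-value comparing them
half_ttest <- function(x) {
  n <- length(x) / 2
  t.test(x[1:n], x[(n + 1):(2 * n)])$p.value
}

# align = "right": the p-value on day i compares days i-5..i-3 with days i-2..i,
# so the first 5 rows have no complete window and are filled with NA
df$p <- rollapply(df$Count, width = 6, FUN = half_ttest,
                  fill = NA, align = "right")
```

Note that overlapping windows give strongly correlated p-values, so they should be read as a descriptive curve for plotting rather than as independent tests.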

You can create a grp column and then simply apply a t.test to each pair of consecutive groups:

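library(dplyr)
# Label each block of 3 rows (assumes nrow(d) is a multiple of 3 and Date has class Date)
d <- d %>% mutate(grp = rep(1:(n() / 3), each = 3))
d %>%
  left_join(
    tibble(
      grp = 2:max(d$grp),
      # Welch t-test of each group against the previous one
      pval = sapply(2:max(d$grp), function(x) {
        t.test(d %>% filter(grp == x) %>% pull(Count),
               d %>% filter(grp == x - 1) %>% pull(Count))$p.value
      })
    ), by = "grp") %>%
  group_by(grp) %>%
  slice_min(Date)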

Output (the p-value is constant only because of the example data you provided):

Date       Count   grp    pval
<date>     <dbl> <int>   <dbl>
1 2022-01-01     5     1 NA     
2 2022-01-04    29     2  0.0213
3 2022-01-07    53     3  0.0213
4 2022-01-10    77     4  0.0213
5 2022-01-13   101     5  0.0213

Or a data.table approach:

library(data.table)
library(magrittr)  # provides %>%
setDT(d)[, `:=`(grp = rep(1:(nrow(d)/3), each = 3), cy = shift(Count, 3))] %>%
  .[!is.na(cy), pval := t.test(Count, cy)$p.value, by = grp] %>%
  .[, .SD[1], by = grp, .SDcols = !c("cy")]

Output:

grp       Date Count       pval
<int>     <Date> <num>      <num>
1:     1 2022-01-01     5         NA
2:     2 2022-01-04    29 0.02131164
3:     3 2022-01-07    53 0.02131164
4:     4 2022-01-10    77 0.02131164
5:     5 2022-01-13   101 0.02131164
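Both snippets in this answer assume that d already holds the question's data with Date parsed as a Date. A minimal base R sketch of that setup, together with the same block-versus-previous-block test written without extra packages, might look like this (the names blocks, pval and res are illustrative):

```r
# Rebuild the question's data; Date is parsed from the m/d/Y strings
d <- data.frame(
  Date  = as.Date(paste0("1/", 1:15, "/2022"), format = "%m/%d/%Y"),
  Count = seq(5, 117, by = 8)
)

# Block label: rows 1-3 -> 1, rows 4-6 -> 2, ...
grp <- (seq_len(nrow(d)) - 1) %/% 3 + 1
blocks <- split(d$Count, grp)

# Welch t-test of each block against the one before it;
# the first block has no predecessor, so its p-value is NA
pval <- c(NA, sapply(2:length(blocks), function(i) {
  t.test(blocks[[i]], blocks[[i - 1]])$p.value
}))

# One row per block, keyed by the first date of the block
res <- data.frame(
  grp  = seq_along(blocks),
  Date = d$Date[!duplicated(grp)],
  pval = pval
)
```

With the perfectly linear example data, every comparison sees the same mean shift and the same variance, which is why all non-missing p-values come out identical.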
