R - using the "rle" function with dplyr "group_by"



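I have a data frame with three columns, containing information similar to the data frame given below. I now want to extract patterns based on the values in column a.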

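With support from a few developers (@thelatemail and @David T), I was able to identify the patterns using the rle function; see here: Identifying patterns using the rle function. Now I would like to go a step further and add grouping information to the extracted patterns. I tried using the dplyr do function (see the code below), but it does not work.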
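For reference, base R's rle() compresses a vector into runs: values holds each run's value and lengths how many times it repeats consecutively. A minimal illustration, using the a values of groups A02 and A01 from the sample data (typed in by hand so the snippet stands alone):

```r
# Run-length encoding of group A02's a values: no value repeats back-to-back
r <- rle(c("a", "c", "a", "b"))
r$values    # "a" "c" "a" "b"
r$lengths   # 1 1 1 1

# Group A01 has a repeated value, so its runs collapse
r01 <- rle(c("a", "b", "b", "b"))
r01$values   # "a" "b"
r01$lengths  # 1 3
```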

Sample data and the desired output are also given below for reference.

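## my code that produces an error - needs to be fixed
## (the ")" right after $values closes data.frame() early, so to = ... is passed to do() as a separate argument)
test <- data %>%
  group_by(b, c) %>%
  do(., data.frame(from = rle(.$a)$values), to = lead(rle(.$a)$values))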
##code to create the data frame
a <- c( "a", "b", "b", "b", "a", "c", "a", "b", "d", "d", "d", "e", "f", "f", "e", "e")
b <- c(rep("experiment", times = 8), rep("control", times = 8))
c <- c(rep("A01", times = 4), rep("A02", times = 4), rep("A03", times = 4), rep("A04", times = 4))
data <- data.frame(c,b,a)
## desired output
  c     b          from  to    fromCount toCount
  <chr> <chr>      <chr> <chr>     <int>   <int>
1 A01   experiment a     b             1       3
2 A02   experiment a     c             1       1
3 A02   experiment c     a             1       1
4 A02   experiment a     b             1       1
5 A03   control    d     e             3       1
6 A04   control    f     e             2       2

Compared with the earlier post, the information here is condensed because grouping is applied to column a.

We can use rleid from data.table:

library(data.table)
library(dplyr)
data %>% 
  group_by(b, c, grp = rleid(a)) %>%
  summarise(from = first(a), fromCount = n()) %>% 
  mutate(to = lead(from), toCount = lead(fromCount)) %>%
  ungroup %>%
  select(-grp) %>% 
  filter(!is.na(to)) %>%
  arrange(c)
# A tibble: 6 x 6
#  b          c     from  fromCount to    toCount
#  <chr>      <chr> <chr>     <int> <chr>   <int>
#1 experiment A01   a             1 b           3
#2 experiment A02   a             1 c           1
#3 experiment A02   c             1 a           1
#4 experiment A02   a             1 b           1
#5 control    A03   d             3 e           1
#6 control    A04   f             2 e           2
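The grouping works because rleid(a) gives consecutive equal values the same id, so n() per id is the run length. As an illustration only, assuming a plain atomic vector without NAs, the same ids can be produced in base R with cumsum(); run_id is a hypothetical helper, not part of data.table:

```r
# Hypothetical base-R helper mimicking data.table::rleid() for an atomic
# vector without NAs: start a new id whenever the value changes
run_id <- function(x) {
  if (length(x) == 0) return(integer(0))
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

a <- c("a", "b", "b", "b", "a", "c", "a", "b",
       "d", "d", "d", "e", "f", "f", "e", "e")
ids <- run_id(a)
ids                      # 1 2 2 2 3 4 5 6 7 7 7 8 9 9 10 10

# Rows per id: what n() computes inside the grouped summarise
as.vector(table(ids))    # 1 3 1 1 1 1 3 1 2 2, the same as rle(a)$lengths
```

The id is computed over the whole column before grouping; also grouping by b and c is what keeps a run from spanning two subjects.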

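Or, using rle: after grouping by "b" and "c", summarise the rle into a list column, extract the "values" and "lengths" from that column within summarise as "from" and "fromCount", create "to" and "toCount" as the lead of "from" and "fromCount", filter out the NA elements and arrange by column "c".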

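data %>% 
  group_by(b, c) %>%
  # note: dplyr >= 1.1.0 warns that returning more than one row per group
  # from summarise() is deprecated; reframe() is the suggested replacement
  summarise(rl = list(rle(a)), 
            from = rl[[1]]$values, 
            fromCount = rl[[1]]$lengths) %>% 
  mutate(to = lead(from), 
         toCount = lead(fromCount)) %>%
  ungroup %>% 
  select(-rl) %>% 
  filter(!is.na(to)) %>% 
  arrange(c)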
# A tibble: 6 x 6
#  b          c     from  fromCount to    toCount
#  <chr>      <chr> <chr>     <int> <chr>   <int>
#1 experiment A01   a             1 b           3
#2 experiment A02   a             1 c           1
#3 experiment A02   c             1 a           1
#4 experiment A02   a             1 b           1
#5 control    A03   d             3 e           1
#6 control    A04   f             2 e           2
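The filter(!is.na(to)) step is needed because lead() shifts a vector up by one position and pads the end with NA, so the last run in each group has no successor. A base-R sketch of that shift for group A02; lead_base is a hypothetical stand-in for dplyr::lead(), which is what the answer actually uses:

```r
# Hypothetical stand-in for dplyr::lead(x): drop the first element, pad with NA
lead_base <- function(x) c(x[-1], NA)

r <- rle(c("a", "c", "a", "b"))   # the a values of group A02
pairs <- data.frame(from = r$values, fromCount = r$lengths,
                    to = lead_base(r$values), toCount = lead_base(r$lengths),
                    stringsAsFactors = FALSE)
pairs                              # 4 runs; the last row has to = NA

# Dropping the NA row leaves only real transitions: a->c, c->a, a->b
transitions <- pairs[!is.na(pairs$to), ]
transitions
```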

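We can also use map to loop over the list column of rle objects ('rl'), extract the components and build a tibble of the values, the lengths and their leads, then use unnest_wider to create the columns and unnest to flatten the list structure, filter out the NA elements and arrange: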

library(tidyr)
library(purrr)
data %>% 
  group_by(b, c) %>%
  summarise(rl = list(rle(a))) %>%
  ungroup %>%
  mutate(out = map(rl, 
                   ~ tibble(from = .x$values,
                            fromCount = .x$lengths,
                            to = lead(from), 
                            toCount = lead(fromCount)))) %>%
  unnest_wider(c(out)) %>% 
  unnest(from:toCount) %>%
  filter(!is.na(to)) %>% 
  arrange(c) %>% 
  select(-rl)
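Without the tidyverse, the same idea (one rle() per b/c group, then pairing each run with the next) can be sketched in base R with split() and lapply(), with lapply() playing roughly the role map() plays above. This is an alternative approach, not part of the original answer, and run_pairs is a hypothetical helper name:

```r
# Sample data from the question, kept as character columns for rle()
data <- data.frame(
  c = c(rep("A01", 4), rep("A02", 4), rep("A03", 4), rep("A04", 4)),
  b = c(rep("experiment", 8), rep("control", 8)),
  a = c("a", "b", "b", "b", "a", "c", "a", "b",
        "d", "d", "d", "e", "f", "f", "e", "e"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: consecutive run pairs for one group's rows
run_pairs <- function(df) {
  r <- rle(as.character(df$a))
  n <- length(r$values)
  if (n < 2) return(NULL)  # a single run has no transition
  data.frame(c = df$c[1], b = df$b[1],
             from = r$values[-n], to = r$values[-1],
             fromCount = r$lengths[-n], toCount = r$lengths[-1],
             stringsAsFactors = FALSE)
}

# One data frame per (b, c) combination; drop = TRUE skips empty combinations
groups <- split(data, list(data$b, data$c), drop = TRUE)
out <- do.call(rbind, lapply(groups, run_pairs))
out <- out[order(out$c), ]
rownames(out) <- NULL
out
```

Using r$values[-n] and r$values[-1] pairs each run with its successor directly, so no NA row is created and no filtering step is needed.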

Or, in the tidyverse, create a function that performs the rle for a single subject's tracking:

rleSlice <- function(Tracking) {
  rlTrack <- rle(as.character(Tracking))  # rle() rejects factors, so strip the levels first
  tibble(from = rlTrack$values, to = lead(rlTrack$values),
         fromCount = rlTrack$lengths, toCount = lead(rlTrack$lengths)) %>% 
    filter(!is.na(to)) %>% 
    list()
}

Make sure it behaves correctly:

rleSlice(c("a", "b", "b", "b", "c"))
# [[1]]
# # A tibble: 2 x 4
#   from  to    fromCount toCount
#   <chr> <chr>     <int>   <int>
# 1 a     b             1       3
# 2 b     c             3       1

Now group the data and get the rle for each participant:

library(dplyr)
library(tidyr)
data %>% 
  as_tibble() %>% 
  # This is easier to track than all these a,b,c's
  rename(Subject = c, Test = b, Tracking = a) %>% 
  group_by(Subject, Test) %>% 
  summarise(Slice = rleSlice(Tracking)) %>% 
  unnest(cols = Slice) %>% 
  ungroup()
# A tibble: 6 x 6
#  Subject Test       from  to    fromCount toCount
#  <fct>   <fct>      <chr> <chr>     <int>   <int>
#1 A01     experiment a     b             1       3
#2 A02     experiment a     c             1       1
#3 A02     experiment c     a             1       1
#4 A02     experiment a     b             1       1
#5 A03     control    d     e             3       1
#6 A04     control    f     e             2       2
