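Let me start by saying that I have searched a lot for this basic question, but none of the answers I found seem to solve it. Apologies if it has already been answered.
I want to count the occurrences of behaviours in my data, that is, for each row, the cumulative number of distinct behaviours seen so far.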
mydata <- data.frame(BH=c(
"sniff","explore","walking","explore","walking","trotting","sniff","explore","trotting","trotting","walking","walking","walking","watch","walking","trotting","watch","walking","walking","walking"))
The output must look like this:
myoutput <- data.frame(
BH=c(
"sniff","explore","walking","explore","walking","trotting","sniff","explore","trotting","trotting","walking","walking","walking","watch","walking","trotting","watch","walking","walking","walking"),
mycount=c(
1,2,3,3,3,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5))
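I tried ave and n_distinct from the dplyr package, but I only get the count for a given behaviour, not the cumulative count.
I would be grateful for any help or hints on how to solve this.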
孙燕姿
This is easy with a group-by operation and cumsum. I like to use the data.table package.
library(data.table)
setDT(mydata)
mydata[, mycount := c(1, rep(0, .N - 1)), by = BH] # flag first occurrences with 1, repeats with 0
mydata[, mycount := cumsum(mycount)]
all.equal(setDF(mydata), myoutput)
#[1] TRUE
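For comparison, here is a minimal base R sketch of the same idea: flag the first occurrence of each behaviour, then take a running sum of the flags. It assumes BH is a plain character vector, swaps the grouped step for duplicated(), and the name first_seen is made up here for illustration.

```r
# Same behaviours as in the question, as a plain character vector
BH <- c("sniff", "explore", "walking", "explore", "walking", "trotting",
        "sniff", "explore", "trotting", "trotting", "walking", "walking",
        "walking", "watch", "walking", "trotting", "watch", "walking",
        "walking", "walking")

# TRUE where a behaviour appears for the first time, FALSE for repeats
first_seen <- !duplicated(BH)

# Running sum of the flags = number of distinct behaviours seen so far
mycount <- cumsum(first_seen)

print(data.frame(BH = BH, mycount = mycount))
```

This avoids the grouping step entirely, at the cost of not generalising as directly to per-group cumulative counts.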
Here is a tidyverse solution. It is not as concise as Roland's solution, but it works.
library(tidyverse)
x <- mydata |>
  mutate(rn = row_number())

x |>
  group_by(BH) |>
  mutate(id = cur_group_id()) |>
  ungroup() |>
  pivot_wider(names_from = BH,
              values_from = id,
              values_fill = 0) |>
  mutate(
    across(sniff:watch, ~ cumsum(.x) > 0, .names = "{.col}_temp"),
    mycount = rowSums(across(ends_with('_temp')))
  ) |>
  dplyr::select(c(rn:watch, mycount)) |>
  right_join(x, by = 'rn') |>
  pivot_longer(-c(rn, mycount, BH)) |>
  filter(value != 0) |>
  dplyr::select(BH, mycount)
#> # A tibble: 20 × 2
#> BH mycount
#> <chr> <dbl>
#> 1 sniff 1
#> 2 explore 2
#> 3 walking 3
#> 4 explore 3
#> 5 walking 3
#> 6 trotting 4
#> 7 sniff 4
#> 8 explore 4
#> 9 trotting 4
#> 10 trotting 4
#> 11 walking 4
#> 12 walking 4
#> 13 walking 4
#> 14 watch 5
#> 15 walking 5
#> 16 trotting 5
#> 17 watch 5
#> 18 walking 5
#> 19 walking 5
#> 20 walking 5
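The core of the wide-format approach above (one "already seen" indicator per behaviour, summed across each row) can be sketched in base R as follows. This is only an illustration under the assumption that BH is a plain character vector; the name seen is hypothetical and it is not the pipeline itself.

```r
# Same behaviours as in the question, as a plain character vector
BH <- c("sniff", "explore", "walking", "explore", "walking", "trotting",
        "sniff", "explore", "trotting", "trotting", "walking", "walking",
        "walking", "watch", "walking", "trotting", "watch", "walking",
        "walking", "walking")

# One column per distinct behaviour; an entry is TRUE once that
# behaviour has occurred at or before the given row
seen <- sapply(unique(BH), function(b) cumsum(BH == b) > 0)

# Number of behaviours already seen at each row
mycount <- rowSums(seen)

print(data.frame(BH = BH, mycount = mycount))
```

The matrix seen plays the role of the *_temp columns, and rowSums does the same job as in the mutate step.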