R - Running counter of unique values found along a vector



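Let me first say that I have searched a lot for this basic question, but none of the answers I found seem to solve it. Apologies if it has already been answered.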

I want a running count of the distinct behaviors that have occurred so far in my data.

mydata <- data.frame(BH=c(
"sniff","explore","walking","explore","walking","trotting","sniff","explore","trotting","trotting","walking","walking","walking","watch","walking","trotting","watch","walking","walking","walking"))

The output must look like this:

myoutput <- data.frame(
BH=c(
"sniff","explore","walking","explore","walking","trotting","sniff","explore","trotting","trotting","walking","walking","walking","watch","walking","trotting","watch","walking","walking","walking"),
mycount=c(
1,2,3,3,3,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5))

I tried ave and n_distinct (from the dplyr package), but I only get the count for a given behavior, not the cumulative count.

I would appreciate any help or hints on how to solve this.

孙燕姿

This is easy with a group-by operation and cumsum. I like to use the data.table package.

library(data.table)
setDT(mydata)
mydata[, mycount := c(1, rep(0, .N - 1)), by = BH] # 1 at the first occurrence of each behavior, 0 elsewhere
mydata[, mycount := cumsum(mycount)]
all.equal(setDF(mydata), myoutput)
#[1] TRUE
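
The same idea (flag the first occurrence of each behavior, then take a cumulative sum) can also be sketched in base R without any package. This is only an illustration, not part of the answer above; the names behaviors, first_seen and running_count are hypothetical.

```r
# Hypothetical vector name; same values as mydata$BH in the question
behaviors <- c("sniff", "explore", "walking", "explore", "walking",
               "trotting", "sniff", "explore", "trotting", "trotting",
               "walking", "walking", "walking", "watch", "walking",
               "trotting", "watch", "walking", "walking", "walking")

# duplicated() is FALSE the first time a value appears, TRUE afterwards,
# so its negation marks first occurrences
first_seen <- !duplicated(behaviors)

# The cumulative sum of those flags is the number of distinct values seen so far
running_count <- cumsum(first_seen)
print(running_count)
```

Note that cumsum on a logical vector returns integers, so wrap the result in as.numeric() if it has to match the type of mycount in myoutput exactly.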

Here is a tidyverse solution. It is not as concise as Roland's, but it works.

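library(tidyverse)

# Keep the original row order so it can be restored after reshaping
x <- mydata |> 
  mutate(rn = row_number())

x |> 
  group_by(BH) |> 
  mutate(id = cur_group_id()) |> 
  ungroup() |> 
  # One column per behavior: non-zero where it occurs, 0 elsewhere
  pivot_wider(names_from = BH, 
              values_from = id, 
              values_fill = 0) |> 
  # *_temp is TRUE once the behavior has been seen; mycount sums those flags per row
  mutate(across(sniff:watch, ~ cumsum(.x) > 0, .names = "{.col}_temp"),
         mycount = rowSums(across(ends_with('_temp')))) |>
  dplyr::select(c(rn:watch, mycount)) |> 
  right_join(x, by = 'rn') |> 
  # Back to one row per observation
  pivot_longer(-c(rn, mycount, BH)) |> 
  filter(value != 0) |> 
  dplyr::select(BH, mycount)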
#> # A tibble: 20 × 2
#>    BH       mycount
#>    <chr>      <dbl>
#>  1 sniff          1
#>  2 explore        2
#>  3 walking        3
#>  4 explore        3
#>  5 walking        3
#>  6 trotting       4
#>  7 sniff          4
#>  8 explore        4
#>  9 trotting       4
#> 10 trotting       4
#> 11 walking        4
#> 12 walking        4
#> 13 walking        4
#> 14 watch          5
#> 15 walking        5
#> 16 trotting       5
#> 17 watch          5
#> 18 walking        5
#> 19 walking        5
#> 20 walking        5
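
For comparison, a shorter dplyr variant is sketched below. It is an alternative technique, not the approach of the answer above: match(BH, unique(BH)) gives each row the rank of its behavior in order of first appearance, and cummax turns that into the number of distinct behaviors seen so far. The name result is hypothetical, and the sketch assumes R 4.1 or later for the native pipe.

```r
library(dplyr)

# Same data as in the question
mydata <- data.frame(BH = c(
  "sniff", "explore", "walking", "explore", "walking", "trotting", "sniff",
  "explore", "trotting", "trotting", "walking", "walking", "walking", "watch",
  "walking", "trotting", "watch", "walking", "walking", "walking"))

# match() returns the position of each behavior in the vector of unique
# behaviors (ordered by first appearance); the running maximum of those
# positions is the count of distinct behaviors seen up to each row
result <- mydata |>
  mutate(mycount = cummax(match(BH, unique(BH))))

print(result)
```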
