R: Count pairs of strings in one column and reset the count based on a different column



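I'm trying to add a column to a df that holds the count of a pair of strings that repeats over multiple rows. The count needs to reset whenever another column changes.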

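More specifically: I'm trying to add trial numbers to a very large data frame. Each trial consists of two parts (Show followed by Point). Each Show and Point row has an associated value, and a trial can contain any number of Show/Point values. Each ID can have a different number of trials, but every trial always has its Show rows first, followed by its Point rows. This means each ID will have a different number of rows.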

Sample data:

ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
TrialType <- c("Show", "Show", "Show", "Point", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point")
Value <- c(0.52, 0.54, 0.55, 0.57, 0.58, 0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.55, 0.57, 0.58, 0.59,0.75,0.89, 0.99)
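# data.frame() keeps one column per vector; as.data.frame(c(...)) would flatten everything into a single column
df <- data.frame(ID, TrialType, Value)
TrialNumber <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
# cbind() on a data frame keeps column types; cbind() on bare vectors would coerce everything to character
df.desired <- cbind(df, TrialNumber)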

I think I need a loop that iterates over ID, but that is too advanced for me. I'm new to R and Stack Overflow. Thanks in advance for any help.

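Using dplyr (loaded here via tidyverse):

For each ID, check whether the current value is Show and the previous value is Point. If so, start a new count.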

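library(tidyverse)
df %>%
  group_by(ID) %>%
  mutate(
    # TRUE where a new trial starts: a Show that follows a Point (or begins the ID)
    TrialNumber = TrialType == 'Show' &
      lag(TrialType, default = 'Point') == 'Point',
    # running sum of trial starts within each ID
    TrialNumber = cumsum(TrialNumber))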
   ID TrialType Value TrialNumber
1   1      Show  0.52           1
2   1      Show  0.54           1
3   1      Show  0.55           1
4   1     Point  0.57           1
5   1     Point  0.58           1
6   1     Point  0.59           1
7   1     Point  0.75           1
8   1      Show  0.89           2
9   1      Show  0.32           2
10  1      Show  0.99           2
11  1      Show  0.01           2
12  1     Point  0.02           2
13  1      Show  0.56           3
14  1      Show  0.67           3
15  1     Point  0.32           3
16  2      Show  0.59           1
17  2      Show  0.75           1
18  2      Show  0.89           1
19  2     Point  0.32           1
20  2     Point  0.99           1
21  2     Point  0.01           1
22  2      Show  0.02           2
23  2      Show  0.56           2
24  2      Show  0.67           2
25  2      Show  0.32           2
26  2     Point  0.55           2
27  2      Show  0.57           3
28  2      Show  0.58           3
29  2      Show  0.59           3
30  2     Point  0.75           3
31  2     Point  0.89           3
32  2     Point  0.99           3
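
The same flag-then-cumulative-sum idea can also be written without any packages. What follows is a minimal base R sketch, not part of the answer above: it rebuilds the ID and TrialType columns of the question's sample data, and the helper name count_trials is made up for illustration. It assumes every ID begins with a Show row.

```r
# ID and TrialType from the question's sample data, written compactly with rep()
ID <- c(rep(1, 15), rep(2, 17))
TrialType <- rep(rep(c("Show", "Point"), 6),
                 times = c(3, 4, 4, 1, 2, 1, 3, 3, 4, 1, 3, 3))
df <- data.frame(ID, TrialType, stringsAsFactors = FALSE)

# Hypothetical helper: trial numbers for the rows of a single ID
count_trials <- function(trial_type) {
  # previous value; the first row is treated as if it followed a Point
  prev <- c("Point", head(trial_type, -1))
  # a trial starts at every Show that follows a Point
  cumsum(trial_type == "Show" & prev == "Point")
}

# ave() applies the helper to the row indices of each ID separately,
# so no explicit loop over ID is needed
df$TrialNumber <- ave(seq_len(nrow(df)), df$ID,
                      FUN = function(rows) count_trials(df$TrialType[rows]))
print(df)
```

Passing row indices to ave() instead of the TrialType column itself keeps the result numeric; ave() returns a vector of the same type as its first argument, so a character input would give character trial numbers.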

You can use rleid from data.table:

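library(dplyr)
library(data.table)
df %>% 
  # number each run of identical TrialType values,
  # then give each Point run the id of the Show run before it
  mutate(tmp = data.table::rleid(TrialType),
         tmp = ifelse(TrialType == "Point", tmp - 1, tmp)) %>% 
  group_by(ID) %>% 
  # renumber the Show/Point pairs from 1 within each ID
  mutate(TrialNumber = data.table::rleid(tmp)) %>% 
  select(-tmp) %>%
  ungroup()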

This gives:

      ID TrialType Value TrialNumber
   <dbl> <chr>     <dbl>       <int>
 1     1 Show       0.52           1
 2     1 Show       0.54           1
 3     1 Show       0.55           1
 4     1 Point      0.57           1
 5     1 Point      0.58           1
 6     1 Point      0.59           1
 7     1 Point      0.75           1
 8     1 Show       0.89           2
 9     1 Show       0.32           2
10     1 Show       0.99           2
11     1 Show       0.01           2
12     1 Point      0.02           2
13     1 Show       0.56           3
14     1 Show       0.67           3
15     1 Point      0.32           3
16     2 Show       0.59           1
17     2 Show       0.75           1
18     2 Show       0.89           1
19     2 Point      0.32           1
20     2 Point      0.99           1
21     2 Point      0.01           1
22     2 Show       0.02           2
23     2 Show       0.56           2
24     2 Show       0.67           2
25     2 Show       0.32           2
26     2 Point      0.55           2
27     2 Show       0.57           3
28     2 Show       0.58           3
29     2 Show       0.59           3
30     2 Point      0.75           3
31     2 Point      0.89           3
32     2 Point      0.99           3
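
To see why shifting the Point runs by one works, here is a small base R sketch on a toy vector for a single ID. It is an illustration only: the data are made up, and the helper run_id is a hypothetical stand-in that uses base rle() to mimic what data.table::rleid() does for one vector.

```r
# Hypothetical stand-in for rleid(): consecutive ids for runs of equal values
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Toy data for one ID (not from the question)
trial_type <- c("Show", "Show", "Point", "Point", "Show", "Point")

# Step 1: every run of Show or Point gets its own id
runs <- run_id(trial_type)

# Step 2: a Point run takes the id of the Show run just before it,
# so each Show/Point pair shares one id
pair <- ifelse(trial_type == "Point", runs - 1, runs)

# Step 3: renumber the pair ids consecutively, starting from 1
trial_number <- run_id(pair)

print(data.frame(trial_type, runs, pair, trial_number))
```

In the answer's pipeline, step 3 runs inside group_by(ID), which is what makes the numbering restart at 1 for each ID.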
