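I'm trying to add a column to a df that holds a count of a pair of strings repeated over multiple rows. The count needs to reset based on a change in another column.
More specifically: I'm trying to add trial numbers to a very large data frame. Each trial consists of 2 parts (Show followed by Point). Each Show and each Point is associated with a value, and a trial can have any number of Show/Point values. Each ID can have a different number of trials, but every trial always has Show rows followed by Point rows. This means each ID will have a different number of rows.
Sample data: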
ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
TrialType <- c("Show", "Show", "Show", "Point", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point", "Show", "Show", "Show", "Show", "Point", "Show", "Show", "Show", "Point", "Point", "Point")
Value <- c(0.52, 0.54, 0.55, 0.57, 0.58, 0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.59,0.75,0.89,0.32,0.99,0.01,0.02,0.56,0.67,0.32,0.55, 0.57, 0.58, 0.59,0.75,0.89, 0.99)
df <- data.frame(ID, TrialType, Value)
TrialNumber<-c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
df.desired <- data.frame(ID, TrialType, Value, TrialNumber)
I think I need a loop that iterates over ID, but that is too advanced for me. I'm new to R and to Stack Overflow. Thanks in advance for your help.
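Group by ID, then check whether the current value is Show and the previous value is Point. If so, start a new count. The cumulative sum of those starts is the trial number.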
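library(tidyverse)
df %>%
  group_by(ID) %>%
  # TRUE on a Show row that follows a Point row (or is the first row of an ID)
  mutate(TrialNumber = TrialType == 'Show' &
           lag(TrialType, default = 'Point') == 'Point',
         # the running total of those flags is the trial number
         TrialNumber = cumsum(TrialNumber))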
ID TrialType Value TrialNumber
1 1 Show 0.52 1
2 1 Show 0.54 1
3 1 Show 0.55 1
4 1 Point 0.57 1
5 1 Point 0.58 1
6 1 Point 0.59 1
7 1 Point 0.75 1
8 1 Show 0.89 2
9 1 Show 0.32 2
10 1 Show 0.99 2
11 1 Show 0.01 2
12 1 Point 0.02 2
13 1 Show 0.56 3
14 1 Show 0.67 3
15 1 Point 0.32 3
16 2 Show 0.59 1
17 2 Show 0.75 1
18 2 Show 0.89 1
19 2 Point 0.32 1
20 2 Point 0.99 1
21 2 Point 0.01 1
22 2 Show 0.02 2
23 2 Show 0.56 2
24 2 Show 0.67 2
25 2 Show 0.32 2
26 2 Point 0.55 2
27 2 Show 0.57 3
28 2 Show 0.58 3
29 2 Show 0.59 3
30 2 Point 0.75 3
31 2 Point 0.89 3
32 2 Point 0.99 3
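For readers who prefer to avoid extra packages, the same idea can be sketched in base R with ave(). This is only a sketch under assumptions: count_trials is a hypothetical helper name, the Value column is left out because it plays no role in the count, and the sample data is rebuilt compactly with rep() rather than typed out.

```r
# Rebuild the ID and TrialType columns of the sample data compactly
ID <- rep(c(1, 2), times = c(15, 17))
TrialType <- rep(rep(c("Show", "Point"), 6),
                 times = c(3, 4, 4, 1, 2, 1, 3, 3, 4, 1, 3, 3))
df <- data.frame(ID, TrialType)

# Hypothetical helper: for the rows of one ID, flag each Show row that
# follows a Point row (the first row is treated as following a Point),
# then take the running total of those flags
count_trials <- function(x) {
  prev <- c("Point", head(x, -1))
  cumsum(x == "Show" & prev == "Point")
}

# ave() applies the helper to the row indices of each ID separately;
# passing indices keeps the result integer instead of character
df$TrialNumber <- ave(seq_len(nrow(df)), df$ID,
                      FUN = function(i) count_trials(df$TrialType[i]))
```

Passing row indices to ave() rather than the TrialType column itself is deliberate: ave() writes its results back into a copy of its first argument, so a character input would turn the counts into strings.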
You can use rleid from data.table:
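library(dplyr)
library(data.table)
df %>%
  # number each run of identical TrialType values, then give every Point run
  # the id of the Show run before it so the pair shares one id
  mutate(tmp = data.table::rleid(TrialType),
         tmp = ifelse(TrialType == "Point", tmp - 1, tmp)) %>%
  group_by(ID) %>%
  # renumber the paired runs starting from 1 within each ID
  mutate(TrialNumber = data.table::rleid(tmp)) %>%
  select(-tmp) %>%
  ungroup()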
which gives:
ID TrialType Value TrialNumber
<dbl> <chr> <dbl> <int>
1 1 Show 0.52 1
2 1 Show 0.54 1
3 1 Show 0.55 1
4 1 Point 0.57 1
5 1 Point 0.58 1
6 1 Point 0.59 1
7 1 Point 0.75 1
8 1 Show 0.89 2
9 1 Show 0.32 2
10 1 Show 0.99 2
11 1 Show 0.01 2
12 1 Point 0.02 2
13 1 Show 0.56 3
14 1 Show 0.67 3
15 1 Point 0.32 3
16 2 Show 0.59 1
17 2 Show 0.75 1
18 2 Show 0.89 1
19 2 Point 0.32 1
20 2 Point 0.99 1
21 2 Point 0.01 1
22 2 Show 0.02 2
23 2 Show 0.56 2
24 2 Show 0.67 2
25 2 Show 0.32 2
26 2 Point 0.55 2
27 2 Show 0.57 3
28 2 Show 0.58 3
29 2 Show 0.59 3
30 2 Point 0.75 3
31 2 Point 0.89 3
32 2 Point 0.99 3
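To see why subtracting 1 from the Point runs works, here is a small base-R sketch of the run-id trick on a toy vector. run_id is a hypothetical stand-in for data.table::rleid, written here with rle(), and the six-element vector is made up for illustration rather than taken from the question.

```r
# Hypothetical base-R stand-in for data.table::rleid:
# gives every run of identical consecutive values its own integer id
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Toy data for a single ID: two trials
TrialType <- c("Show", "Show", "Point", "Point", "Show", "Point")

tmp <- run_id(TrialType)                           # 1 1 2 2 3 4
# Each Point run takes the id of the Show run just before it
tmp <- ifelse(TrialType == "Point", tmp - 1, tmp)  # 1 1 1 1 3 3
# Renumber the merged runs consecutively to get the trial number
trial <- run_id(tmp)                               # 1 1 1 1 2 2
```

This relies on the data always alternating Show runs and Point runs, as the question states. A Point run with no Show run before it would not pair up this way.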