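My data consists of sets of observations on different groups. Each group has a different number of observations. I want to create a variable that flags observations with a "1" for further manual QA/QC. The flags should be regularly spaced within a group, but the spacing may differ between groups. The spacing is obtained by dividing each group's length by a constant (5 in this example).
The data looks like this: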
dt<-data.table(places=c(rep("A",10), rep("B",20))) #the data
dt2<-data.table(places=c("A","B"), spacing=c(2,4)) #the spacings by group to apply to the data
Then apply some code to generate the flags (or sequence):
dt$sequence<- ????
so that the result looks like:
places sequence
A 1
A
A 1
A
...
B 1
B
B
B
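Essentially, I want each group to "count" up to the ideal spacing already determined for that group, and to flag a "1" every time the count recycles. I just can't figure out how to feed the spacing and group combinations from the table into the data.table.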
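For reference, the spacing table dt2 shown earlier could be derived from the group sizes instead of being typed by hand. This is only a minimal sketch: it assumes the division result is rounded down with floor() (the rounding rule is not stated in the question), and the variable name divisor is made up for illustration.

```r
library(data.table)

dt <- data.table(places = c(rep("A", 10), rep("B", 20)))  # the data

# Hypothetical name for the constant from the question (5 in this example).
divisor <- 5

# One row per group: spacing = group length / divisor, rounded down (assumption).
dt2 <- dt[, .(spacing = floor(.N / divisor)), by = places]
```

With the example data this should reproduce the spacings 2 for A and 4 for B.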
Here is another option:
dt[, sq := dt2[.SD, on=.(places), +((rowid(i.places)-1) %% spacing == 0L)]]
Output:
places sq
1: A 1
2: A 0
3: A 1
4: A 0
5: A 1
6: A 0
7: A 1
8: A 0
9: A 1
10: A 0
11: B 1
12: B 0
13: B 0
14: B 0
15: B 1
16: B 0
17: B 0
18: B 0
19: B 1
20: B 0
21: B 0
22: B 0
23: B 1
24: B 0
25: B 0
26: B 0
27: B 1
28: B 0
29: B 0
30: B 0
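You can feed the data.table that spacing and group combination with the join dt2[.SD, on=.(places)], then use rowid to generate the within-group sequence, and finally take the modulo to find the rows where the sequence number minus 1 is divisible by the spacing. This flags the first row of each group and every spacing-th row after it.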
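The same three steps can also be written out one at a time, which may be easier to inspect. This is a sketch, not the one-liner above: it swaps in an update join (dt[dt2, on = ..., spacing := i.spacing]) in place of dt2[.SD, ...], and the helper columns spacing and idx are added to dt purely for illustration.

```r
library(data.table)

dt  <- data.table(places = c(rep("A", 10), rep("B", 20)))
dt2 <- data.table(places = c("A", "B"), spacing = c(2, 4))

# Step 1: update join, copying each group's spacing onto the rows of dt.
dt[dt2, on = .(places), spacing := i.spacing]

# Step 2: position of each row within its group (1, 2, 3, ...).
dt[, idx := rowid(places)]

# Step 3: flag rows whose zero-based position is a multiple of the spacing.
dt[, sq := as.integer((idx - 1L) %% spacing == 0)]
```

The helper columns can be dropped afterwards with dt[, c("spacing", "idx") := NULL] if only sq is needed.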
I arrived at a data.table solution:
dtest[, sequence := rep(seq_len(floor(.N/5)),length.out=.N), by = places]
dtest[sequence!=1,sequence:=NA]
I had never used length.out before...
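To see what length.out does here, it may help to run the recycling step on a single group in base R. A small sketch, assuming a group of 10 rows and the same divisor of 5; the names n, spacing, counter and flag are illustrative only.

```r
n <- 10                   # size of one group (assumed for this example)
spacing <- floor(n / 5)   # 2

# seq_len(spacing) is 1, 2; length.out recycles it until n values are produced.
counter <- rep(seq_len(spacing), length.out = n)

# Keep a flag wherever the counter restarts at 1; blank (NA) elsewhere.
flag <- ifelse(counter == 1L, 1L, NA_integer_)
```

Here counter cycles 1, 2, 1, 2, ... so every second row ends up flagged.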
Following our conversation, here are dplyr solutions, each of which begins with:
library(data.table)
library(dplyr)
dt <- data.table(places=c(rep("A",10), rep("B",20))) #the data
For the two approaches discussed:
- Universal divisor (here 5):
# The divisor to be applied universally across all groups.
universal_divisor <- 5
# The vectorized function you specified.
f <- function(group_length, divisor){
return(floor(group_length / divisor))
}
dt_universal <- dt %>%
# Group in order to index each row WITHIN its group.
group_by(places) %>%
# Mark a 1 at each row whose within-group index is a multiple of the value that
# 'f' computes from the group size and the universal divisor; otherwise blank (NA).
mutate(sequence = if_else(row_number() %% f(n(), universal_divisor) == 0,
1, as.numeric(NA))) %>%
ungroup() %>% as.data.table()
- Custom spacing:
# Your spacings by group to apply to the data.
dt2 <- data.table(places=c("A","B"), spacing=c(2,4))
dt_custom <- dt %>%
# Match each row to the custom spacing value for its 'place'.
left_join(dt2, by = "places") %>%
# Group in order to index each row WITHIN its group.
group_by(places) %>%
# Mark with a 1 at the desired spacing; otherwise make blank (NA).
transmute(places,
sequence = if_else(row_number() %% spacing == 0,
1, as.numeric(NA))) %>%
ungroup() %>% as.data.table()
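Each approach outputs the data.table below. Although some of these operations can be done more efficiently with data.table, I personally find the dplyr workflow very transparent and flexible.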
places sequence
1: A NA
2: A 1
3: A NA
4: A 1
5: A NA
6: A 1
7: A NA
8: A 1
9: A NA
10: A 1
11: B NA
12: B NA
13: B NA
14: B 1
15: B NA
16: B NA
17: B NA
18: B 1
19: B NA
20: B NA
21: B NA
22: B 1
23: B NA
24: B NA
25: B NA
26: B 1
27: B NA
28: B NA
29: B NA
30: B 1
places sequence
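Note that the output above flags the last row of each block (rows 2, 4, ... in A), whereas the example in the question flags the first. If the flags should start on the first row of every group, one possible adjustment is to subtract 1 from the row number before taking the modulo. A sketch under that assumption, using plain data frames and a hypothetical result name dt_first:

```r
library(dplyr)

dt  <- data.frame(places = c(rep("A", 10), rep("B", 20)))
dt2 <- data.frame(places = c("A", "B"), spacing = c(2, 4))

dt_first <- dt %>%
  # Match each row to the custom spacing value for its 'place'.
  left_join(dt2, by = "places") %>%
  # Group in order to index each row WITHIN its group.
  group_by(places) %>%
  # Zero-based index: rows 1, 1 + spacing, 1 + 2 * spacing, ... get a 1.
  mutate(sequence = if_else((row_number() - 1) %% spacing == 0, 1, NA_real_)) %>%
  ungroup() %>%
  select(places, sequence)
```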