我有一个像这个代码创建的数据集:
data <- data.frame(ID=c(rep("01",3),rep("02",3)), x=c("abc","abc","cde","abc","abc","abc"), t1=c(1,6,9,1,6,11), t2=c(4,7,12,3,10,12))
和另一个由:
创建的data2 <- data.frame(ID=c(rep("01",12),rep("02",12)), t = rep(1:12,2), x= c(rep(NA,24)))
,我想要得到的是:
data_final <- data.frame(ID=c(rep("01",12),rep("02",12)), t = rep(1:12,2), x= c(rep("abc",4),NA,"abc","abc",NA,rep("cde",4),rep("abc",3),rep(NA,2),rep("abc",7)))
这段代码只适用于一个ID,我不知道如何处理更多ID:
for (i in 1:nrow(data)) {
data2$x[data$t1[i]:data$t2[i]] <- data$x[i]
}
任何想法?
下面是我如何使用tidyverse
元包中的tidyr
/dplyr
。
library(tidyverse)
data_expanded <- data %>%
uncount(t2 - t1 + 1, .id = "row") %>%
mutate(t = t1 + row - 1) %>% # convert to t
select(ID, t, x) # only need columns ID, t, and x
data2 %>%
select(-x) %>% # we'll add x from data_expanded next
left_join(data_expanded)
首先,将data
转换为明确列出与其相关的每个ID
和t
的表。我使用tidyr::uncount
来复制t1
到t2
范围内每个t
的每一行。
一旦这个表准备好了,它就是一个到原始文件的简单连接。
结果:
Joining, by = c("ID", "t")
ID t x
1 01 1 abc
2 01 2 abc
3 01 3 abc
4 01 4 abc
5 01 5 <NA>
6 01 6 abc
7 01 7 abc
8 01 8 <NA>
9 01 9 cde
10 01 10 cde
11 01 11 cde
12 01 12 cde
13 02 1 abc
14 02 2 abc
15 02 3 abc
16 02 4 <NA>
17 02 5 <NA>
18 02 6 abc
19 02 7 abc
20 02 8 abc
21 02 9 abc
22 02 10 abc
23 02 11 abc
24 02 12 abc
如果您想在基本R中使用loop
,下面的代码将满足您的要求。
# loop each row of data2
for (i in 1: dim(data2)[1]){
IDi <- data2[i, 1] # id of ith row
IDi_idx <-(IDi == data$ID) # use this IDi to find the index (same ID) in data
seldf_i <- data[IDi_idx, ] # subset data using IDi_idx
# if data2$x is NA, then check if data2[i, 2] is in the range [t1:t2] by each row of seldf_i; yes assigns according x, no with NA
for (j in 1: dim(seldf_i)[1]){
if (is.na(data2[i, 3])){
data2[i, 3] <- ifelse(data2[i, 2] %in% (seldf_i[j, 3]:seldf_i[j, 4]),
as.character(seldf_i[j, 2]), NA)}
}
}
data2
data.table
选项
data_final <- setDT(data)[
,
.(t = seq(t1, t2)), .(ID, x, seq(nrow(data)))
][
setDT(data2),
on = .(ID, t)
][
,
.(ID, t, x)
][]
为
> data_final
ID t x
1: 01 1 abc
2: 01 2 abc
3: 01 3 abc
4: 01 4 abc
5: 01 5 <NA>
6: 01 6 abc
7: 01 7 abc
8: 01 8 <NA>
9: 01 9 cde
10: 01 10 cde
11: 01 11 cde
12: 01 12 cde
13: 02 1 abc
14: 02 2 abc
15: 02 3 abc
16: 02 4 <NA>
17: 02 5 <NA>
18: 02 6 abc
19: 02 7 abc
20: 02 8 abc
21: 02 9 abc
22: 02 10 abc
23: 02 11 abc
24: 02 12 abc
ID t x