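This sort of thing has been asked before, but not in a way that I can find.
There is this thread on creating sequential IDs, along with several other links.
Creating identifiers in sequence isn't hard, but my data contain a time element that is throwing me for a loop. The following is a made-up dataset, just to illustrate the problem: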
dput(walking_dat)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown",
"Uptown"), class = "factor"), street = structure(c(4L, 3L, 3L,
5L, 3L, 4L, 6L, 7L, 4L, 4L, 1L, 2L, 1L), .Label = c("12thAve",
"14thAve", "Dupont", "Hennepin", "Lyndale", "Marquette", "Nicolette"
), class = "factor"), sequence = c(1, 2, 3, 4, 5, 1, 2, 3, 4,
5, 1, 2, 3), visit = c(1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 2)), .Names = c("neighborhood",
"street", "sequence", "visit"), row.names = c(NA, -13L), class = "data.frame")
neighborhood street sequence visit
1 Uptown Hennepin 1 1
2 Uptown Dupont 2 1
3 Uptown Dupont 3 1
4 Uptown Lyndale 4 1
5 Uptown Dupont 5 2
6 Downtown Hennepin 1 1
7 Downtown Marquette 2 1
8 Downtown Nicolette 3 1
9 Downtown Hennepin 4 2
10 Downtown Hennepin 5 2
11 Dinkytown 12thAve 1 1
12 Dinkytown 14thAve 2 1
13 Dinkytown 12thAve 3 2
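For the sake of illustration, imagine the data come from three people walking through three neighborhoods in Minneapolis. Each row represents a time at which their location was recorded. The first column is the neighborhood they are walking through. The second column is the street (intersection) they are at for each time point. The third column is the sequence in which the records occurred.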
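I want to create the visit column, which counts consecutive records on the same street in the same neighborhood as one visit, and a later return to that street as the next visit. How can I create this kind of sequential identifier?
I thought ave() with the FUN = seq_along trick might work, but I couldn't find a way to combine the factors the way I want.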
Create a sequential number (counter) for rows within each group of a dataframe [duplicate]
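Update: Uwe's solution works, but not if someone decides to stay at one intersection for all measurements, which is what happened when I tried it on my real data. When that happens, the final data no longer have the original number of rows. See what happens here: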
dput(walking_dat_2)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown",
"Uptown"), class = "factor"), street2 = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve",
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1,
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit_2 = c(1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 1, 1, 1)), .Names = c("neighborhood", "street2",
"sequence", "visit_2"), row.names = c(NA, -13L), class = "data.frame")
neighborhood street2 sequence visit_2
1 Uptown Hennepin 1 1
2 Uptown Hennepin 2 1
3 Uptown Hennepin 3 1
4 Uptown Hennepin 4 1
5 Uptown Hennepin 5 1
6 Downtown Hennepin 1 1
7 Downtown Marquette 2 1
8 Downtown Nicolette 3 1
9 Downtown Hennepin 4 2
10 Downtown Hennepin 5 2
11 Dinkytown 12thAve 1 1
12 Dinkytown 12thAve 2 1
13 Dinkytown 12thAve 3 1
In this case, running Uwe's solution returns only 6 rows.
library(data.table)
setDT(walking_dat)[, visit_2 := rleid(neighborhood, street2)][
, unique(.SD, by = "visit_2")][
, visit_2 := rowid(neighborhood, street2)][
walking_dat, on = .(neighborhood, street2, sequence), roll = TRUE, visit_2 := x.visit_2][]
neighborhood street2 sequence visit visit_2
1: Uptown Hennepin 1 1 1
2: Downtown Hennepin 1 2 1
3: Downtown Marquette 2 3 1
4: Downtown Nicolette 3 4 1
5: Downtown Hennepin 4 5 2
6: Dinkytown 12thAve 1 6 1
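The difficulty here is that consecutive records on the same street in the same neighborhood should count as one visit. This requires collapsing those rows into one, counting the visits to each distinct neighborhood & street, and finally expanding the result back to the original number of rows.
Note that the visit column, which contains the expected result, is not overwritten but is kept for comparison with the computed visit_new column.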
library(data.table)
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
, unique(.SD, by = "visit_new")][
, visit_new := rowid(neighborhood, street)][
walking_dat, on = .(neighborhood, street, sequence), roll = TRUE, .SD]
neighborhood street sequence visit visit_new
1: Uptown Hennepin 1 1 1
2: Uptown Dupont 2 1 1
3: Uptown Dupont 3 1 1
4: Uptown Lyndale 4 1 1
5: Uptown Dupont 5 2 2
6: Downtown Hennepin 1 1 1
7: Downtown Marquette 2 1 1
8: Downtown Nicolette 3 1 1
9: Downtown Hennepin 4 2 2
10: Downtown Hennepin 5 2 2
11: Dinkytown 12thAve 1 1 1
12: Dinkytown 14thAve 2 1 1
13: Dinkytown 12thAve 3 2 2
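For readers who prefer not to depend on data.table, the same collapse / count / expand idea can be sketched in base R. This is only an illustration under stated assumptions, not the method used above: the data frame name walking is made up, the streets are re-typed from the sample data, and the neighborhood and street are pasted together with a space, which assumes the names themselves contain no ambiguous spaces.

```r
# Sample data re-typed from the question (only the two key columns).
walking <- data.frame(
  neighborhood = c(rep("Uptown", 5), rep("Downtown", 5), rep("Dinkytown", 3)),
  street = c("Hennepin", "Dupont", "Dupont", "Lyndale", "Dupont",
             "Hennepin", "Marquette", "Nicolette", "Hennepin", "Hennepin",
             "12thAve", "14thAve", "12thAve"),
  stringsAsFactors = FALSE
)

# 1. Collapse: run-length encode the neighborhood/street combination,
#    so consecutive identical rows become a single run.
runs <- rle(paste(walking$neighborhood, walking$street))

# 2. Count: number the runs within each distinct combination.
visit_per_run <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# 3. Expand: repeat each run's visit number once per row of that run.
walking$visit_new <- rep(visit_per_run, times = runs$lengths)

walking$visit_new
# 1 1 1 1 2 1 1 1 2 2 1 1 2
```

Because the expansion uses the run lengths directly, the result always has the original number of rows, even when someone never leaves a single street.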
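Step-by-step explanation
The data frame is coerced to a data.table. The rleid() function creates a unique number for each change in neighborhood & street.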
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][]
neighborhood street sequence visit visit_new
1: Uptown Hennepin 1 1 1
2: Uptown Dupont 2 1 2
3: Uptown Dupont 3 1 2
4: Uptown Lyndale 4 1 3
5: Uptown Dupont 5 2 4
6: Downtown Hennepin 1 1 5
7: Downtown Marquette 2 1 6
8: Downtown Nicolette 3 1 7
9: Downtown Hennepin 4 2 8
10: Downtown Hennepin 5 2 8
11: Dinkytown 12thAve 1 1 9
12: Dinkytown 14thAve 2 1 10
13: Dinkytown 12thAve 3 2 11
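To see what the run IDs represent, a rough base R equivalent can be sketched on the five Uptown rows. This is an illustration only, not data.table's actual implementation of rleid(); the variable names combo, changed and run_id are made up.

```r
# Neighborhood/street combinations of the five Uptown rows.
combo <- c("Uptown Hennepin", "Uptown Dupont", "Uptown Dupont",
           "Uptown Lyndale", "Uptown Dupont")

# TRUE wherever the combination differs from the previous row
# (the first row always starts a new run).
changed <- c(TRUE, combo[-1] != combo[-length(combo)])

# The running count of changes is the run ID.
run_id <- cumsum(changed)
run_id
# 1 2 2 3 4
```

Rows 2 and 3 get the same ID because nothing changed between them, while the second stay on Dupont (row 5) gets a new ID.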
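Note that rows 2 & 3 share the same visit_new value, as do rows 9 & 10. These duplicates are removed in the next step, which creates a new temporary data.table.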
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
, unique(.SD, by = "visit_new")][]
neighborhood street sequence visit visit_new
1: Uptown Hennepin 1 1 1
2: Uptown Dupont 2 1 2
3: Uptown Lyndale 4 1 3
4: Uptown Dupont 5 2 4
5: Downtown Hennepin 1 1 5
6: Downtown Marquette 2 1 6
7: Downtown Nicolette 3 1 7
8: Downtown Hennepin 4 2 8
9: Dinkytown 12thAve 1 1 9
10: Dinkytown 14thAve 2 1 10
11: Dinkytown 12thAve 3 2 11
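The de-duplication step keeps the first row of each run. As a small base R sketch of the same idea (the vectors below are typed in by hand for the Uptown rows, and their names are made up):

```r
# Run IDs and sequence numbers of the five Uptown rows.
run_id   <- c(1, 2, 2, 3, 4)
sequence <- 1:5

# duplicated() flags every repeat of an earlier value,
# so its negation marks the first row of each run.
first_of_run <- !duplicated(run_id)

sequence[first_of_run]
# 1 2 4 5
```

The surviving sequence values 1, 2, 4, 5 are the points at which each visit starts, which matters later when the result is expanded again.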
Now we can use the rowid() function to number the visits to each distinct neighborhood and street:
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
, unique(.SD, by = "visit_new")][
, visit_new := rowid(neighborhood, street)][]
neighborhood street sequence visit visit_new
1: Uptown Hennepin 1 1 1
2: Uptown Dupont 2 1 1
3: Uptown Lyndale 4 1 1
4: Uptown Dupont 5 2 2
5: Downtown Hennepin 1 1 1
6: Downtown Marquette 2 1 1
7: Downtown Nicolette 3 1 1
8: Downtown Hennepin 4 2 2
9: Dinkytown 12thAve 1 1 1
10: Dinkytown 14thAve 2 1 1
11: Dinkytown 12thAve 3 2 2
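This counting step is where the ave() with seq_along idea from the question fits: once consecutive duplicates are gone, numbering rows within each group gives the visit number. A minimal base R sketch on the collapsed Uptown rows (the vector names are made up for illustration):

```r
# Streets of the collapsed Uptown rows, one entry per visit.
street <- c("Hennepin", "Dupont", "Lyndale", "Dupont")

# Number the entries within each street, in order of appearance.
visit_no <- ave(seq_along(street), street, FUN = seq_along)
visit_no
# 1 1 1 2
```

Applied to the uncollapsed data, the same call would number every row on Dupont 1, 2, 3, which is why the collapsing step has to come first.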
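Finally, we need to expand the result to the original number of rows again. This is done by a rolling join of the temporary data.table with the original data frame (which includes all rows):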
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
, unique(.SD, by = "visit_new")][
, visit_new := rowid(neighborhood, street)][
walking_dat, on = .(neighborhood, street, sequence), roll = TRUE, .SD]
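With roll = TRUE, a row of the original data that has no exact match on sequence takes the values of the nearest preceding sequence within the same neighborhood and street. A rough base R picture of that lookup for the Uptown / Dupont group, using findInterval() as a stand-in (this is an illustration with made-up variable names, not how data.table performs the join):

```r
# Collapsed rows for Uptown / Dupont: where each visit starts, and its number.
start_seq      <- c(2, 5)
visit_at_start <- c(1, 2)

# Sequence values of all original Uptown / Dupont rows.
all_seq <- c(2, 3, 5)

# For each original row, find the last visit start at or before it ...
idx <- findInterval(all_seq, start_seq)

# ... and carry that visit number forward.
visit_at_start[idx]
# 1 1 2
```

Sequence 3 has no row of its own in the collapsed data, so it inherits visit 1 from sequence 2.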
It may be worth noting that visit_new is used and reused to hold temporary data through the various stages until the final update.
New dataset
The fixed code also works with the second dataset provided by the OP:
walking_dat_2 <-
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown",
"Uptown"), class = "factor"), street = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve",
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1,
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit = c(1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 1, 1, 1), visit_new = c(1L, 1L, 1L, 1L, 1L, 2L,
3L, 4L, 5L, 5L, 6L, 6L, 6L)), .Names = c("neighborhood", "street",
"sequence", "visit", "visit_new"), row.names = c(NA, -13L), class = "data.frame")
setDT(walking_dat_2)[, visit_new := rleid(neighborhood, street)][
, unique(.SD, by = "visit_new")][
, visit_new := rowid(neighborhood, street)][
walking_dat_2, on = .(neighborhood, street, sequence),
roll = TRUE, .SD]
neighborhood street sequence visit visit_new
1: Uptown Hennepin 1 1 1
2: Uptown Hennepin 2 1 1
3: Uptown Hennepin 3 1 1
4: Uptown Hennepin 4 1 1
5: Uptown Hennepin 5 1 1
6: Downtown Hennepin 1 1 1
7: Downtown Marquette 2 1 1
8: Downtown Nicolette 3 1 1
9: Downtown Hennepin 4 2 2
10: Downtown Hennepin 5 2 2
11: Dinkytown 12thAve 1 1 1
12: Dinkytown 14thAve 2 1 1
13: Dinkytown 12thAve 3 1 1
# Not required, but convenient:
walking_dat$combo <- paste(walking_dat$neighborhood, walking_dat$street)
# TRUE on every row where the neighborhood/street combination changes:
is_start <- c(TRUE, walking_dat$combo[-1] != walking_dat$combo[-nrow(walking_dat)])
# Place holder:
walking_dat$visit <- NA
# Create it:
for (i in seq_len(nrow(walking_dat))) {
  if (is_start[i]) {
    # A new visit starts here: count the visit starts of this combination so far
    walking_dat$visit[i] <- sum(is_start[1:i] & walking_dat$combo[1:i] == walking_dat$combo[i])
  } else {
    # Still the same visit as the previous row
    walking_dat$visit[i] <- walking_dat$visit[i - 1]
  }
}
walking_dat
neighborhood street sequence visit combo
1 Uptown Hennepin 1 1 Uptown Hennepin
2 Uptown Dupont 2 1 Uptown Dupont
3 Uptown Dupont 3 1 Uptown Dupont
4 Uptown Lyndale 4 1 Uptown Lyndale
5 Uptown Dupont 5 2 Uptown Dupont
6 Downtown Hennepin 1 1 Downtown Hennepin
7 Downtown Marquette 2 1 Downtown Marquette
8 Downtown Nicolette 3 1 Downtown Nicolette
9 Downtown Hennepin 4 2 Downtown Hennepin
10 Downtown Hennepin 5 2 Downtown Hennepin
11 Dinkytown 12thAve 1 1 Dinkytown 12thAve
12 Dinkytown 14thAve 2 1 Dinkytown 14thAve
13 Dinkytown 12thAve 3 2 Dinkytown 12thAve