R: creating unique groups in sequential data that repeat over time



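This kind of thing has been asked before, but not quite in the way I am looking for.

A thread about creating sequential IDs, with several other links

Creating an identifier in sequential order is not hard, but my data contain a time element that throws me for a loop. The following is a made-up dataset, just to illustrate the problem: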

    dput(walking_dat)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown", 
"Uptown"), class = "factor"), street = structure(c(4L, 3L, 3L, 
5L, 3L, 4L, 6L, 7L, 4L, 4L, 1L, 2L, 1L), .Label = c("12thAve", 
"14thAve", "Dupont", "Hennepin", "Lyndale", "Marquette", "Nicolette"
), class = "factor"), sequence = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 
5, 1, 2, 3), visit = c(1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 2)), .Names = c("neighborhood", 
"street", "sequence", "visit"), row.names = c(NA, -13L), class = "data.frame")
   neighborhood    street sequence visit
1        Uptown  Hennepin        1     1
2        Uptown    Dupont        2     1
3        Uptown    Dupont        3     1
4        Uptown   Lyndale        4     1
5        Uptown    Dupont        5     2
6      Downtown  Hennepin        1     1
7      Downtown Marquette        2     1
8      Downtown Nicolette        3     1
9      Downtown  Hennepin        4     2
10     Downtown  Hennepin        5     2
11    Dinkytown   12thAve        1     1
12    Dinkytown   14thAve        2     1
13    Dinkytown   12thAve        3     2

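To picture it: all of the data come from three people in three neighborhoods of Minneapolis. Each row represents a time at which a person's location was recorded. The first column is the neighborhood they are walking through. The second column is the intersection they are at for each time point. The third column is the order in which the observations occurred.

I want to create the visit column. Consecutive records at the same street in the same neighborhood count as one visit, and a later return to that street counts as the next visit. How can I create this kind of sequential identifier?


I thought ave() with the FUN = seq_along trick might work, but I could not find a way to combine the factors the way I want.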
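
To illustrate why that trick alone falls short, here is a minimal sketch on a hypothetical five-row subset of the Uptown data (the names dat and counter are made up for this illustration). ave() with seq_along numbers every row within a neighborhood/street group, so consecutive rows at the same street get different numbers instead of sharing one visit number.

```r
# Hypothetical subset of the Uptown rows, for illustration only
dat <- data.frame(
  neighborhood = c("Uptown", "Uptown", "Uptown", "Uptown", "Uptown"),
  street       = c("Hennepin", "Dupont", "Dupont", "Lyndale", "Dupont"),
  stringsAsFactors = FALSE
)

# Row counter within each neighborhood/street combination
dat$counter <- ave(seq_along(dat$street), dat$neighborhood, dat$street,
                   FUN = seq_along)

dat$counter
# 1 1 2 1 3   (the wanted visit column would be 1 1 1 1 2)
```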

Create a sequential number (counter) for rows within each group of a dataframe [duplicate]


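Update: Uwe's solution works, but not when someone decides to stay at one intersection for all measurements, which is what happened when I tried it on my real data. In that case, the final data do not have the original number of rows. Look at what happens here: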

dput(walking_dat_2)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown", 
"Uptown"), class = "factor"), street2 = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve", 
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1, 
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit_2 = c(1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 1, 1, 1)), .Names = c("neighborhood", "street2", 
"sequence", "visit_2"), row.names = c(NA, -13L), class = "data.frame")
   neighborhood   street2 sequence visit_2
1        Uptown  Hennepin        1       1
2        Uptown  Hennepin        2       1
3        Uptown  Hennepin        3       1
4        Uptown  Hennepin        4       1
5        Uptown  Hennepin        5       1
6      Downtown  Hennepin        1       1
7      Downtown Marquette        2       1
8      Downtown Nicolette        3       1
9      Downtown  Hennepin        4       2
10     Downtown  Hennepin        5       2
11    Dinkytown   12thAve        1       1
12    Dinkytown   12thAve        2       1
13    Dinkytown   12thAve        3       1

In this case, running Uwe's solution returns only 6 rows.

library(data.table)
setDT(walking_dat)[, visit_2 := rleid(neighborhood, street2)][
     , unique(.SD, by = "visit_2")][
         , visit_2 := rowid(neighborhood, street2)][
             walking_dat, on = .(neighborhood, street2, sequence), roll = TRUE, visit_2 := x.visit_2][]
   neighborhood   street2 sequence visit visit_2
1:       Uptown  Hennepin        1     1       1
2:     Downtown  Hennepin        1     2       1
3:     Downtown Marquette        2     3       1
4:     Downtown Nicolette        3     4       1
5:     Downtown  Hennepin        4     5       2
6:    Dinkytown   12thAve        1     6       1

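The difficulty here is that subsequent records for the same street in the same neighborhood should count as one visit. This requires collapsing those rows into one, numbering the visits to the distinct neighborhoods & streets, and finally expanding the result back to the original number of rows.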
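
The collapse, count, and expand idea can be sketched in base R with rle(). This is only an illustration of the three steps on a hypothetical eight-row example (dat, key, runs and visit_per_run are made-up names), not the data.table code used in this answer.

```r
# Hypothetical example data, for illustration only
dat <- data.frame(
  neighborhood = c("Uptown", "Uptown", "Uptown", "Uptown", "Uptown",
                   "Downtown", "Downtown", "Downtown"),
  street       = c("Hennepin", "Dupont", "Dupont", "Lyndale", "Dupont",
                   "Hennepin", "Marquette", "Hennepin"),
  stringsAsFactors = FALSE
)

# One key per neighborhood/street combination
key <- paste(dat$neighborhood, dat$street, sep = " / ")

# Step 1: collapse consecutive identical keys into runs
runs <- rle(key)

# Step 2: number the runs within each key (1st visit, 2nd visit, ...)
visit_per_run <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# Step 3: expand back to one value per original row
dat$visit_new <- rep(visit_per_run, times = runs$lengths)

dat$visit_new
# 1 1 1 1 2 1 1 2
```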

Note that the visit column, which holds the expected result, is not overwritten; it is kept for comparison with the computed visit_new column.

library(data.table)
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][
      walking_dat, on = .(neighborhood, street, sequence), roll = TRUE, .SD]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         1
 3:       Uptown    Dupont        3     1         1
 4:       Uptown   Lyndale        4     1         1
 5:       Uptown    Dupont        5     2         2
 6:     Downtown  Hennepin        1     1         1
 7:     Downtown Marquette        2     1         1
 8:     Downtown Nicolette        3     1         1
 9:     Downtown  Hennepin        4     2         2
10:     Downtown  Hennepin        5     2         2
11:    Dinkytown   12thAve        1     1         1
12:    Dinkytown   14thAve        2     1         1
13:    Dinkytown   12thAve        3     2         2

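Step-by-step explanation

The data frame is coerced to a data.table. The rleid() function assigns a new number each time neighborhood & street change.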
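
As a toy illustration of rleid() in isolation (the street vector below is a made-up example), the run id increases by one whenever the value differs from the previous element, so a street that is revisited later gets a new id rather than its old one:

```r
library(data.table)

# Hypothetical vector, for illustration only
street <- c("Hennepin", "Dupont", "Dupont", "Lyndale", "Dupont")

rleid(street)
# 1 2 2 3 4
```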

 setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         2
 3:       Uptown    Dupont        3     1         2
 4:       Uptown   Lyndale        4     1         3
 5:       Uptown    Dupont        5     2         4
 6:     Downtown  Hennepin        1     1         5
 7:     Downtown Marquette        2     1         6
 8:     Downtown Nicolette        3     1         7
 9:     Downtown  Hennepin        4     2         8
10:     Downtown  Hennepin        5     2         8
11:    Dinkytown   12thAve        1     1         9
12:    Dinkytown   14thAve        2     1        10
13:    Dinkytown   12thAve        3     2        11

Note that rows 2 & 3 are duplicates, as are rows 9 & 10. The duplicates are removed in the next step, which creates a new temporary data.table.

setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         2
 3:       Uptown   Lyndale        4     1         3
 4:       Uptown    Dupont        5     2         4
 5:     Downtown  Hennepin        1     1         5
 6:     Downtown Marquette        2     1         6
 7:     Downtown Nicolette        3     1         7
 8:     Downtown  Hennepin        4     2         8
 9:    Dinkytown   12thAve        1     1         9
10:    Dinkytown   14thAve        2     1        10
11:    Dinkytown   12thAve        3     2        11
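
A small sketch of what unique() with the by argument does on a data.table: it keeps the first row for each distinct value of the named column. The table dt and its columns are hypothetical and exist only for this illustration.

```r
library(data.table)

# Hypothetical table: run 2 spans two rows
dt <- data.table(run = c(1L, 2L, 2L, 3L), sequence = 1:4)

# Keep only the first row of each run
collapsed <- unique(dt, by = "run")

collapsed$sequence
# 1 2 4
```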

Now we can number the visits to the distinct neighborhoods and streets with the rowid() function:

setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         1
 3:       Uptown   Lyndale        4     1         1
 4:       Uptown    Dupont        5     2         2
 5:     Downtown  Hennepin        1     1         1
 6:     Downtown Marquette        2     1         1
 7:     Downtown Nicolette        3     1         1
 8:     Downtown  Hennepin        4     2         2
 9:    Dinkytown   12thAve        1     1         1
10:    Dinkytown   14thAve        2     1         1
11:    Dinkytown   12thAve        3     2         2
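
For readers unfamiliar with rowid(), here is a toy illustration on its own (the street vector is a made-up example). It returns, for each element, how many times that value has occurred so far, which is what turns the collapsed runs into visit numbers.

```r
library(data.table)

# Hypothetical collapsed runs: Dupont occurs twice
street <- c("Hennepin", "Dupont", "Lyndale", "Dupont")

rowid(street)
# 1 1 1 2
```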

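Finally, we need to expand the result to the original number of rows again. This is done by a rolling join of the temporary data.table with the original data frame (which still includes all rows):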

setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][
      walking_dat, on = .(neighborhood, street, sequence), roll = TRUE, .SD]

It is perhaps worth noting that visit_new is used and reused to hold temporary data through the various stages until the final update.
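
The rolling join can be seen in isolation in the following sketch. The tables collapsed and full are hypothetical stand-ins for the temporary data.table and the original data. With roll = TRUE, a row of full that has no exact match on sequence takes the value from the most recent earlier sequence of the same street.

```r
library(data.table)

# Hypothetical collapsed table: one row per run, already numbered
collapsed <- data.table(
  street    = c("Dupont", "Lyndale", "Dupont"),
  sequence  = c(2, 4, 5),
  visit_new = c(1L, 1L, 2L)
)

# Hypothetical full table: sequence 3 at Dupont has no exact match
full <- data.table(
  street   = c("Dupont", "Dupont", "Lyndale", "Dupont"),
  sequence = c(2, 3, 4, 5)
)

# Exact match on street, rolling (last observation carried forward) on sequence
res <- collapsed[full, on = .(street, sequence), roll = TRUE]

res$visit_new
# 1 1 1 2
```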

New dataset

The fixed code also works for the second dataset supplied by the OP:

walking_dat_2 <-
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown", 
"Uptown"), class = "factor"), street = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve", 
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1, 
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit = c(1, 1, 1, 1, 1, 
1, 1, 1, 2, 2, 1, 1, 1), visit_new = c(1L, 1L, 1L, 1L, 1L, 2L, 
3L, 4L, 5L, 5L, 6L, 6L, 6L)), .Names = c("neighborhood", "street", 
"sequence", "visit", "visit_new"), row.names = c(NA, -13L), class = "data.frame")
setDT(walking_dat_2)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][
      walking_dat_2, on = .(neighborhood, street, sequence), 
      roll = TRUE, .SD]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown  Hennepin        2     1         1
 3:       Uptown  Hennepin        3     1         1
 4:       Uptown  Hennepin        4     1         1
 5:       Uptown  Hennepin        5     1         1
 6:     Downtown  Hennepin        1     1         1
 7:     Downtown Marquette        2     1         1
 8:     Downtown Nicolette        3     1         1
 9:     Downtown  Hennepin        4     2         2
10:     Downtown  Hennepin        5     2         2
11:    Dinkytown   12thAve        1     1         1
12:    Dinkytown   12thAve        2     1         1
13:    Dinkytown   12thAve        3     1         1

A base R alternative using a loop:

# Not required, but convenient:
walking_dat$combo <- paste(walking_dat$neighborhood, walking_dat$street)
# TRUE where a row starts a new run of the same neighborhood/street
new_run <- c(TRUE, walking_dat$combo[-1] != walking_dat$combo[-nrow(walking_dat)])
# Placeholder:
walking_dat$visit <- NA
# Create it:
for (i in seq_len(nrow(walking_dat))) {
  if (new_run[i]) {
    # Count the runs of this combo that have started up to and including row i
    walking_dat$visit[i] <- sum(new_run[1:i] & walking_dat$combo[1:i] == walking_dat$combo[i])
  } else {
    # Still in the same run: carry the previous visit number forward
    walking_dat$visit[i] <- walking_dat$visit[i - 1]
  }
}
walking_dat
   neighborhood    street sequence visit              combo
1        Uptown  Hennepin        1     1    Uptown Hennepin
2        Uptown    Dupont        2     1      Uptown Dupont
3        Uptown    Dupont        3     1      Uptown Dupont
4        Uptown   Lyndale        4     1     Uptown Lyndale
5        Uptown    Dupont        5     2      Uptown Dupont
6      Downtown  Hennepin        1     1  Downtown Hennepin
7      Downtown Marquette        2     1 Downtown Marquette
8      Downtown Nicolette        3     1 Downtown Nicolette
9      Downtown  Hennepin        4     2  Downtown Hennepin
10     Downtown  Hennepin        5     2  Downtown Hennepin
11    Dinkytown   12thAve        1     1  Dinkytown 12thAve
12    Dinkytown   14thAve        2     1  Dinkytown 14thAve
13    Dinkytown   12thAve        3     2  Dinkytown 12thAve
