R - How to add an OR operation to fix code that creates a new variable



I want to modify the following code:

library(dplyr)
df1 <- df7 %>%
  group_by(SAMPN, PERNO) %>%
  mutate(loop = lag(cumsum(TPURP == "(2) All other home activities"), default = 1))

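I want loop to change when TPURP is (2) All other home activities, (1) Working at home (for pay), or (24) Loop trip. Only (24) Loop trip is slightly different: each loop trip has its own index, so on every row where TPURP is (24) Loop trip, loop changes, and it changes again on the next row.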

dput(Nontest[2893:2913,1:4])
structure(list(SAMPN = c(1626, 1626, 1626, 1626, 1626, 1626, 
1626, 1626, 1639, 1639, 1639, 1639, 1639, 1639, 1639, 1640, 1640, 
1640, 1643, 1643, 1643), PERNO = c(1, 1, 2, 2, 2, 3, 3, 4, 1, 
1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1), PLANO = c(5, 6, 3, 4, 5, 
2, 3, 2, 2, 3, 4, 2, 3, 4, 5, 2, 3, 4, 2, 3, 4), TPURP = structure(c(22L, 
9L, 22L, 13L, 1L, 5L, 19L, 5L, 3L, 2L, 22L, 22L, 3L, 13L, 2L, 
20L, 15L, 2L, 13L, 13L, 2L), .Label = c("(1) Working at home (for pay)", 
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work", 
"(5) Attending class", "(6) All other activities at school", 
"(7) Change type of transportation/transfer", "(8) Dropped off passenger", 
"(9) Picked up passenger", "(10) Other, specify - transportation", 
"(11) Work/Business related", "(12) Service Private Vehicle", 
"(13) Routine Shopping", "(14) Shopping for major purchases", 
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home", 
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment", 
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor")), row.names = c(9914L, 9915L, 9916L, 9917L, 
9918L, 9919L, 9920L, 9922L, 9974L, 9975L, 9976L, 9977L, 9978L, 
9979L, 9980L, 9981L, 9982L, 9983L, 9992L, 9993L, 9994L), class = "data.frame")

Output

SAMPN PERNO PLANO                           TPURP        loop
9914  1626     1     5                  (24) Loop trip         1
9915  1626     1     6         (9) Picked up passenger         2
9916  1626     2     3                  (24) Loop trip         1
9917  1626     2     4           (13) Routine Shopping         2
9918  1626     2     5   (1) Working at home (for pay)         2
9919  1626     3     2             (5) Attending class         1
9920  1626     3     3 (19) Civic/Religious activities         1
9922  1626     4     2             (5) Attending class         1
9974  1639     1     2                    (3) Work/Job         1
9975  1639     1     3   (2) All other home activities         1
9976  1639     1     4                  (24) Loop trip         2
9977  1639     2     2                  (24) Loop trip         1
9978  1639     2     3                    (3) Work/Job         2
9979  1639     2     4           (13) Routine Shopping         2
9980  1639     2     5   (2) All other home activities         2
9981  1640     1     2   (20) Recreation/Entertainment         1 
9982  1640     1     3          (15) Household errands         1
9983  1640     1     4   (2) All other home activities         1
9992  1643     1     2           (13) Routine Shopping         1
9993  1643     1     3           (13) Routine Shopping         1
9994  1643     1     4   (2) All other home activities         1

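The output above is simply what I want to get. It would be great if you could fix my code rather than propose a new approach, but if it cannot be fixed, please suggest another method.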

More data to explain (24) Loop trip:

structure(list(SAMPN = c(1626, 1626, 1626, 1626, NA, 1626), PERNO = c(1, 
1, 1, 1, NA, 2), PLANO = c(4, 5, 6, 7, NA, 2), TPURP = structure(c(22L, 
22L, 9L, 2L, NA, 22L), .Label = c("(1) Working at home (for pay)", 
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work", 
"(5) Attending class", "(6) All other activities at school", 
"(7) Change type of transportation/transfer", "(8) Dropped off passenger", 
"(9) Picked up passenger", "(10) Other, specify - transportation", 
"(11) Work/Business related", "(12) Service Private Vehicle", 
"(13) Routine Shopping", "(14) Shopping for major purchases", 
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home", 
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment", 
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor")), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -6L), groups = structure(list(
SAMPN = c(1626, 1626, NA), PERNO = c(1, 2, NA), .rows = list(
1:4, 6L, 5L)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

Output

SAMPN PERNO PLANO TPURP                             loop   
<dbl> <dbl> <dbl> <fct>                             
1  1626     1     4 (24) Loop trip                     1
2  1626     1     5 (24) Loop trip                     2
3  1626     1     6 (9) Picked up passenger            3
4  1626     1     7 (2) All other home activities      3
5    NA    NA    NA NA                                 NA
6  1626     2     2 (24) Loop trip                     1

Each loop trip has its own index.

We create a vector (vals) of all the activities that should increase the count, group_by SAMPN and PERNO, and use lag and cummax to create a loop count for each group.

vals <- c("(2) All other home activities", "(24) Loop trip",
          "(1) Working at home (for pay)")
library(dplyr)
df7 %>%
  group_by(SAMPN, PERNO) %>%
  mutate(loop = cummax(lag(1 + (TPURP %in% vals), default = 1)))

#   SAMPN PERNO PLANO TPURP                            loop
#   <dbl> <dbl> <dbl> <fct>                           <dbl>
# 1  1626     1     5 (24) Loop trip                      1
# 2  1626     1     6 (9) Picked up passenger             2
# 3  1626     2     3 (24) Loop trip                      1
# 4  1626     2     4 (13) Routine Shopping               2
# 5  1626     2     5 (1) Working at home (for pay)       2
# 6  1626     3     2 (5) Attending class                 1
# 7  1626     3     3 (19) Civic/Religious activities     1
# 8  1626     4     2 (5) Attending class                 1
# 9  1639     1     2 (3) Work/Job                        1
#10  1639     1     3 (2) All other home activities       1
# … with 11 more rows
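
Note that 1 + (TPURP %in% vals) is only ever 1 or 2, so the cummax version never goes above 2. For the second sample in the question, where two (24) Loop trip rows follow each other and the expected loop is 1, 2, 3, 3, a cumsum-based variant that stays closer to the question's original code may fit better. The following is only a sketch, not the answer's code: df is a hypothetical data frame rebuilt from the question's second dput with the NA row left out, and TPURP is stored as character rather than factor for brevity.

```r
library(dplyr)

# Activities after which the loop counter should move on to the next value
vals <- c("(2) All other home activities",
          "(24) Loop trip",
          "(1) Working at home (for pay)")

# Hypothetical sample: the question's second dput without the NA row
df <- data.frame(
  SAMPN = c(1626, 1626, 1626, 1626, 1626),
  PERNO = c(1, 1, 1, 1, 2),
  PLANO = c(4, 5, 6, 7, 2),
  TPURP = c("(24) Loop trip",
            "(24) Loop trip",
            "(9) Picked up passenger",
            "(2) All other home activities",
            "(24) Loop trip")
)

res <- df %>%
  group_by(SAMPN, PERNO) %>%
  # cumsum() counts the qualifying rows seen so far within each person;
  # lag() shifts that count down one row so the change shows up on the
  # NEXT row; default = 0L plus 1 makes every group start at loop 1
  mutate(loop = lag(cumsum(TPURP %in% vals), default = 0L) + 1) %>%
  ungroup()

res$loop
```

Compared with the code in the question, the two changes are TPURP %in% vals in place of the single == comparison, and default = 0L followed by + 1 in place of default = 1, so that the first row of each group gets 1 and later rows keep counting upward.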
