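I want to split my data frame into several categories according to time_passed in the company (people with 0 to 5 years, people with 6 to 10 years, people with 11 to 15 years, and so on, i.e. a span of 4 years each time). I know this can be done without a for loop, but I would like to be able to do it both with a for loop and with split (or subset, or any other R function).
Here is the structure of my data set: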
structure(list(sex = c("F", "H", "F", "F", "H", "F"), age = c("24",
"33", "53", "32", "38", "21"), time_passed = c("0", "3", "4",
"0", "2", "0"), level = c("N7 ", "N7 ", "N9 ", "N7 ", "N8 ",
" "), wage = c("2605", "4931", "11123", "3750", "6180", "858.31"
)), row.names = c(NA, 6L), class = "data.frame")
And here is the for loop I tried, without success:
list_tranches <- c()
for (i in seq(from = 5, to = 40, by=5)) {
for (j in 1:nrow(data_2021)){
if(data_2021[j,4] %in% seq(i-5+1:i))
tranche_i <- data_2021[j,]
list_tranches <- c(list_tranches, tranche_i)
}
}
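Ultimately, I want a variable "tranche" added to my data set df, indicating the category of time each person has spent in the company (0 to 5 years, 6 to 10 years, etc.). How should I proceed?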
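Obviously, doing this without a loop will be faster. The following single line does the same thing you are trying to achieve, once time_passed has been converted to numeric: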
split(data_2021, data_2021$time_passed %/% 5)
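To make the grouping key concrete, here is a minimal sketch on a made-up vector of years (the values are hypothetical, not taken from the question). Note that integer division by 5 produces the bins 0-4, 5-9, 10-14, which is close to, but not exactly, the 0-5, 6-10 grouping described in the question.

```r
# Hypothetical years of service, chosen to span several bins
years <- c(0, 3, 4, 5, 6, 10, 11, 15)

# Integer division by 5 gives one bin index per value
bin <- years %/% 5
bin
#> [1] 0 0 0 1 1 2 2 3

# split() then groups the values by that index
groups <- split(years, bin)
names(groups)
#> [1] "0" "1" "2" "3"
```

The same key works for a whole data frame, since split only needs one grouping value per row.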
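However, if you want to do this with a for loop, there are a few problems with your code. First, if you are comparing numbers, you need to make sure your column is numeric. Your dput shows that the time_passed column is a character column, so you need to start with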
data_2021$time_passed <- as.numeric(data_2021$time_passed)
Second, list_tranches should be defined as a list, not a vector.
list_tranches <- list()
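A small sketch of why this matters, using a hypothetical one-row data frame (the name row is an assumption for illustration): combining with c() flattens the row into its columns, while assigning into a list slot keeps the data frame whole.

```r
# Hypothetical one-row data frame standing in for data_2021[j, ]
row <- data.frame(sex = "F", age = "24", stringsAsFactors = FALSE)

# c() treats a data frame as a list of columns, so the row is flattened
flat <- c(c(), row)
length(flat)          # one element per column, not one per row
#> [1] 2
is.data.frame(flat)
#> [1] FALSE

# Assigning into a list slot keeps the data frame intact
tranches <- list()
tranches[[1]] <- row
is.data.frame(tranches[[1]])
#> [1] TRUE
```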
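There are also several problems inside your loop. First, you do not need a nested loop at all, because indexing in R is vectorized. Second, time_passed is the third column of the data frame, but you are looking up values in the fourth column. Third, your seq syntax is wrong: in seq(i-5+1:i), the 1:i part is evaluated first, so seq receives a whole vector and returns a sequence that always starts at 1.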
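The precedence issue can be checked directly. This is a minimal sketch with i fixed at 10; the names wrong and right are only illustrative.

```r
i <- 10

# `:` binds tighter than `+` and `-`, so this is (i - 5) + (1:i)
shifted <- i - 5 + 1:i
shifted
#> [1]  6  7  8  9 10 11 12 13 14 15

# seq() on a vector longer than 1 acts like seq_along(), so it starts at 1
wrong <- seq(i - 5 + 1:i)
wrong
#> [1]  1  2  3  4  5  6  7  8  9 10

# One way to write the range i-5, ..., i-1 explicitly
right <- (i - 5):(i - 1)
right
#> [1] 5 6 7 8 9
```

The expression i - 5:1 gives the same five values as the last line, just written more compactly.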
Putting all this together, we have:
for (i in seq(from = 5, to = 40, by = 5)) {
  j <- which(data_2021$time_passed %in% (i - 5:1))
  if (length(j) > 0) list_tranches[[i / 5]] <- data_2021[j, ]
}
list_tranches
#> [[1]]
#>   sex age time_passed level   wage
#> 1   F  24           0   N7    2605
#> 2   H  33           3   N7    4931
#> 3   F  53           4   N9   11123
#> 4   F  32           0   N7    3750
#> 5   H  38           2   N8    6180
#> 6   F  21           0       858.31
Of course, the example here is not a great one, because all the values fall in the same tranche.
Created on 2022-08-04 by the reprex package (v2.0.1)
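To see the loop produce more than one tranche, here is a sketch on hypothetical data (the data frame demo and its values are made up for illustration). As a small variation that is not in the code above, each list element is named after its range instead of being stored at position i/5, which avoids empty slots when a tranche has no rows.

```r
# Hypothetical data: time_passed values chosen to span several tranches
demo <- data.frame(
  id = 1:6,
  time_passed = c(0, 4, 5, 9, 12, 23)
)

list_tranches <- list()
for (i in seq(from = 5, to = 40, by = 5)) {
  # Rows whose time_passed is one of i-5, ..., i-1
  j <- which(demo$time_passed %in% (i - 5:1))
  # Name each element after its range, e.g. "0-4"
  if (length(j) > 0) list_tranches[[paste0(i - 5, "-", i - 1)]] <- demo[j, ]
}
names(list_tranches)
#> [1] "0-4"   "5-9"   "10-14" "20-24"
```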
Are you looking for findInterval or cut, followed by split?
data_2021 <-
structure(list(
sex = c("F", "H", "F", "F", "H", "F"),
age = c("24", "33", "53", "32", "38", "21"),
time_passed = c("0", "3", "4", "0", "2", "0"),
level = c("N7 ", "N7 ", "N9 ", "N7 ", "N8 ", " "),
wage = c("2605", "4931", "11123", "3750", "6180", "858.31")),
row.names = c(NA, 6L),
class = "data.frame")
data_2021$time_passed <- as.integer(data_2021$time_passed)
breaks <- seq(0, 49, by = 5)
ff <- findInterval(data_2021$time_passed, breaks)
split(data_2021, ff)
#> $`1`
#>   sex age time_passed level   wage
#> 1   F  24           0   N7    2605
#> 2   H  33           3   N7    4931
#> 3   F  53           4   N9   11123
#> 4   F  32           0   N7    3750
#> 5   H  38           2   N8    6180
#> 6   F  21           0       858.31
cc <- cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
cc <- droplevels(cc)
split(data_2021, cc)
#> $`[0,5]`
#>   sex age time_passed level   wage
#> 1   F  24           0   N7    2605
#> 2   H  33           3   N7    4931
#> 3   F  53           4   N9   11123
#> 4   F  32           0   N7    3750
#> 5   H  38           2   N8    6180
#> 6   F  21           0       858.31
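The two approaches treat the break points differently, which the sample data cannot show because every value is below 5. A minimal sketch with hypothetical values around the breaks:

```r
breaks <- seq(0, 49, by = 5)
x <- c(0, 4, 5, 6, 10)   # hypothetical values around the break points

# findInterval(): intervals are closed on the left, so 5 opens the second bin
fi <- findInterval(x, breaks)
fi
#> [1] 1 1 2 2 3

# cut(): intervals are closed on the right by default, so 5 stays in the first bin
ct <- cut(x, breaks = breaks, include.lowest = TRUE)
as.character(ct)
#> [1] "[0,5]"  "[0,5]"  "[0,5]"  "(5,10]" "(5,10]"
```

With these breaks, cut matches the 0 to 5, 6 to 10 grouping from the question, while findInterval groups 0 to 4, 5 to 9.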
To add a new column tranche, use cut/split and the names attribute of the result.
cc <- cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
cc <- droplevels(cc)
sp <- split(data_2021, cc)
res <- lapply(seq_along(sp), function(i) {
  # label every row of this group with the group's interval name
  sp[[i]]$tranche <- names(sp)[i]
  sp[[i]]
})
rm(sp)
res <- do.call(rbind, res)
res
#>   sex age time_passed level   wage tranche
#> 1   F  24           0   N7    2605   [0,5]
#> 2   H  33           3   N7    4931   [0,5]
#> 3   F  53           4   N9   11123   [0,5]
#> 4   F  32           0   N7    3750   [0,5]
#> 5   H  38           2   N8    6180   [0,5]
#> 6   F  21           0       858.31   [0,5]
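If only the new column is needed and not the split list, a shorter route may be to assign the result of cut directly. This is a sketch on a trimmed copy of the question's data (only two columns kept for brevity), not a replacement for the approach above.

```r
# Trimmed copy of the question's data, with time_passed already an integer
data_2021 <- data.frame(
  sex = c("F", "H", "F", "F", "H", "F"),
  time_passed = c(0L, 3L, 4L, 0L, 2L, 0L)
)
breaks <- seq(0, 49, by = 5)

# cut() returns one value per row, so it can be stored directly as a column
data_2021$tranche <- as.character(
  cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
)
unique(data_2021$tranche)
#> [1] "[0,5]"
```

This keeps the original row order and skips the split and rbind round trip.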