R: loop through a data frame and reference the next unique value

I have a data frame that basically looks like this:

df<-data.frame(yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3", "2000 1", "2000 1", "2000 1") , 
species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"), 
count = c(1, 6, 3, 7, 2, 9, 4, 5, 7))

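I want to add a "next_yearseason" column and fill it, for each row, with the next unique yearseason value, i.e. "1999 3" for rows 1-3, "2000 1" for rows 4-6, and so on.

Is there a simple way to write a for loop that does this?

I tried: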

for (i in unique(df$yearseason)){
(unique(df$next_yearseason))[i]<-(unique(df$yearseason))[i+1]
}

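...but this did not work, and I got an error: Error in i + 1 : non-numeric argument to binary operator

I have a workaround that gets the result without a loop; I am just wondering whether a loop can do this.

With dplyr, you can do the following: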

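library(dplyr)

# Build a lookup of each unique yearseason and the one after it, then join it back on.
inner_join(
  df,
  df %>% distinct(yearseason) %>% mutate(next_yearseason = lead(yearseason)),
  by = "yearseason"
)

Output: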

yearseason species count next_yearseason
1     1999 1       a     1          1999 3
2     1999 1       b     6          1999 3
3     1999 1       c     3          1999 3
4     1999 3       a     7          2000 1
5     1999 3       b     2          2000 1
6     1999 3       c     9          2000 1
7     2000 1       a     4            <NA>
8     2000 1       b     5            <NA>
9     2000 1       c     7            <NA>
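
To see why the join works, it may help to look at the lookup table on its own. This is a small sketch, assuming dplyr is installed and reusing the example data from the question; the name lookup is only for illustration.

```r
library(dplyr)

df <- data.frame(
  yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3",
                 "2000 1", "2000 1", "2000 1"),
  species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  count = c(1, 6, 3, 7, 2, 9, 4, 5, 7),
  stringsAsFactors = FALSE
)

# One row per unique yearseason, paired with the value that follows it.
# lead() shifts the column up by one position and pads the end with NA.
lookup <- df %>%
  distinct(yearseason) %>%
  mutate(next_yearseason = lead(yearseason))

print(lookup)
```

Joining df to this three-row table on yearseason copies the matching next_yearseason onto every row.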

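You can do it in a loop like this:

ys <- unique(df$yearseason)
# The last unique value has no successor, so stop one short; its rows stay NA.
for (i in seq_len(length(ys) - 1)) {
  df[df$yearseason == ys[i], "next_yearseason"] <- ys[i + 1]
}

Output: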

yearseason species count next_yearseason
1     1999 1       a     1          1999 3
2     1999 1       b     6          1999 3
3     1999 1       c     3          1999 3
4     1999 3       a     7          2000 1
5     1999 3       b     2          2000 1
6     1999 3       c     9          2000 1
7     2000 1       a     4            <NA>
8     2000 1       b     5            <NA>
9     2000 1       c     7            <NA>
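
As for the error in the question's attempt: for (i in unique(df$yearseason)) makes i take the character values themselves, so i + 1 is arithmetic on a string. A minimal sketch of the difference, using a hand-written vector in place of the data frame column (the names ys, msg and next_ys are illustrative):

```r
ys <- c("1999 1", "1999 3", "2000 1")

# Iterating over the values makes i a character string, so i + 1 fails.
msg <- tryCatch("1999 1" + 1, error = function(e) conditionMessage(e))
print(msg)

# Iterating over positions keeps i numeric, so ys[i + 1] is a valid index.
next_ys <- character(length(ys))
for (i in seq_along(ys)) {
  next_ys[i] <- ys[i + 1]  # ys[4] does not exist, so the last entry is NA
}
print(next_ys)
```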

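The base R way would certainly not use a loop. Instead, take the column without its first 3 items and pad the end with NA (or make up the next value). With a negative second argument -n, tail drops the first n items. Note that this relies on the rows being ordered by yearseason with exactly 3 rows per value:

df$next_yrseas <- c( tail(df$yearseason, -3), rep(NA, 3))

or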

df$next_yrseas <- c( tail(df$yearseason, -3), rep("2000 3", 3))
> df
yearseason species count next_yrseas
1     1999 1       a     1      1999 3
2     1999 1       b     6      1999 3
3     1999 1       c     3      1999 3
4     1999 3       a     7      2000 1
5     1999 3       b     2      2000 1
6     1999 3       c     9      2000 1
7     2000 1       a     4      2000 3
8     2000 1       b     5      2000 3
9     2000 1       c     7      2000 3
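
If the groups are not all the same size, a fixed shift of 3 no longer lines up. One possible base R variant, offered as a sketch rather than as part of the answer above, looks each row up by its unique value with match(); the names ys and next_by_value are illustrative.

```r
df <- data.frame(
  yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3",
                 "2000 1", "2000 1", "2000 1"),
  species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  count = c(1, 6, 3, 7, 2, 9, 4, 5, 7),
  stringsAsFactors = FALSE
)

ys <- unique(df$yearseason)          # unique values in order of appearance
next_by_value <- c(ys[-1], NA)       # successor of each unique value, NA for the last

# match() gives each row's position in ys; use it to pick that value's successor.
df$next_yrseas <- next_by_value[match(df$yearseason, ys)]

print(df)
```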

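I think the tidyverse counterparts of tail might be lag or lead, and I believe they are designed to pad automatically. (There is a lag in base R that does not behave similarly and seems to exist mainly to confuse newbies; at least it confused me early on.)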
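
A quick way to compare the two behaviours, as a sketch that assumes dplyr is installed (both functions are called with their namespace so neither masks the other):

```r
x <- c(10, 20, 30)

# dplyr::lead() shifts the values forward and pads the end with NA.
shifted <- dplyr::lead(x)

# stats::lag() leaves the values as they are; it only shifts the time-series
# attribute (tsp), which surprises people expecting a shifted vector.
lagged <- stats::lag(x)

print(shifted)
print(as.vector(lagged))
print(attr(lagged, "tsp"))
```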
