R: Count backwards in time (gen: 0, -1, -2, etc.) within groups in a data frame


I am working with a data frame that contains different groups, each of which spans a range of years. Something like this:

df <- data.frame(group = c(rep("aaa", 3), rep("bbb", 3), rep("ccc", 3)), year = c(2016:2018))
df  
   group  year  
1  aaa    2016  
2  aaa    2017
3  aaa    2018
4  bbb    2016
5  bbb    2017
6  bbb    2018
7  ccc    2016
8  ccc    2017
9  ccc    2018  

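What I want is to create a column (generation) whose value is assigned based on year, where the most recent year is generation 0 and older years count backwards from there. Like this: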

   group  year  generation
1  aaa    2018  0
2  bbb    2018  0
3  ccc    2018  0
4  aaa    2017  -1
5  bbb    2017  -1
6  ccc    2017  -1 
7  aaa    2016  -2
8  bbb    2016  -2
9  ccc    2016  -2

I think it has to be something like the following, but this gives me a range of 1 to 3 instead of -2 to 0:

library(dplyr)
df2 <- df %>% 
  group_by(group) %>% 
  arrange(desc(year)) %>% 
  mutate(generation = min_rank(year))
df2
   group  year  generation
1  aaa    2018  3
2  bbb    2018  3
3  ccc    2018  3
4  aaa    2017  2
5  bbb    2017  2
6  ccc    2017  2 
7  aaa    2016  1
8  bbb    2016  1
9  ccc    2016  1

Any ideas on how to get my desired range? Thanks!

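If year is not always consecutive, we can order year and subtract the result from the total number of rows in the group. (Note that order(year) matches each row's rank only when the rows are already sorted by year within each group, as they are here; for unsorted rows, rank(year) is the safer choice.)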

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(generation = -(n() - order(year))) %>%
  arrange(desc(year))
# group  year generation
#  <fct> <int>      <int>
#1 aaa    2018          0
#2 bbb    2018          0
#3 ccc    2018          0
#4 aaa    2017         -1
#5 bbb    2017         -1
#6 ccc    2017         -1
#7 aaa    2016         -2
#8 bbb    2016         -2
#9 ccc    2016         -2

Using base R:

with(df, ave(year, group, FUN = function(x) -(length(x) - order(x))))
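
The difference between order() and rank() only shows up when rows are not sorted by year within a group. The following is a minimal base R sketch of that case; the data frame df_unsorted and the names by_order and by_rank are made up for illustration and are not part of the original answer.

```r
# Hypothetical data: rows deliberately NOT sorted by year within each group
df_unsorted <- data.frame(
  group = c("aaa", "aaa", "aaa", "bbb", "bbb", "bbb"),
  year  = c(2018L, 2016L, 2017L, 2017L, 2018L, 2016L)
)

# order() returns the permutation that would sort x, not each element's rank
by_order <- with(df_unsorted,
                 ave(year, group, FUN = function(x) -(length(x) - order(x))))

# rank() returns each element's position in sorted order
by_rank <- with(df_unsorted,
                ave(year, group, FUN = function(x) -(length(x) - rank(x))))

cbind(df_unsorted, by_order, by_rank)
```

In this sketch only by_rank assigns 0 to the 2018 rows. If a group can contain duplicate years, rank(x, ties.method = "min") avoids the fractional ranks that the default tie handling produces.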

If year is always consecutive, we can subtract the group's max year from year:

df %>%
  group_by(group) %>%
  mutate(generation = year - max(year))

with(df, year - ave(year, group, FUN = max))
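
To see why the consecutive-years assumption matters, here is a small base R sketch comparing the two approaches on data with a gap. The data frame df_gap and the names by_diff and by_pos are hypothetical and chosen only for this illustration.

```r
# Hypothetical data: group "aaa" has a gap (2014, 2017, 2018)
df_gap <- data.frame(
  group = rep(c("aaa", "bbb"), each = 3),
  year  = c(2014L, 2017L, 2018L, 2016L, 2017L, 2018L)
)

# Year difference: the generation value reflects the size of the gap
by_diff <- with(df_gap, year - ave(year, group, FUN = max))

# Position within group: generations stay consecutive despite the gap
by_pos <- with(df_gap,
               ave(year, group, FUN = function(x) rank(x) - length(x)))

cbind(df_gap, by_diff, by_pos)
```

For group aaa the year difference gives -4, -1, 0, while the position-based count gives -2, -1, 0. Which of the two is wanted depends on whether a generation means "years before the latest" or "n-th most recent observation".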

Using transform:

transform(df[order(-df$year), ], 
          generation=factor(year, labels=-(2:0)))
#   group year generation
# 3   aaa 2018          0
# 6   bbb 2018          0
# 9   ccc 2018          0
# 2   aaa 2017         -1
# 5   bbb 2017         -1
# 8   ccc 2017         -1
# 1   aaa 2016         -2
# 4   bbb 2016         -2
# 7   ccc 2016         -2

If the data are slightly different, e.g. the 2017 row for group bbb is missing:

df2 <- df[-5, ]

we can put ave in there to get the right generation count.

transform(df2[order(-df2$year), ],
          generation=factor(
            with(df2, ave(as.numeric(group), year, FUN=seq)), 
            labels=-(0:2)))
#   group year generation
# 3   aaa 2018          0
# 6   bbb 2018          0
# 9   ccc 2018          0
# 2   aaa 2017         -1
# 8   ccc 2017         -1
# 1   aaa 2016         -2
# 4   bbb 2016         -1
# 7   ccc 2016         -2
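
Note that the factor-based approach produces a factor column rather than a number. As an alternative sketch (not the answer author's method), the same counts for the missing-row case can be computed as integers by ranking year within each group; the example rebuilds df and df2 so that it runs on its own.

```r
# Rebuild the example data and drop the bbb/2017 row, as above
df <- data.frame(
  group = rep(c("aaa", "bbb", "ccc"), each = 3),
  year  = rep(2016:2018, times = 3)
)
df2 <- df[-5, ]

# Rank years within each group and shift so the latest year is 0
df2$generation <- with(
  df2,
  ave(year, group, FUN = function(x) as.integer(rank(x) - length(x)))
)

# Sort so the most recent years come first
df2 <- df2[order(-df2$year), ]
df2
```

Here group bbb, which has only 2016 and 2018, gets generations -1 and 0, matching the output above, and generation is an integer column that can be used directly in arithmetic.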

Data

df <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("aaa", "bbb", "ccc"), class = "factor"), 
    year = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L, 2016L, 
    2017L, 2018L)), class = "data.frame", row.names = c(NA, -9L
))

An option with data.table:

library(data.table)
setDT(df)[, generation := year - max(year), group][order(- year)]
#    group year generation
#1:   aaa 2018          0
#2:   bbb 2018          0
#3:   ccc 2018          0
#4:   aaa 2017         -1
#5:   bbb 2017         -1
#6:   ccc 2017         -1
#7:   aaa 2016         -2
#8:   bbb 2016         -2
#9:   ccc 2016         -2
