如何将变量分组,然后按R中的行相减



我正在尝试对变量group、type和year进行分组。每个组、类型和年份都有一个特定的代码,每年都会发生变化。我想创建一个名为";差异";其中,如果组和类型在一年中具有200的代码,并且在下一年具有210的代码;差异";列将其记录为10的增量。

group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
df <- data.frame(group, type, year, code)

这就是df的样子:

group   type year code
1      A  small 1995  100
2      A medium 1995  100
3      A  large 1995  100
4      B  small 1995  200
5      B medium 1995  200
6      B  large 1995  200
7      C  small 1995  300
8      C medium 1995  300
9      C  large 1995  300
10     A  small 1996  150
11     A medium 1996  150
12     A  large 1996  100
13     B  small 1996  200
14     B medium 1996  200
15     B  large 1996  200
16     C  small 1996  350
17     C medium 1996  320
18     C  large 1996  300
19     A  small 1997  130
20     A medium 1997  170
21     A  large 1997   90
22     B  small 1997  210
23     B medium 1997   90
24     B  large 1997   80
25     C  small 1997  310
26     C medium 1997  300
27     C  large 1997  320

我想要以下输出:

group <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
type <- c("small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large", "small", "medium", "large")
year <- c(1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,
1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997)
code <- c(100, 100, 100, 200, 200, 200, 300, 300, 300,
150, 150, 100, 200, 200, 200, 350, 320, 300,
130, 170, 90, 210, 90, 80, 310, 300, 320)
difference <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
50, 50, 0, 0, 0, 0, 50, 20, 0,
-20, 20, -10, 10, 110, 120, -40, -20, 0)
df2 <- data.frame(group, type, year, code, difference)
group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90        110
24     B  large 1997   80        120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320          0

这就是我尝试的:

df3 <- df2 %>%
group_by(group, type, year) %>%
mutate(difference = code - lag(code))

问题是,滞后似乎没有考虑分组,而是从它前面的行中减去。有什么建议吗?

OP请求更新:要获得0,我们可以使用ifelse语句:

df %>% 
group_by(group, type) %>% 
mutate(difference= ifelse(is.na(lag(code)), 0, code - lag(code))) %>% 
data.frame()
group   type year code difference
1      A  small 1995  100          0
2      A medium 1995  100          0
3      A  large 1995  100          0
4      B  small 1995  200          0
5      B medium 1995  200          0
6      B  large 1995  200          0
7      C  small 1995  300          0
8      C medium 1995  300          0
9      C  large 1995  300          0
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90       -110
24     B  large 1997   80       -120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320         20

第一(答案(:正如@IRTFM已经指出的那样。仅用CCD_ 3和CCD_。它给出了几乎相同的输出。请注意,最后一行不同。

library(dplyr)
df %>% 
group_by(group, type) %>% 
mutate(difference= code - lag(code)) %>% 
data.frame()
group   type year code difference
1      A  small 1995  100         NA
2      A medium 1995  100         NA
3      A  large 1995  100         NA
4      B  small 1995  200         NA
5      B medium 1995  200         NA
6      B  large 1995  200         NA
7      C  small 1995  300         NA
8      C medium 1995  300         NA
9      C  large 1995  300         NA
10     A  small 1996  150         50
11     A medium 1996  150         50
12     A  large 1996  100          0
13     B  small 1996  200          0
14     B medium 1996  200          0
15     B  large 1996  200          0
16     C  small 1996  350         50
17     C medium 1996  320         20
18     C  large 1996  300          0
19     A  small 1997  130        -20
20     A medium 1997  170         20
21     A  large 1997   90        -10
22     B  small 1997  210         10
23     B medium 1997   90       -110
24     B  large 1997   80       -120
25     C  small 1997  310        -40
26     C medium 1997  300        -20
27     C  large 1997  320         20

您可以在ave中使用diff

dat[order(dat$group, dat$type), ] |>
transform(diff=ave(code, group, type, FUN=(x) c(NA, diff(x)))) |>
((x) x[order(as.numeric(rownames(x))), ])()  ## optional, to reorder rows
#    group   type year code diff
# 1      A  small 1995  100   NA
# 2      A medium 1995  100   NA
# 3      A  large 1995  100   NA
# 4      B  small 1995  200   NA
# 5      B medium 1995  200   NA
# 6      B  large 1995  200   NA
# 7      C  small 1995  300   NA
# 8      C medium 1995  300   NA
# 9      C  large 1995  300   NA
# 10     A  small 1996  150   50
# 11     A medium 1996  150   50
# 12     A  large 1996  100    0
# 13     B  small 1996  200    0
# 14     B medium 1996  200    0
# 15     B  large 1996  200    0
# 16     C  small 1996  350   50
# 17     C medium 1996  320   20
# 18     C  large 1996  300    0
# 19     A  small 1997  130  -20
# 20     A medium 1997  170   20
# 21     A  large 1997   90  -10
# 22     B  small 1997  210   10
# 23     B medium 1997   90 -110
# 24     B  large 1997   80 -120
# 25     C  small 1997  310  -40
# 26     C medium 1997  300  -20
# 27     C  large 1997  320   20

注意:R>=4.1使用过的

相关内容

  • 没有找到相关文章

最新更新