Trying to add a calculated column in R where each row's calculation is based on a changing subset of the data



I'm having a hard time figuring out how to do the following in R. Imagine the following dataset:

library(dplyr)  # provides tibble()
pdata <- tibble(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)

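Basically, what I want to do is add a calculated column that looks at the ID in each row and sums the values for that ID over the rows whose time is earlier than that row's time. For example, in row 3 it sees that id 1 has two records older than 2018, so it adds them up to 2. The new calculated column therefore has the value 2 for row 3. Below is an example of what I need.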

# A tibble: 50 × 4
id  time value OUTPUT
<int> <int> <dbl>  <dbl>
1     1  2016     1      0
2     1  2017     1      1
3     1  2018     1      2
4     1  2019     0      3
5     1  2020     0      3
6     2  2016     1      0
7     2  2017     1      1
8     2  2018     0      2
9     2  2019     0      2
10     2  2020     0      2
# … with 40 more rows

Thanks!

Sort by id and time, group by id, and compute the cumulative sum of the lagged values.

suppressPackageStartupMessages(library(dplyr))
pdata <- tibble(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
pdata %>%
  arrange(id, time) %>%
  group_by(id) %>%
  # lag() shifts value down one row within each id, so the running
  # total only includes rows earlier than the current one
  mutate(OUTPUT = cumsum(lag(value, default = 0))) %>%
  ungroup()
#> # A tibble: 50 × 4
#>       id  time value OUTPUT
#>    <int> <int> <dbl>  <dbl>
#>  1     1  2016     1      0
#>  2     1  2017     1      1
#>  3     1  2018     1      2
#>  4     1  2019     0      3
#>  5     1  2020     0      3
#>  6     2  2016     1      0
#>  7     2  2017     1      1
#>  8     2  2018     0      2
#>  9     2  2019     0      2
#> 10     2  2020     0      2
#> # … with 40 more rows

Created on 2022-09-11 by the reprex package (v2.0.1)
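The same idea can be sketched without dplyr. This is a minimal base R sketch, assuming one row per id and time as in the question's data; it relies on the fact that the cumulative sum of the lagged values equals `cumsum(v) - v`, and uses `ave()` to apply that within each id.

```r
# Same data as in the question, built with base R only.
pdata <- data.frame(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(1,1,1,0,0, 1,1,0,0,0, 0,0,1,0,0, 0,0,0,0,0, 1,0,0,0,1,
            0,1,1,1,0, 0,1,1,1,1, 1,1,1,1,1, 1,0,1,1,1, 1,1,0,1,1)
)

# Sort so that the running total follows time order within each id.
pdata <- pdata[order(pdata$id, pdata$time), ]

# Within each id, cumsum(v) includes the current row; subtracting v
# leaves the sum of the strictly earlier rows.
pdata$OUTPUT <- ave(pdata$value, pdata$id, FUN = function(v) cumsum(v) - v)

head(pdata, 10)
```

Subtracting `v` avoids the need for a separate lag step, at the cost of being slightly less explicit about the intent than `lag()`.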

library(tidyverse)
df <- data.frame(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(c(1,1,1,0,0), c(1,1,0,0,0), c(0,0,1,0,0), c(0,0,0,0,0), c(1,0,0,0,1), c(0,1,1,1,0), c(0,1,1,1,1), c(1,1,1,1,1), c(1,0,1,1,1), c(1,1,0,1,1))
)
df1 <- df %>%
  group_by(id) %>%
  # Flag rows dated before 2018 with 1 and all later rows with 0
  # (this counts pre-2018 rows rather than summing value)
  mutate(output = case_when(time < 2018 ~ 1,
                            TRUE ~ 0)) %>%
  # Running total of the previous rows' flags within each id
  mutate(output = cumsum(lag(output, default = 0)))

Console:

id  time value output
<int> <int> <dbl>  <dbl>
1     1  2016     1      0
2     1  2017     1      1
3     1  2018     1      2
4     1  2019     0      2
5     1  2020     0      2
6     2  2016     1      0
7     2  2017     1      1
8     2  2018     0      2
9     2  2019     0      2
10     2  2020     0      2
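As a cross-check on any of the approaches, the requirement from the question (sum `value` over rows with the same id and a strictly earlier time) can be written out literally. This is only a sketch: the helper name `sum_before` is made up for illustration, and the row-by-row scan is quadratic in the number of rows, so it suits small data or verification rather than large tables.

```r
# Same data as in the question.
pdata <- data.frame(
  id = rep(1:10, each = 5),
  time = rep(2016:2020, times = 10),
  value = c(1,1,1,0,0, 1,1,0,0,0, 0,0,1,0,0, 0,0,0,0,0, 1,0,0,0,1,
            0,1,1,1,0, 0,1,1,1,1, 1,1,1,1,1, 1,0,1,1,1, 1,1,0,1,1)
)

# Hypothetical helper: for row i of d, sum value over the rows that
# share its id and have a strictly earlier time.
sum_before <- function(i, d) {
  sum(d$value[d$id == d$id[i] & d$time < d$time[i]])
}

# Apply the helper to every row; this does not depend on row order.
pdata$OUTPUT <- vapply(seq_len(nrow(pdata)), sum_before, numeric(1), d = pdata)

head(pdata, 10)
```

For id 1 this gives 0, 1, 2, 3, 3, which matches the OUTPUT column requested in the question.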
