I'm trying to mutate a column in a data frame using lag() as a condition, without producing NA values. Here is an example:
library(dplyr)

df <- data.frame("Score" = as.numeric(c("20", "10", "15", "30", "15", "10")),
                 "Time" = c("1", "2", "1", "2", "1", "2"),
                 "Team" = c("A", "A", "B", "B", "C", "C"))
Next, I create a new column called Diff that computes the score difference within each team:
df <- df %>%
  group_by(Team) %>%
  mutate(Diff = Score - lag(Score))
The problem is that, predictably, this produces NA values for the first row of each team:
Score Time Team Diff
20 1 A NA
10 2 A -10
15 1 B NA
30 2 B 15
15 1 C NA
10 2 C -5
What I want to end up with is this:
Score Time Team Diff
20 1 A -10
10 2 A -10
15 1 B 15
30 2 B 15
15 1 C -5
10 2 C -5
I also tried using case_when() to replace each NA with the next value, but that didn't work either:
df %>%
  group_by(Team) %>%
  mutate(Diff = Score - lag(Score)) %>%
  mutate(Diff = case_when(
    NA ~ lead(Diff)
  ))
So how can I replace the NA values with the next Diff value?
Thanks a lot!
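Just call fill() after the fact; with .direction = 'up' it replaces each NA with the next non-missing value within each group: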
library(tidyverse)
df <- data.frame("Score" = as.numeric(c("20", "10", "15", "30", "15", "10")),
                 "Time" = c("1", "2", "1", "2", "1", "2"),
                 "Team" = c("A", "A", "B", "B", "C", "C"))
df <- df %>%
  group_by(Team) %>%
  mutate(Diff = Score - lag(Score)) %>%
  fill(Diff, .direction = 'up')
df
# output
# Score Time Team Diff
# <dbl> <chr> <chr> <dbl>
#1 20 1 A -10
#2 10 2 A -10
#3 15 1 B 15
#4 30 2 B 15
#5 15 1 C -5
#6 10 2 C -5
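For comparison, the case_when() attempt from the question can also be made to work. It fails as written because its left-hand side is the constant NA rather than a test such as is.na(Diff), so no row ever matches and every value falls through to NA. Below is a minimal sketch of two fill()-free variants, assuming dplyr is installed and that each team's rows are already ordered by Time, as in the example. The names df_cw and df_co are only illustrative. The coalesce() variant uses the forward difference lead(Score) - Score for the first row of each team, which equals the next row's Diff.

```r
library(dplyr)

df <- data.frame(Score = c(20, 10, 15, 30, 15, 10),
                 Time  = c("1", "2", "1", "2", "1", "2"),
                 Team  = c("A", "A", "B", "B", "C", "C"))

# Variant 1: repaired case_when(). The condition must be a logical test
# (is.na(Diff)), and TRUE ~ Diff keeps all the non-missing differences.
# Assumes rows within each team are already sorted by Time.
df_cw <- df %>%
  group_by(Team) %>%
  mutate(Diff = Score - lag(Score)) %>%
  mutate(Diff = case_when(is.na(Diff) ~ lead(Diff),
                          TRUE ~ Diff)) %>%
  ungroup()

# Variant 2: coalesce() takes the first non-missing value element-wise.
# lag() yields NA only on each team's first row, where the forward
# difference lead(Score) - Score (the next row's Diff) is used instead.
df_co <- df %>%
  group_by(Team) %>%
  mutate(Diff = coalesce(Score - lag(Score), lead(Score) - Score)) %>%
  ungroup()

df_cw
df_co
```

Both variants give the same Diff column as the fill() answer for this data. The coalesce() version avoids a second pass over the column, while fill() is more general if NA values could also appear elsewhere in Diff.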