R - dplyr: filter, then mutate, while keeping all of the data



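I have a dataset in which each mother-infant "dyad" shares an id. I want to create a new variable that uses only the data from the infant rows. With dplyr::filter this is straightforward, but filtering means losing the mothers' data. Is there a way to filter and then mutate while still keeping all of the data?

Example: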

require(tidyverse)
dataSet <- data.frame(dyad_id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      dyad = c("Mom","Mom","Inf","Inf","Mom","Mom","Inf","Inf","Mom","Mom","Inf","Inf"),
                      timepoint = c(1,2,1,2,1,2,1,2,1,2,1,2),
                      v1 = c(3,4,5,2,4,6,3,67,8,4,3,2),
                      v2 = c(6,8,3,4,5,6,1,3,4,5,6,7))
dataSet <- dataSet %>%
  dplyr::filter(dyad == "Inf") %>%
  dplyr::mutate(v3 = v1 + v2)

When I run this, it removes all of the mothers' rows from the dataset:

> dataSet
dyad_id dyad timepoint v1 v2 v3
1       1  Inf         1  5  3  8
2       1  Inf         2  2  4  6
3       2  Inf         1  3  1  4
4       2  Inf         2 67  3 70
5       3  Inf         1  3  6  9
6       3  Inf         2  2  7  9

Desired output:

   dyad_id dyad timepoint v1 v2 v3
1        1  Mom         1  3  6 NA
2        1  Mom         2  4  8 NA
3        1  Inf         1  5  3  8
4        1  Inf         2  2  4  6
5        2  Mom         1  4  5 NA
6        2  Mom         2  6  6 NA
7        2  Inf         1  3  1  4
8        2  Inf         2 67  3 70
9        3  Mom         1  8  4 NA
10       3  Mom         2  4  5 NA
11       3  Inf         1  3  6  9
12       3  Inf         2  2  7  9

Thanks in advance!

Take a look at the if_else function in dplyr. The example here fills the non-infant rows with 0:

require(tidyverse)
dataSet <- data.frame(dyad_id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      dyad = c("Mom","Mom","Inf","Inf","Mom","Mom","Inf","Inf","Mom","Mom","Inf","Inf"),
                      timepoint = c(1,2,1,2,1,2,1,2,1,2,1,2),
                      v1 = c(3,4,5,2,4,6,3,67,8,4,3,2),
                      v2 = c(6,8,3,4,5,6,1,3,4,5,6,7))
dataSet <- dataSet %>%
  dplyr::mutate(v3 = if_else(dyad == "Inf", v1 + v2, 0))

> head(dataSet)
dyad_id dyad timepoint v1 v2 v3
1       1  Mom         1  3  6  0
2       1  Mom         2  4  8  0
3       1  Inf         1  5  3  8
4       1  Inf         2  2  4  6
5       2  Mom         1  4  5  0
6       2  Mom         2  6  6  0
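
The output above has 0 in the mothers' rows, whereas the question asks for NA. A minimal sketch of the same if_else approach with a missing value as the fallback might look like this. `NA_real_` is used because `v1 + v2` is a double here and if_else checks that both branches have compatible types; the name `result` is only for illustration.

```r
library(dplyr)

# Same data as in the question, written more compactly with rep().
dataSet <- data.frame(
  dyad_id   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  dyad      = rep(c("Mom", "Mom", "Inf", "Inf"), times = 3),
  timepoint = rep(c(1, 2), times = 6),
  v1 = c(3, 4, 5, 2, 4, 6, 3, 67, 8, 4, 3, 2),
  v2 = c(6, 8, 3, 4, 5, 6, 1, 3, 4, 5, 6, 7)
)

# Compute v3 only for infant rows; mother rows get a missing value.
# No rows are dropped because nothing is filtered.
result <- dataSet %>%
  mutate(v3 = if_else(dyad == "Inf", v1 + v2, NA_real_))

print(result)
```

All 12 rows are kept, and v3 is NA wherever dyad is "Mom".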

We can use case_when, which returns NA by default for rows that match none of the conditions:

library(dplyr)
dataSet %>%
mutate(v3 = case_when(dyad == 'Inf' ~ v1 + v2))

Output:

#    dyad_id dyad timepoint v1 v2 v3
#1        1  Mom         1  3  6 NA
#2        1  Mom         2  4  8 NA
#3        1  Inf         1  5  3  8
#4        1  Inf         2  2  4  6
#5        2  Mom         1  4  5 NA
#6        2  Mom         2  6  6 NA
#7        2  Inf         1  3  1  4
#8        2  Inf         2 67  3 70
#9        3  Mom         1  8  4 NA
#10       3  Mom         2  4  5 NA
#11       3  Inf         1  3  6  9
#12       3  Inf         2  2  7  9

I know it's a bit poor form to answer a {tidyverse} question with {data.table}, but I'll leave this here because data.table implements this very conveniently:

library(data.table)
dataSet <- as.data.table(dataSet)
dataSet[filter_column == 'filter_value', mutate_column := 'mutate_value']

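Every row that does not meet the filter condition is assigned NA in mutate_column (or, if that column already exists, those rows keep their values unchanged).

So, in your example: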

dataSet[dyad == 'Inf', v3 := v1 + v2]

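Afterwards you can go straight back to piping dplyr functions. If you want to drop the data.table class again, convert the result first with as_tibble() or as.data.frame().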
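
To make the two behaviours described above concrete (NA for a column that is newly created, untouched values for a column that already exists), here is a small sketch. The four-row table `dt` and the column `v4` are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical miniature version of the dyad data.
dt <- data.table(
  dyad = c("Mom", "Inf", "Mom", "Inf"),
  v1   = c(3, 5, 4, 2),
  v2   = c(6, 3, 8, 4)
)

# Case 1: v3 does not exist yet, so rows that fail the
# condition are filled with NA when the column is created.
dt[dyad == "Inf", v3 := v1 + v2]

# Case 2: v4 already exists (all zeros), so rows that fail
# the condition keep their current value.
dt[, v4 := 0]
dt[dyad == "Inf", v4 := v1 + v2]

print(dt)
```

Note that `:=` modifies `dt` in place, so no reassignment with `<-` is needed.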
