I have some tidy data in which one of the groups is a blank:
df <- data.frame(Group = c(rep(LETTERS[1:3], 3), "Blank", "Blank", "Blank"),
ID = rep(1:3, 4),
Value = c(10, 11, 12, 21, 22, 23, 31, 32, 33, 1, 2, 3))
df
   Group ID Value
1      A  1    10
2      B  2    11
3      C  3    12
4      A  1    21
5      B  2    22
6      C  3    23
7      A  1    31
8      B  2    32
9      C  3    33
10 Blank  1     1
11 Blank  2     2
12 Blank  3     3
I want to subtract the Blank value from each group (A, B, C), so the normalized data would look like this:
df_normalized <- data.frame(Group = rep(LETTERS[1:3], 3),
ID = rep(1:3, 3),
Value = c(9, 9, 9, 20, 20, 20, 30, 30, 30))
df_normalized
  Group ID Value
1     A  1     9
2     B  2     9
3     C  3     9
4     A  1    20
5     B  2    20
6     C  3    20
7     A  1    30
8     B  2    30
9     C  3    30
How can I do this nicely with dplyr?
Edit: How can I do this for multiple groups? For example:
df <- data.frame(Cluster = c(rep("C1", 12), rep("C2", 12)),
Group = rep(c(rep(LETTERS[1:3], 3), "Blank", "Blank", "Blank"), 2),
ID = rep(1:3, 8),
Value = sample(24))
Assuming there is only one "Blank" value per ID, as in the example, you can do:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Value = Value - Value[Group == "Blank"]) %>%
filter(Group != "Blank")
# Group ID Value
# <fct> <int> <dbl>
#1 A 1 9
#2 B 2 9
#3 C 3 9
#4 A 1 20
#5 B 2 20
#6 C 3 20
#7 A 1 30
#8 B 2 30
#9 C 3 30
If there is more than one "Blank" per ID, you can use match to make sure only the first value is selected:
df %>%
group_by(ID) %>%
mutate(Value = Value - Value[match("Blank", Group)]) %>%
filter(Group != "Blank")
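
For the multi-cluster case raised in the question's edit, one possible extension (a sketch, not something confirmed above) is to group by Cluster as well as ID, so that each row is compared with the blank from its own cluster. The data frame df2 and its fixed Value column are made up here for illustration. They replace sample(24) so that the result is reproducible.

```r
library(dplyr)

# Hypothetical data: same layout as the question's second example, but with
# fixed values instead of sample(24) so the result can be checked by hand.
df2 <- data.frame(
  Cluster = c(rep("C1", 12), rep("C2", 12)),
  Group   = rep(c(rep(LETTERS[1:3], 3), "Blank", "Blank", "Blank"), 2),
  ID      = rep(1:3, 8),
  Value   = c(10, 11, 12, 21, 22, 23, 31, 32, 33, 1, 2, 3,
              110, 111, 112, 121, 122, 123, 131, 132, 133, 5, 6, 7)
)

result <- df2 %>%
  group_by(Cluster, ID) %>%                                # one blank per Cluster/ID pair
  mutate(Value = Value - Value[match("Blank", Group)]) %>% # subtract that pair's first blank
  filter(Group != "Blank") %>%                             # drop the blank rows themselves
  ungroup()

result
```

With this grouping, the C1 rows are reduced by 1, 2 and 3 and the C2 rows by 5, 6 and 7, depending on ID. If a Cluster/ID pair has no "Blank" row at all, match returns NA and the Value for that pair becomes NA, so it may be worth checking for that case first.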