Create a new variable in a data frame based on multiple criteria in R

I have a dataset with the following columns:

COl1 COl2 Col3   
1     0     0 
0     1     0
0     0     1 
1     0     0

Based on these three columns, I need to add a new variable to the same table.

Expected output

COl1 COl2 Col3  New_variable   
1     0     0     c1
0     1     0     c2
0     0     1     c3
1     0     0     c1

If we want to assign the variable based on which column holds the 1 in each row, we can use max.col.

df$New_variable <- paste0('c', max.col(df))
df
#  COl1 COl2 Col3 New_variable
#1    1    0    0           c1
#2    0    1    0           c2
#3    0    0    1           c3
#4    1    0    0           c1

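If a row contains more than one 1, check the ties.method options in ?max.col. The default is "random", so tied rows are resolved at random unless "first" or "last" is specified.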
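
To make the ties.method point concrete, here is a small sketch. The matrix m is a hypothetical example (not the question's data) in which one row contains two 1s, so the choice of ties.method changes the label for that row.

```r
# Hypothetical example matrix: row 2 contains two 1s (a tie)
m <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 0, 1),
            ncol = 3, byrow = TRUE)

# "first" picks the left-most maximum, "last" the right-most
first_pick <- paste0("c", max.col(m, ties.method = "first"))
last_pick  <- paste0("c", max.col(m, ties.method = "last"))

first_pick
# [1] "c1" "c1" "c3"
last_pick
# [1] "c1" "c2" "c3"
```

Setting ties.method explicitly keeps the result reproducible when ties can occur.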

If we instead need a unique ID for each distinct row pattern, we can paste the values row-wise and then use match to assign the ID.

vals <- do.call(paste, c(df, sep = "-"))
df$New_variable <- paste0('c', match(vals, unique(vals)))
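
Note that this approach numbers the distinct row patterns in order of first appearance, so the labels do not necessarily correspond to column positions. A minimal sketch on a hypothetical data frame df2 (an assumption for illustration, not the question's data):

```r
# Hypothetical data: the first row has its 1 in the second column
df2 <- data.frame(COl1 = c(0, 1, 0), COl2 = c(1, 0, 1), Col3 = c(0, 0, 0))

# Paste each row into a single key such as "0-1-0"
vals <- do.call(paste, c(df2, sep = "-"))

# IDs follow the order in which each distinct key first appears
ids <- paste0("c", match(vals, unique(vals)))
ids
# [1] "c1" "c2" "c1"
```

Here the first row is labelled c1 even though its 1 sits in the second column, which max.col would have labelled c2.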

Here are some base R solutions:

df$New_variable <- paste0("c", seq_along(df) %*% t(df))

or

df$New_variable <- paste0("c",rowSums(df*col(df)))

or

df$New_variable <- paste0("c", which(t(df) == 1, arr.ind = TRUE)[, "row"])

which gives

> df
COl1 COl2 Col3 New_variable
1    1    0    0           c1
2    0    1    0           c2
3    0    0    1           c3
4    1    0    0           c1

Data

df <- structure(list(COl1 = c(1L, 0L, 0L, 1L), COl2 = c(0L, 1L, 0L, 
0L), Col3 = c(0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

Another base option:

df$New_variable <- paste0('c', apply(df, 1, function(x) which(x != 0)))

Output:

COl1 COl2 Col3 New_variable
1    1    0    0           c1
2    0    1    0           c2
3    0    0    1           c3
4    1    0    0           c1
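
The apply() version assumes exactly one non-zero value per row; otherwise which() returns zero or several indices and the result is no longer one label per row. A more defensive sketch follows. The helper name label_row, the data frame df3, and the choice to return NA for ambiguous rows are all assumptions made for illustration.

```r
# Hypothetical helper: label a row only when exactly one entry is non-zero
label_row <- function(x) {
  idx <- which(x != 0)
  if (length(idx) == 1) paste0("c", idx) else NA_character_
}

# Hypothetical data: row 2 has two 1s, row 3 has none
df3 <- data.frame(COl1 = c(1, 1, 0), COl2 = c(0, 1, 0), Col3 = c(0, 0, 0))

# vapply guarantees a character vector with one element per row
labels <- vapply(seq_len(nrow(df3)),
                 function(i) label_row(unlist(df3[i, ])),
                 character(1))
labels
# [1] "c1" NA   NA
```

Whether NA is the right fallback depends on the data; a tie-breaking rule such as max.col with ties.method = "first" is another option.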

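Since the tags contain a vague reference to dplyr, you could also combine it with purrr, although this is clearly overkill compared with the various base solutions available (as is evident from all the other answers):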

library(dplyr)
df %>%
  mutate(
    # pmap_chr returns a character vector rather than a list column
    New_variable = purrr::pmap_chr(select(., 1:3), ~ paste0('c', which(c(...) != 0)))
  )

In the select(., 1:3) call you choose which columns to use (here we use all 3 columns, so you could pass just . instead of the whole select, which would have the same effect).
