I have a dataset in R (n = 500) that looks like this:
ID A C S
1 4 4 4
2 3 2 3
3 5 4 2
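I want to create a new variable (I'll call it "Same") that tells me which columns, if any, hold the same value in a given row (not counting my ID column).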
ID A C S Same
1 4 4 4 all
2 3 2 3 as
3 5 4 2 none
4 7 7 2 ac
Any help would be greatly appreciated! I'm completely lost. Thank you!
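We can loop over the rows with apply (MARGIN = 1) on the selected columns ([-1] drops the 'ID' column), then check the length of the unique elements: if it is 1, return 'all', else paste the names of the duplicated elements. If there are no duplicates, this returns a blank "", so change the blank to 'none'.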
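df1$Same <- apply(df1[-1], 1, function(x) {
  # 'all' when the row has a single distinct value; otherwise paste the
  # lower-cased names of every column whose value occurs more than once
  x1 <- if (length(unique(x)) == 1) 'all' else
    paste(tolower(names(x))[duplicated(x) | duplicated(x, fromLast = TRUE)],
          collapse = "")
  x1[x1 == ""] <- "none"  # no repeated values in this row
  x1
})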
Output
> df1
ID A C S Same
1 1 4 4 4 all
2 2 3 2 3 as
3 3 5 4 2 none
4 4 7 7 2 ac
Data
df1 <- structure(list(ID = 1:4, A = c(4L, 3L, 5L, 7L), C = c(4L, 2L,
4L, 7L), S = c(4L, 3L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
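To see why the answer combines duplicated(x) with duplicated(x, fromLast = TRUE), it may help to trace a single row by hand. The sketch below assumes the row with ID 2 from the example data, written out as a named vector (which is how apply hands each row to the function); the variable names tied and label are only illustrative.

```r
# Row with ID = 2 from the example data, as a named vector
x <- c(A = 3L, C = 2L, S = 3L)

# duplicated() flags only the later occurrence of a repeated value,
# so on its own it would miss column A here
fwd <- duplicated(x)                   # FALSE FALSE TRUE
bwd <- duplicated(x, fromLast = TRUE)  # TRUE FALSE FALSE

# OR-ing both directions flags every column that takes part in a tie
tied <- fwd | bwd

# Paste the lower-cased names of the tied columns into one label
label <- paste(tolower(names(x))[tied], collapse = "")
label  # "as"
```

For a row with no ties, tied is all FALSE, the name subset is empty, and paste(..., collapse = "") yields "", which is the blank that the answer then turns into 'none'.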
Try using dplyr's rowwise together with rle:
library(dplyr)  # df is the data frame from the question
df |> rowwise() |> mutate(Same = case_when(
  length(rle(sort(c_across(A:S)))$values) == 1 ~ "all",   # one distinct value
  length(rle(sort(c_across(A:S)))$values) == 3 ~ "none",  # three distinct values
  c_across(A) == c_across(C) ~ "ac",
  c_across(C) == c_across(S) ~ "cs",
  TRUE ~ "as"))
Output
# A tibble: 4 × 5
# Rowwise:
ID A C S Same
<int> <int> <int> <int> <chr>
1 1 4 4 4 all
2 2 3 2 3 as
3 3 5 4 2 none
4 4 7 7 2 ac
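The rle(sort(...)) expression in this answer is effectively counting the distinct values in a row: sorting puts equal values next to each other, and rle then collapses each run into one entry. A minimal sketch of that idea on plain vectors follows; the helper name n_runs is hypothetical and not part of the answer.

```r
# Hypothetical helper: number of distinct values, counted as runs after sorting
n_runs <- function(x) length(rle(sort(x))$values)

n_runs(c(4, 4, 4))  # 1: all three columns match            -> "all"
n_runs(c(5, 4, 2))  # 3: no two columns match               -> "none"
n_runs(c(3, 2, 3))  # 2: exactly one pair matches, so the remaining
                    #    case_when branches decide which pair
```

Note that the counts 1 and 3 are tied to having exactly three value columns; with more columns the pairwise branches would need to be extended, whereas the apply-based answer handles any number of columns.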