I want to convert the data frame df1 into the data frame df2.
id <- c(1,2,3)
outcome_1 <- c(1,0,1)
outcome_2 <- c(1,1,0)
df1 <- data.frame(id,outcome_1,outcome_2)
id <- c(1,2,3)
outcome <- c("1,2","2","1")
df2 <- data.frame(id,outcome)
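The answers to the following question are almost what I want, but in my case a row can have more than one positive outcome (for example, the first row needs to be "1,2"). I would also like the outcome column to be a character column.
R: Convert multiple binary columns to one factor variable whose factors are binary column names
Please help. Thank you.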
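Coerce the binary values with as.logical and use them to subset the substring of the column names.
# \(x) is the shorthand for function(x) available from R 4.1
apply(df1[-1], 1, \(x) toString(substring(names(df1)[-1], 9)[as.logical(x)]))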
# [1] "1, 2" "2" "1"
or
apply(df1[-1], 1, \(x) paste(substring(names(df1)[-1], 9)[as.logical(x)], collapse=','))
# [1] "1,2" "2" "1"
Using the first approach:
cbind(df1[1], outcome=apply(df1[-1], 1, \(x) toString(substring(names(df1)[-1], 9)[as.logical(x)])))
# id outcome
# 1 1 1, 2
# 2 2 2
# 3 3 1
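The fixed offset 9 in substring() works because every column name starts with the nine-character prefix "outcome_" followed by the label. As a minimal sketch of the same idea without the hard-coded position, the prefix can be stripped with sub() instead. The helper name collapse_outcomes and its prefix and sep arguments are made up for illustration and are not part of any package.

```r
df1 <- data.frame(id = c(1, 2, 3), outcome_1 = c(1, 0, 1), outcome_2 = c(1, 1, 0))

# Hypothetical helper: the name and arguments are illustrative only.
collapse_outcomes <- function(df, prefix = "outcome_", sep = ",") {
  # Logical mask of the binary columns, found by their shared prefix
  cols <- startsWith(names(df), prefix)
  # Labels are the column names with the prefix removed
  labels <- sub(prefix, "", names(df)[cols], fixed = TRUE)
  # For each row, keep the labels whose binary value is 1 and paste them together
  outcome <- vapply(
    seq_len(nrow(df)),
    function(i) paste(labels[as.logical(unlist(df[i, cols]))], collapse = sep),
    character(1)
  )
  # stringsAsFactors = FALSE keeps outcome a character column on older R versions too
  data.frame(df[!cols], outcome = outcome, stringsAsFactors = FALSE)
}

res <- collapse_outcomes(df1)
```

Here res should hold the id column plus a character outcome column. A row with no positive outcome gets an empty string, since paste() on an empty vector with collapse returns "".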
If you want a nested list column, you can use list2DF:
l <- list2DF(c(df1[1],
               outcome=list(apply(df1[-1], 1, \(x)
                 as.numeric(substring(names(df1)[-1], 9))[as.logical(x)]))))
l
# id outcome
# 1 1 1, 2
# 2 2 2
# 3 3 1
str(l)
# 'data.frame': 3 obs. of 2 variables:
# $ id : num 1 2 3
# $ outcome:List of 3
# ..$ : num 1 2
# ..$ : num 2
# ..$ : num 1
Data:
df1 <- structure(list(id = c(1, 2, 3), outcome_1 = c(1, 0, 1), outcome_2 = c(1,
1, 0)), class = "data.frame", row.names = c(NA, -3L))
Here is another approach, using the tidyverse:
library(dplyr)
library(tidyr)
df1 %>%
  mutate(across(-id, ~case_when(. == 1 ~ cur_column()), .names = 'new_{col}'), .keep="unused") %>%
  unite(outcome, starts_with('new'), na.rm = TRUE, sep = ', ') %>%
  mutate(outcome = gsub('outcome_', '', outcome))
id outcome
1 1 1, 2
2 2 2
3 3 1
How many outcome_ columns are there? If there are only 2, this will work fine.
library(dplyr)
df1 %>%
  rowwise() %>%
  summarise(id = id,
            outcome = paste(which(c(outcome_1,outcome_2)==1), collapse =","))
# A tibble: 3 x 2
id outcome
<dbl> <chr>
1 1 1,2
2 2 2
3 3 1
If there are more than 2, try this:
df1 %>%
  rowwise() %>%
  summarise(id=id,
            outcome = paste(which(c_across(-id)== 1), collapse =","))
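One thing to keep in mind with the which() approach: it returns column positions, which only coincide with the name suffixes when the columns are outcome_1, outcome_2, ... in order with no gaps. A small base R sketch of the difference, using a made-up data frame df_gap with a missing outcome_2 column:

```r
# Made-up example data: there is no outcome_2 column
df_gap <- data.frame(id = c(1, 2), outcome_1 = c(1, 0), outcome_3 = c(1, 1))

# Binary values of the first row as a named vector
x <- unlist(df_gap[1, -1])

# which() reports positions within the vector, not the name suffixes
pos <- unname(which(x == 1))

# Taking the labels from the names keeps the real suffixes
labels <- sub("outcome_", "", names(x)[x == 1], fixed = TRUE)
```

For the first row, pos is 1 and 2, while labels is "1" and "3". If your columns can have gaps or arrive in a different order, building the string from the names is the safer choice.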
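Another possible solution, based on dplyr and purrr::pmap (pmap_chr is used here so that outcome is a character column instead of a list column):
library(tidyverse)
df1 %>%
  transmute(id, outcome = pmap_chr(., ~ c(1*..2, 2*..3) %>% .[. != 0] %>% toString))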
#> id outcome
#> 1 1 1, 2
#> 2 2 2
#> 3 3 1
Or more simply:
library(tidyverse)
pmap_dfr(df1, ~ data.frame(id = ..1, outcome = c(1*..2, 2*..3) %>% .[. != 0]
                           %>% toString))
#> id outcome
#> 1 1 1, 2
#> 2 2 2
#> 3 3 1
outcome_col_idx <- grepl("outcome", colnames(df1))
cbind(
  df1[,!outcome_col_idx, drop = FALSE],
  outcome = apply(
    replace(df1, df1 == 0, NA)[,outcome_col_idx],
    1,
    function(x){
      as.factor(
        toString(
          gsub(
            "outcome_",
            "",
            names(x)[complete.cases(x)]
          )
        )
      )
    }
  )
)