How to convert multiple binary columns into a single character column?

I want to convert data frame df1 into data frame df2.

# input
id <- c(1,2,3)
outcome_1 <- c(1,0,1)
outcome_2 <- c(1,1,0)
df1 <- data.frame(id,outcome_1,outcome_2)
# desired output
id <- c(1,2,3)
outcome <- c("1,2","2","1")
df2 <- data.frame(id,outcome)

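The answer to the question below is almost what I want, but in my case a row can have more than one positive outcome (e.g., the first row needs to be "1,2"). I also want the outcome column to be a character column.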

R: Convert multiple binary columns into one factor variable whose factors are the binary column names

Please help. Thank you.

Coerce the binary values with as.logical and use the result to subset the column-name suffixes extracted with substring:

apply(df1[-1], 1, \(x) toString(substring(names(df1)[-1], 9)[as.logical(x)]))
# [1] "1, 2" "2"    "1" 

apply(df1[-1], 1, \(x) paste(substring(names(df1)[-1], 9)[as.logical(x)], collapse=','))
# [1] "1,2" "2"   "1"  

Using the first method to build the full data frame:

cbind(df1[1], outcome=apply(df1[-1], 1, \(x) toString(substring(names(df1)[-1], 9)[as.logical(x)])))
#   id outcome
# 1  1    1, 2
# 2  2       2
# 3  3       1
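To see why the subsetting works, here is the same idea unpacked for one row and then for all rows. This is a sketch that rebuilds df1 from the question and assumes every indicator column is named outcome_<n>, so dropping the first 8 characters leaves the number; `suffixes`, `row1` and `keep` are illustrative names, and `function(x)` is used in place of the equivalent `\(x)` shorthand.

```r
# Rebuild the example input from the question
df1 <- data.frame(id = c(1, 2, 3),
                  outcome_1 = c(1, 0, 1),
                  outcome_2 = c(1, 1, 0))

# Assumption: every indicator column is named "outcome_<n>",
# so characters 9 onward are the outcome number
suffixes <- substring(names(df1)[-1], 9)   # "1" "2"

# One row: 1/0 becomes TRUE/FALSE, and logical indexing keeps
# only the suffixes of the columns that are 1
row1 <- unlist(df1[1, -1])
keep <- as.logical(row1)
toString(suffixes[keep])                   # "1, 2"

# The same operation applied to every row
res <- apply(df1[-1], 1, function(x) toString(suffixes[as.logical(x)]))
res
```

Swapping `toString()` for `paste(..., collapse = ",")` gives "1,2" without the space, as in the second variant.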

If you want a nested list column instead, you can use list2DF:

l <- list2DF(c(df1[1],
               outcome = list(apply(df1[-1], 1, \(x)
                 as.numeric(substring(names(df1)[-1], 9))[as.logical(x)]))))
l
#   id outcome
# 1  1    1, 2
# 2  2       2
# 3  3       1


str(l)
# 'data.frame': 3 obs. of  2 variables:
#  $ id     : num  1 2 3
#  $ outcome:List of 3
#   ..$ : num  1 2
#   ..$ : num 2
#   ..$ : num 1

Data:

df1 <- structure(list(id = c(1, 2, 3), outcome_1 = c(1, 0, 1), outcome_2 = c(1, 
1, 0)), class = "data.frame", row.names = c(NA, -3L))
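One caveat with the list2DF version: apply() returns a list here only because the rows yield different numbers of matches (two for row 1, one for rows 2 and 3). If every row had exactly one positive outcome, apply() would simplify the result to a plain numeric vector and outcome would no longer be a list column. A sketch of guarding against that, assuming R >= 4.1.0 (where apply() gained a `simplify` argument, and which the `\(x)` syntax already requires); `df_single` is a made-up example.

```r
# Hypothetical input in which every row has exactly one positive outcome
df_single <- data.frame(id = c(1, 2, 3),
                        outcome_1 = c(1, 0, 1),
                        outcome_2 = c(0, 1, 0))
codes <- as.numeric(substring(names(df_single)[-1], 9))

# Default: every result has length 1, so apply() collapses them into a numeric vector
simplified <- apply(df_single[-1], 1, \(x) codes[as.logical(x)])

# simplify = FALSE always returns one list element per row
kept <- apply(df_single[-1], 1, \(x) codes[as.logical(x)], simplify = FALSE)

l_single <- list2DF(c(df_single[1], outcome = list(kept)))
str(l_single)
```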

Here is another tidyverse approach:

library(dplyr)
library(tidyr)
df1 %>% 
  mutate(across(-id, ~case_when(. == 1 ~ cur_column()), .names = 'new_{col}'), .keep = "unused") %>% 
  unite(outcome, starts_with('new'), na.rm = TRUE, sep = ', ') %>% 
  mutate(outcome = gsub('outcome_', '', outcome))
#   id outcome
# 1  1    1, 2
# 2  2       2
# 3  3       1
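A sketch of a different tidyverse route that reshapes to long format instead of uniting columns. It assumes tidyr >= 1.0 (for pivot_longer() and its `names_prefix` argument) and that every non-id column is a 0/1 indicator named outcome_<n>; `flag` is an illustrative column name. Because the labels are collapsed with `paste(outcome[flag == 1], ...)` rather than filtered out beforehand, an id with no positive outcome keeps its row, with an empty string.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(id = c(1, 2, 3),
                  outcome_1 = c(1, 0, 1),
                  outcome_2 = c(1, 1, 0))

res <- df1 %>%
  # one row per (id, indicator column); names_prefix strips "outcome_"
  pivot_longer(-id, names_to = "outcome", names_prefix = "outcome_",
               values_to = "flag") %>%
  group_by(id) %>%
  # keep only the labels whose flag is 1 and join them into one string
  summarise(outcome = paste(outcome[flag == 1], collapse = ","),
            .groups = "drop")
res
```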

How many outcome_ columns are there? If there are only two, this works fine:

library(dplyr)
df1 %>% 
  rowwise() %>% 
  summarise(id = id, 
            outcome = paste(which(c(outcome_1, outcome_2) == 1), collapse = ","))
# # A tibble: 3 x 2
#      id outcome
#   <dbl> <chr>  
# 1     1 1,2    
# 2     2 2      
# 3     3 1

If there are more than two, try this:

df1 %>% 
  rowwise() %>% 
  summarise(id = id, 
            outcome = paste(which(c_across(-id) == 1), collapse = ","))
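Note that which() returns column positions, not the numbers in the column names, so both rowwise() versions give the right labels only when the columns are ordered outcome_1, outcome_2, and so on. A base R sketch with the columns deliberately reversed (`df_shuffled` is a made-up example) shows the difference and one name-based alternative, assuming the columns follow the outcome_<n> naming; sorting restores ascending order.

```r
df1 <- data.frame(id = c(1, 2, 3),
                  outcome_1 = c(1, 0, 1),
                  outcome_2 = c(1, 1, 0))

# Hypothetical: the same data with the indicator columns in reverse order
df_shuffled <- df1[, c("id", "outcome_2", "outcome_1")]
hits <- df_shuffled[-1] == 1   # logical matrix, one column per indicator

# Position-based, like which(c_across(-id) == 1): reports column positions
by_position <- apply(hits, 1, function(r) paste(which(r), collapse = ","))

# Name-based: read the number from each matching column name, then sort
by_name <- apply(hits, 1, function(r) {
  codes <- sort(as.numeric(sub("outcome_", "", names(r)[r])))
  paste(codes, collapse = ",")
})
```

Here `by_position` comes out wrong for rows 2 and 3, while `by_name` matches df2.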

Another possible solution, based on dplyr and purrr::pmap:

library(tidyverse)
df1 %>%
  # pmap_chr() (rather than pmap()) makes outcome a character column, not a list column
  transmute(id, outcome = pmap_chr(., ~ c(1*..2, 2*..3) %>% .[. != 0] %>% toString))
#>   id outcome
#> 1  1    1, 2
#> 2  2       2
#> 3  3       1

Or, more simply:

library(tidyverse)
pmap_dfr(df1, ~ data.frame(id = ..1,
                           outcome = c(1*..2, 2*..3) %>% .[. != 0] %>% toString))
#>   id outcome
#> 1  1    1, 2
#> 2  2       2
#> 3  3       1
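Both pmap versions above hard-code two columns through `1*..2` and `2*..3`. A sketch that works for any number of indicator columns collects each row with `c(...)` and indexes the column suffixes instead; it assumes purrr is installed and that every non-id column is a 0/1 indicator named outcome_<n>. `codes` and `out` are illustrative names.

```r
library(purrr)

df1 <- data.frame(id = c(1, 2, 3),
                  outcome_1 = c(1, 0, 1),
                  outcome_2 = c(1, 1, 0))

# Suffixes of the indicator columns, in column order: "1", "2", ...
codes <- sub("outcome_", "", names(df1)[-1])

# pmap_chr() calls the function once per row with one argument per column;
# c(...) gathers them back into a vector in column order
res <- pmap_chr(df1[-1], function(...) toString(codes[c(...) == 1]))

out <- data.frame(id = df1$id, outcome = res)
out
```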
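A base R version that finds the outcome columns by name:

outcome_col_idx <- grepl("outcome", colnames(df1))
cbind(
  df1[, !outcome_col_idx, drop = FALSE],
  outcome = apply(
    # turn 0 into NA so complete.cases() flags only the positive columns
    replace(df1, df1 == 0, NA)[, outcome_col_idx],
    1,
    # no as.factor() here, so outcome stays a character column
    function(x) toString(gsub("outcome_", "", names(x)[complete.cases(x)]))
  )
)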
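The answers above all work row by row (apply(), rowwise(), pmap()). A vectorized base R sketch instead locates every 1 at once with `which(arr.ind = TRUE)` and splits the labels by row. It assumes the non-id columns are 0/1 indicators named outcome_<n>; `hits` and `by_row` are illustrative names, and a row with no positive outcome comes out as an empty string.

```r
df1 <- data.frame(id = c(1, 2, 3),
                  outcome_1 = c(1, 0, 1),
                  outcome_2 = c(1, 1, 0))

m <- as.matrix(df1[-1]) == 1

# One row per positive cell, with columns "row" and "col"
hits <- which(m, arr.ind = TRUE)
# Order by row, then by column, so labels appear in column order within a row
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

labels_hit <- sub("outcome_", "", colnames(m))[hits[, "col"]]

# Factor levels cover every row, so rows without hits still get an entry
by_row <- split(labels_hit, factor(hits[, "row"], levels = seq_len(nrow(m))))
outcome <- vapply(by_row, paste, character(1), collapse = ",")

out <- data.frame(id = df1$id, outcome = unname(outcome), stringsAsFactors = FALSE)
out
```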
