I have a data frame with the following columns:
Patient Gene
1 A
1 B
2 A
2 B
2 C
3 A
3 C
I would like to reshape it to wide format, so that each patient is a row and each gene is a column, like this:
         GeneA GeneB GeneC
Patient1     1     1     0
Patient2     1     1     1
Patient3     1     0     1
Using pivot_wider:
library(tidyr)
library(dplyr)
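df %>%
  # Add an indicator column; every observed Patient/Gene pair gets a 1
  mutate(value = 1) %>%
  # Spread genes into columns; pairs that never occur are filled with 0
  pivot_wider(names_from = Gene, values_from = value,
              values_fill = 0, names_prefix = "Gene")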
Output:
# A tibble: 3 × 4
Patient GeneA GeneB GeneC
<int> <dbl> <dbl> <dbl>
1 1 1 1 0
2 2 1 1 1
3 3 1 0 1
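The snippet above assumes `df` already exists. As a minimal self-contained sketch, the version below rebuilds the sample data from the question and runs the same reshape. The name `wide` and the `distinct()` step are additions for illustration, not part of the original answer: `distinct()` guards against a repeated Patient/Gene pair, which would otherwise make `pivot_wider()` warn and return list-columns.

```r
library(tidyr)
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Patient = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Gene = c("A", "B", "A", "B", "C", "A", "C")
)

wide <- df %>%
  distinct() %>%          # assumption: duplicates should count once
  mutate(value = 1) %>%   # presence indicator
  pivot_wider(
    names_from = Gene,
    values_from = value,
    values_fill = 0,      # absent Patient/Gene pairs become 0
    names_prefix = "Gene"
  )

wide
```

If the sample data never contains duplicate pairs, dropping `distinct()` gives the same result.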
Another option, with fastDummies::dummy_cols:
library(fastDummies)
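df %>%
  # One 0/1 dummy column per gene; the original Gene column is dropped
  dummy_cols("Gene", remove_selected_columns = TRUE) %>%
  # Collapse to one row per patient: a gene is present if any row has a 1
  group_by(Patient) %>%
  summarise(across(everything(), max))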
Another option, with data.table::dcast:
library(data.table)
df <- data.frame(
  stringsAsFactors = FALSE,
  Patient = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Gene = c("A", "B", "A", "B", "C", "A", "C")
)
df
#> Patient Gene
#> 1 1 A
#> 2 1 B
#> 3 2 A
#> 4 2 B
#> 5 2 C
#> 6 3 A
#> 7 3 C
setDT(df)
dcast(
  data = df,
  formula = Patient ~ paste("Gene", Gene),
  fun.aggregate = function(x) sum(!is.na(x))
)
#> Using 'Gene' as value column. Use 'value.var' to override
#> Patient Gene A Gene B Gene C
#> 1: 1 1 1 0
#> 2: 2 1 1 1
#> 3: 3 1 0 1
Created on 2022-10-06 with reprex v2.0.2
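The `dcast()` call above produces column names containing a space (`Gene A`), because `paste()` separates its arguments with a space by default. A possible variant, sketched here under the assumption that names like `GeneA` are wanted, uses `paste0()` and sets `value.var` explicitly so the "Using 'Gene' as value column" message is not printed. The names `dt` and `wide` are illustrative only.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  Patient = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Gene = c("A", "B", "A", "B", "C", "A", "C")
)

wide <- dcast(
  dt,
  Patient ~ paste0("Gene", Gene),  # paste0() avoids the space in "Gene A"
  value.var = "Gene",              # explicit, so dcast() does not have to guess
  fun.aggregate = length           # number of rows per Patient/Gene pair
)

wide
```

Note that `length` counts rows, so a Patient/Gene pair that appears more than once would give a value above 1. With the sample data each pair is unique, so the result is a 0/1 table.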