I have the following tibble:
df <- tibble::tribble(
~Sample_name, ~CRT, ~SR, ~`Bcells,DendriticCells,Macrophage`,
"S1", 0.079, 0.592, "0.077,0.483,0.555",
"S2", 0.082, 0.549, "0.075,0.268,0.120"
)
df
#> # A tibble: 2 × 4
#> Sample_name CRT SR `Bcells,DendriticCells,Macrophage`
#> <chr> <dbl> <dbl> <chr>
#> 1 S1 0.079 0.592 0.077,0.483,0.555
#> 2 S2 0.082 0.549 0.075,0.268,0.120
Note that the last column is comma-separated, in both its header and its values. How can I convert df into this tidy form:
Sample_name CRT SR Score Celltype
S1 0.079 0.592 0.077 Bcells
S1 0.079 0.592 0.483 DendriticCells
S1 0.079 0.592 0.555 Macrophage
S2 0.082 0.549 0.075 Bcells
S2 0.082 0.549 0.268 DendriticCells
S2 0.082 0.549 0.120 Macrophage
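We can use separate to split the column, then gather to reshape it to long format:
library(tidyr)
library(dplyr) # provides %>%
df %>%
  # split the comma-separated values into one column per cell type
  separate(col = `Bcells,DendriticCells,Macrophage`,
           into = strsplit('Bcells,DendriticCells,Macrophage', ',')[[1]],
           sep = ',') %>%
  # stack the three new columns into Celltype/score pairs
  gather(Celltype, score, Bcells:Macrophage)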
# # A tibble: 6 × 5
# Sample_name CRT SR Celltype score
# <chr> <dbl> <dbl> <chr> <chr>
# 1 S1 0.079 0.592 Bcells 0.077
# 2 S2 0.082 0.549 Bcells 0.075
# 3 S1 0.079 0.592 DendriticCells 0.483
# 4 S2 0.082 0.549 DendriticCells 0.268
# 5 S1 0.079 0.592 Macrophage 0.555
# 6 S2 0.082 0.549 Macrophage 0.120
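In the output above, score is still a character column and the rows are ordered by cell type rather than by sample. The following is a minimal sketch of one way to get closer to the requested layout, assuming a tidyr version that provides pivot_longer (1.0.0 or later) and dplyr 1.0.0 or later for relocate. The name tidy_df is only illustrative.

```r
library(tidyr)
library(dplyr)

df <- tibble::tribble(
  ~Sample_name, ~CRT, ~SR, ~`Bcells,DendriticCells,Macrophage`,
  "S1", 0.079, 0.592, "0.077,0.483,0.555",
  "S2", 0.082, 0.549, "0.075,0.268,0.120"
)

tidy_df <- df %>%
  # convert = TRUE asks separate() to turn the split strings into numbers
  separate(col = `Bcells,DendriticCells,Macrophage`,
           into = c("Bcells", "DendriticCells", "Macrophage"),
           sep = ",", convert = TRUE) %>%
  # pivot_longer() keeps the rows of each sample together
  pivot_longer(cols = Bcells:Macrophage,
               names_to = "Celltype", values_to = "Score") %>%
  # put Score before Celltype, as in the requested layout
  relocate(Score, .before = Celltype)

tidy_df
```

pivot_longer is used here instead of gather because tidyr documents gather as superseded; the reshaping itself is the same idea.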
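Without hard-coding the column names (separate_() and gather_() are deprecated, so this uses separate() and gather() with all_of()):
cn <- colnames(df)[ncol(df)]      # name of the comma-separated column
parts <- strsplit(cn, ',')[[1]]   # the individual cell type names
df %>%
  separate(col = all_of(cn), into = parts, sep = ',') %>%
  gather(key = 'Celltype', value = 'score', all_of(parts))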
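A different route that also avoids hard-coding, sketched here as an alternative and not taken from the answer above, is to split the values and the header names in parallel with tidyr's separate_rows. It assumes the comma-separated column is the last one and that every row holds as many values as the header has names; the name tidy_rows is only illustrative.

```r
library(tidyr)
library(dplyr)

df <- tibble::tribble(
  ~Sample_name, ~CRT, ~SR, ~`Bcells,DendriticCells,Macrophage`,
  "S1", 0.079, 0.592, "0.077,0.483,0.555",
  "S2", 0.082, 0.549, "0.075,0.268,0.120"
)

cn <- colnames(df)[ncol(df)]  # assumption: the packed column is the last one

tidy_rows <- df %>%
  rename(Score = all_of(cn)) %>%  # give the packed column a plain name
  mutate(Celltype = cn) %>%       # copy the packed header into every row
  # split both columns on commas in parallel, one row per value/name pair
  separate_rows(Score, Celltype, sep = ",", convert = TRUE)

tidy_rows
```

Because both columns are split together, each score stays paired with its cell type without any reshaping step.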
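Alternatively, we can use tidyr's extract:
library(tidyverse)
# find the comma-separated header and split it into the new column names
vars <- unlist(strsplit(names(df)[str_detect(toupper(names(df)), 'BCELLS')], ','))
df %>%
  # one capture group per cell type; a comma needs no escaping in a regex,
  # and '\,' is not a valid escape in an R string
  tidyr::extract(`Bcells,DendriticCells,Macrophage`, into = vars, regex = '(.*),(.*),(.*)') %>%
  pivot_longer(all_of(vars), names_to = 'Celltype', values_to = 'Score')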
Created on 2023-02-05 with reprex v2.0.2
# A tibble: 6 × 5
Sample_name CRT SR Celltype Score
<chr> <dbl> <dbl> <chr> <chr>
1 S1 0.079 0.592 Bcells 0.077
2 S1 0.079 0.592 DendriticCells 0.483
3 S1 0.079 0.592 Macrophage 0.555
4 S2 0.082 0.549 Bcells 0.075
5 S2 0.082 0.549 DendriticCells 0.268
6 S2 0.082 0.549 Macrophage 0.120
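The regex in the extract call is written for exactly three cell types. As a hedged sketch, the pattern could instead be built from the number of names in the header, so the same code would handle a different number of comma-separated fields. This assumes the packed column is the last one; the names cn, pattern and long_df are only illustrative.

```r
library(tidyr)
library(dplyr)

df <- tibble::tribble(
  ~Sample_name, ~CRT, ~SR, ~`Bcells,DendriticCells,Macrophage`,
  "S1", 0.079, 0.592, "0.077,0.483,0.555",
  "S2", 0.082, 0.549, "0.075,0.268,0.120"
)

cn <- names(df)[ncol(df)]        # assumption: the packed column is the last one
vars <- strsplit(cn, ",")[[1]]   # one new column name per cell type

# one "anything but a comma" capture group per name, joined by commas
pattern <- paste(rep("([^,]*)", length(vars)), collapse = ",")

long_df <- df %>%
  tidyr::extract(col = all_of(cn), into = vars, regex = pattern, convert = TRUE) %>%
  pivot_longer(all_of(vars), names_to = "Celltype", values_to = "Score")

long_df
```

Using [^,]* in place of .* keeps each group from matching across a comma, and convert = TRUE makes Score numeric instead of character.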