r - How to turn a tibble with a comma-delimited column into tidy form


I have the following tibble:


df <- tibble::tribble(
  ~Sample_name, ~CRT,      ~SR,      ~`Bcells,DendriticCells,Macrophage`,
  "S1",          0.079,  0.592,      "0.077,0.483,0.555",
  "S2",          0.082,  0.549,      "0.075,0.268,0.120"
)
df
#> # A tibble: 2 × 4
#>   Sample_name   CRT    SR `Bcells,DendriticCells,Macrophage`
#>         <chr> <dbl> <dbl>                              <chr>
#> 1          S1 0.079 0.592                  0.077,0.483,0.555
#> 2          S2 0.082 0.549                  0.075,0.268,0.120

Note that the last column is comma-separated, in both its name and its values. How can I convert df into this tidy form:

Sample_name CRT   SR       Score     Celltype
S1          0.079 0.592    0.077     Bcells 
S1          0.079 0.592    0.483     DendriticCells
S1          0.079 0.592    0.555     Macrophage
S2          0.082 0.549    0.075     Bcells
S2          0.082 0.549    0.268     DendriticCells
S2          0.082 0.549    0.120     Macrophage

We can do this with separate followed by gather:

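library(tidyr)  # separate(), gather(), and the re-exported %>%

df %>%
    # split the combined column into one column per cell type
    separate(col = `Bcells,DendriticCells,Macrophage`,
             into = strsplit('Bcells,DendriticCells,Macrophage', ',')[[1]],
             sep = ',') %>%
    # stack the three cell-type columns into Celltype/score pairs
    gather(Celltype, score, Bcells:Macrophage)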
# # A tibble: 6 × 5
#   Sample_name   CRT    SR       Celltype score
#         <chr> <dbl> <dbl>          <chr> <chr>
# 1          S1 0.079 0.592         Bcells 0.077
# 2          S2 0.082 0.549         Bcells 0.075
# 3          S1 0.079 0.592 DendriticCells 0.483
# 4          S2 0.082 0.549 DendriticCells 0.268
# 5          S1 0.079 0.592     Macrophage 0.555
# 6          S2 0.082 0.549     Macrophage 0.120
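The score column above comes back as character, because separate only splits strings. A minimal sketch of one way to get numeric scores in the row order asked for in the question, assuming a tidyr version that provides pivot_longer (1.0.0 or later): pass convert = TRUE to separate and reshape with pivot_longer instead of gather. The names celltypes and tidy_df are placeholders chosen for this example.

```r
library(tidyr)

df <- tibble::tribble(
  ~Sample_name, ~CRT,  ~SR,   ~`Bcells,DendriticCells,Macrophage`,
  "S1",         0.079, 0.592, "0.077,0.483,0.555",
  "S2",         0.082, 0.549, "0.075,0.268,0.120"
)

# Names for the new columns, taken from the combined column name
celltypes <- strsplit("Bcells,DendriticCells,Macrophage", ",")[[1]]

tidy_df <- df %>%
  separate(col = `Bcells,DendriticCells,Macrophage`,
           into = celltypes, sep = ",",
           convert = TRUE) %>%  # run type.convert() on the new columns
  # pivot_longer() keeps the rows of each sample together
  pivot_longer(all_of(celltypes), names_to = "Celltype", values_to = "Score")

tidy_df
```

With convert = TRUE the Score column should be of type double rather than character, and the rows for S1 come before those for S2.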

Without hard-coding the column name:

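cn <- colnames(df)[ncol(df)]         # name of the last (comma-separated) column
celltypes <- strsplit(cn, ',')[[1]]  # "Bcells" "DendriticCells" "Macrophage"
df %>%
    # separate_() and gather_() are deprecated; all_of() selects columns by string
    separate(col = all_of(cn), into = celltypes, sep = ',') %>%
    gather('Celltype', 'score', all_of(celltypes))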
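The same idea can be wrapped in a small function so that nothing about the column is hard-coded. This is only a sketch: split_and_lengthen is a hypothetical helper written for this example, not part of tidyr, and it assumes the delimited column is the last one unless told otherwise. Note that sep is treated as a regular expression by both strsplit and separate.

```r
library(tidyr)

# Hypothetical helper (not a tidyr function): splits a column whose name and
# values are both delimited by `sep`, then reshapes the pieces to long form.
split_and_lengthen <- function(data,
                               col_name = colnames(data)[ncol(data)],
                               sep = ",",
                               names_to = "Celltype",
                               values_to = "Score") {
  parts <- strsplit(col_name, sep)[[1]]  # new column names
  data %>%
    separate(col = all_of(col_name), into = parts, sep = sep, convert = TRUE) %>%
    pivot_longer(all_of(parts), names_to = names_to, values_to = values_to)
}

# Toy input, made up for illustration
toy <- tibble::tibble(id = c("x", "y"), `a,b` = c("1,2", "3,4"))
long <- split_and_lengthen(toy, names_to = "key", values_to = "value")
long
```

Calling split_and_lengthen(df) on the tibble from the question should produce the Sample_name, CRT, SR, Celltype, Score layout requested there.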

Alternatively, we can use extract from tidyr:

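library(tidyverse)

# cell-type names, taken from the column whose name contains "Bcells"
vars <- unlist(strsplit(names(df)[which(str_detect(toupper(names(df)), 'BCELLS'))], ','))

df %>%
  # one capture group per cell type; `into` takes a plain character vector,
  # and the comma needs no escaping ('\,' is not a valid R string escape)
  tidyr::extract(`Bcells,DendriticCells,Macrophage`, into = vars, regex = '(.*),(.*),(.*)') %>%
  pivot_longer(all_of(vars), names_to = 'Celltype', values_to = 'Score')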

Created on 2023-02-05 with reprex v2.0.2

# A tibble: 6 × 5
  Sample_name   CRT    SR Celltype       Score
  <chr>       <dbl> <dbl> <chr>          <chr>
1 S1          0.079 0.592 Bcells         0.077
2 S1          0.079 0.592 DendriticCells 0.483
3 S1          0.079 0.592 Macrophage     0.555
4 S2          0.082 0.549 Bcells         0.075
5 S2          0.082 0.549 DendriticCells 0.268
6 S2          0.082 0.549 Macrophage     0.120