R: How to convert a tibble with a comma-delimited column into tidy form

I have the following tibble:

df <- tibble::tribble(
  ~Sample_name, ~CRT,      ~SR,      ~`Bcells,DendriticCells,Macrophage`,
  "S1",          0.079,  0.592,      "0.077,0.483,0.555",
  "S2",          0.082,  0.549,      "0.075,0.268,0.120"
)
df
#> # A tibble: 2 × 4
#>   Sample_name   CRT    SR `Bcells,DendriticCells,Macrophage`
#>         <chr> <dbl> <dbl>                              <chr>
#> 1          S1 0.079 0.592                  0.077,0.483,0.555
#> 2          S2 0.082 0.549                  0.075,0.268,0.120

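Note that the last column holds comma-separated values, and its name is comma-separated as well. How can I convert df into this tidy form: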

Sample_name CRT   SR       Score     Celltype
S1          0.079 0.592    0.077     Bcells 
S1          0.079 0.592    0.483     DendriticCells
S1          0.079 0.592    0.555     Macrophage
S2          0.082 0.549    0.075     Bcells
S2          0.082 0.549    0.268     DendriticCells
S2          0.082 0.549    0.120     Macrophage

We can do this with separate:

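library(tidyr)

df %>%
    # split the single string column into one column per cell type
    separate(col = `Bcells,DendriticCells,Macrophage`,
             into = strsplit('Bcells,DendriticCells,Macrophage', ',')[[1]],
             sep = ',') %>%
    # stack the three new columns into Celltype/score pairs
    gather(Celltype, score, Bcells:Macrophage)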
# # A tibble: 6 × 5
#   Sample_name   CRT    SR       Celltype score
#         <chr> <dbl> <dbl>          <chr> <chr>
# 1          S1 0.079 0.592         Bcells 0.077
# 2          S2 0.082 0.549         Bcells 0.075
# 3          S1 0.079 0.592 DendriticCells 0.483
# 4          S2 0.082 0.549 DendriticCells 0.268
# 5          S1 0.079 0.592     Macrophage 0.555
# 6          S2 0.082 0.549     Macrophage 0.120
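
Note that in the output above, score is a character column and the rows are ordered by cell type rather than by sample. The following is a minimal sketch, assuming numeric scores and sample-ordered rows are wanted; it relies on the `convert = TRUE` argument of `separate` and on `arrange` from dplyr, which is an extra dependency not used in the answer itself.

```r
library(tidyr)
library(dplyr)

df <- tibble::tribble(
  ~Sample_name, ~CRT, ~SR, ~`Bcells,DendriticCells,Macrophage`,
  "S1", 0.079, 0.592, "0.077,0.483,0.555",
  "S2", 0.082, 0.549, "0.075,0.268,0.120"
)

tidy_df <- df %>%
  separate(col = `Bcells,DendriticCells,Macrophage`,
           into = c("Bcells", "DendriticCells", "Macrophage"),
           sep = ",",
           convert = TRUE) %>%  # parse the split pieces as numbers
  gather(Celltype, Score, Bcells:Macrophage) %>%
  arrange(Sample_name)          # keep each sample's rows together

tidy_df
```

The column order here is Sample_name, CRT, SR, Celltype, Score; a final `select()` could reorder the last two columns if the exact layout from the question matters.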

Without hard-coding the column name:

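cn <- colnames(df)[ncol(df)]      # name of the comma-delimited column
parts <- strsplit(cn, ',')[[1]]   # new column names, taken from that name
df %>%
    # separate_() and gather_() are deprecated; all_of() selects by string instead
    separate(col = all_of(cn), into = parts, sep = ',') %>%
    gather('Celltype', 'score', all_of(parts))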
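
The same idea can be wrapped in a small reusable function. This is only a sketch: `split_delimited_column` and its argument names are hypothetical (they are not part of tidyr), and it assumes the delimiter is a plain character such as a comma with no special meaning in a regular expression.

```r
library(tidyr)

# Hypothetical helper, not part of tidyr: splits one delimited column whose
# *name* lists the new column names, then reshapes the result to long form.
split_delimited_column <- function(data, delim_col, sep = ",",
                                   names_to = "Celltype",
                                   values_to = "Score") {
  parts <- strsplit(delim_col, sep, fixed = TRUE)[[1]]
  data %>%
    separate(col = all_of(delim_col), into = parts, sep = sep,
             convert = TRUE) %>%  # convert = TRUE turns the pieces into numbers
    pivot_longer(all_of(parts), names_to = names_to, values_to = values_to)
}

df <- tibble::tribble(
  ~Sample_name, ~CRT, ~SR, ~`Bcells,DendriticCells,Macrophage`,
  "S1", 0.079, 0.592, "0.077,0.483,0.555",
  "S2", 0.082, 0.549, "0.075,0.268,0.120"
)

cn <- colnames(df)[ncol(df)]  # assumes the delimited column is the last one
tidy_df <- split_delimited_column(df, cn)
tidy_df
```

Using `pivot_longer` instead of `gather` keeps the rows grouped by sample, which matches the layout asked for in the question.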

Alternatively, we can use extract from tidyr:

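library(tidyverse)

# locate the comma-delimited column by matching 'BCELLS' in its name,
# then split that name into the new column names
vars <- unlist(strsplit(names(df)[str_detect(toupper(names(df)), 'BCELLS')], ','))

df %>%
  # one capture group per comma-separated value
  tidyr::extract(`Bcells,DendriticCells,Macrophage`, into = vars, regex = '(.*),(.*),(.*)') %>%
  pivot_longer(all_of(vars), names_to = 'Celltype', values_to = 'Score')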

Created on 2023-02-05 with reprex v2.0.2

# A tibble: 6 × 5
  Sample_name   CRT    SR Celltype       Score
  <chr>       <dbl> <dbl> <chr>          <chr>
1 S1          0.079 0.592 Bcells         0.077
2 S1          0.079 0.592 DendriticCells 0.483
3 S1          0.079 0.592 Macrophage     0.555
4 S2          0.082 0.549 Bcells         0.075
5 S2          0.082 0.549 DendriticCells 0.268
6 S2          0.082 0.549 Macrophage     0.120
