R - Summing mass spectrometry peak counts by sample column name


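Hopefully this is straightforward and I'm just overthinking it. I have a matrix of peak counts from a mass spectrometer (MS), where the rows are peaks and the columns are sample names. Each sample location has several sampling sites, and I want to add up the counts across sites within a location.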

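For example, a sample with three replicates is identified as "S19S_0010_Sed_Field_ICR.D_p2", "S19S_0010_Sed_Field_ICR.M_p2", and "S19S_0010_Sed_Field_ICR.U_p2": the same location, but downstream (D), midstream (M), and upstream (U). The first two samples each have one count for a particular peak, so I want to merge the three samples into a single "S19S_0010_Sed_Field_ICR.all_p2" with a count of two for that peak. Example dataset: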

> dput(data.sed.ex)
structure(list(S19S_0004_Sed_Field_ICR.M_p15 = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0), S19S_0006_Sed_Field_ICR.D_p2 = c(0, 0, 0, 
0, 0, 0, 1, 1, 0, 0), S19S_0006_Sed_Field_ICR.M_p2 = c(0, 0, 
0, 0, 0, 0, 1, 0, 0, 0), S19S_0006_Sed_Field_ICR.U_p2 = c(0, 
0, 0, 0, 0, 0, 1, 1, 0, 0), S19S_0008_Sed_Field_ICR.M_p15 = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 0), S19S_0009_Sed_Field_ICR.M_p2 = c(0, 
0, 1, 0, 0, 0, 1, 0, 0, 0), S19S_0009_Sed_Field_ICR.U_p2 = c(0, 
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.D_p15 = c(0, 
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.M_p15 = c(0, 
0, 0, 0, 0, 0, 1, 0, 0, 0), S19S_0010_Sed_Field_ICR.U_p15 = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c("200.002276", "200.015107", 
"200.0564158", "200.0565393", "200.0578394", "200.0677581", "200.092796", 
"200.1291723", "200.1292836", "200.9238455"), class = "data.frame")

TIA

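A longer format may help. In long format you can summarise by group, for example by sample, or by sample and location, using sum, mean, sd, and so on.

Hope this helps,

Convert to long format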

## dd is the `data.sed.ex` object above
library(tidyverse)
dd <- data.sed.ex
ddLong <- dd %>%
  rownames_to_column(var = "peak") %>%
  pivot_longer(cols = matches("^S")) %>%
  ## backslashes are doubled because R string literals need "\\." and "\\1"
  mutate(sample = gsub("(.*)\\.(.*)", "\\1", name),            ## pull sample info
         location = gsub("(.*)\\.([DMU])_(.*)", "\\2", name),  ## pull D M U
         p = gsub("(.*)\\.([DMU])_(p.*)", "\\3", name),        ## get p2, p15
         peak = as.numeric(peak))                              ## coerce peak to numeric
ddLong
#> # A tibble: 100 × 6
#>     peak name                          value sample               location p    
#>    <dbl> <chr>                         <dbl> <chr>                <chr>    <chr>
#>  1  200. S19S_0004_Sed_Field_ICR.M_p15     0 S19S_0004_Sed_Field… M        p15  
#>  2  200. S19S_0006_Sed_Field_ICR.D_p2      0 S19S_0006_Sed_Field… D        p2   
#>  3  200. S19S_0006_Sed_Field_ICR.M_p2      0 S19S_0006_Sed_Field… M        p2   
#>  4  200. S19S_0006_Sed_Field_ICR.U_p2      0 S19S_0006_Sed_Field… U        p2   
#>  5  200. S19S_0008_Sed_Field_ICR.M_p15     0 S19S_0008_Sed_Field… M        p15  
#>  6  200. S19S_0009_Sed_Field_ICR.M_p2      0 S19S_0009_Sed_Field… M        p2   
#>  7  200. S19S_0009_Sed_Field_ICR.U_p2      0 S19S_0009_Sed_Field… U        p2   
#>  8  200. S19S_0010_Sed_Field_ICR.D_p15     0 S19S_0010_Sed_Field… D        p15  
#>  9  200. S19S_0010_Sed_Field_ICR.M_p15     0 S19S_0010_Sed_Field… M        p15  
#> 10  200. S19S_0010_Sed_Field_ICR.U_p15     0 S19S_0010_Sed_Field… U        p15  
#> # … with 90 more rows

Summarise by one or more groups

## summarise using group_by + verbs
ddLong %>%
  group_by(sample, location) %>%
  summarise(n = n(),
            sum.value = sum(value),
            mean.peak = mean(peak))
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 10 × 5
#> # Groups:   sample [5]
#>    sample                  location     n sum.value mean.peak
#>    <chr>                   <chr>    <int>     <dbl>     <dbl>
#>  1 S19S_0004_Sed_Field_ICR M           10         0      200.
#>  2 S19S_0006_Sed_Field_ICR D           10         2      200.
#>  3 S19S_0006_Sed_Field_ICR M           10         1      200.
#>  4 S19S_0006_Sed_Field_ICR U           10         2      200.
#>  5 S19S_0008_Sed_Field_ICR M           10         1      200.
#>  6 S19S_0009_Sed_Field_ICR M           10         2      200.
#>  7 S19S_0009_Sed_Field_ICR U           10         1      200.
#>  8 S19S_0010_Sed_Field_ICR D           10         1      200.
#>  9 S19S_0010_Sed_Field_ICR M           10         1      200.
#> 10 S19S_0010_Sed_Field_ICR U           10         0      200.
                              
ddLong %>%
  group_by(sample, p) %>%
  summarise(n = n(),
            sum.value = sum(value),
            mean.peak = mean(peak))
#> `summarise()` has grouped output by 'sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 5 × 5
#> # Groups:   sample [5]
#>   sample                  p         n sum.value mean.peak
#>   <chr>                   <chr> <int>     <dbl>     <dbl>
#> 1 S19S_0004_Sed_Field_ICR p15      10         0      200.
#> 2 S19S_0006_Sed_Field_ICR p2       30         5      200.
#> 3 S19S_0008_Sed_Field_ICR p15      10         1      200.
#> 4 S19S_0009_Sed_Field_ICR p2       20         3      200.
#> 5 S19S_0010_Sed_Field_ICR p15      30         2      200.
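The summaries above collapse over all peaks. If the goal is the one stated in the question, a single ".all_" column per sample with the D/M/U counts added peak by peak, one possible base R sketch is shown here. It is not part of the original answer, and it runs on a small stand-in data frame whose values are made up for illustration (only the column naming scheme is taken from the question); the names `grp` and `combined` are likewise illustrative.

```r
## Hypothetical stand-in for data.sed.ex: same naming scheme, made-up counts
dd <- data.frame(
  S19S_0006_Sed_Field_ICR.D_p2  = c(0, 1, 1),
  S19S_0006_Sed_Field_ICR.M_p2  = c(0, 1, 0),
  S19S_0006_Sed_Field_ICR.U_p2  = c(0, 1, 1),
  S19S_0008_Sed_Field_ICR.M_p15 = c(0, 0, 1),
  row.names = c("200.0677581", "200.092796", "200.1291723")
)

## Build one group label per column by replacing the D/M/U code with "all"
grp <- sub("\\.[DMU]_", ".all_", names(dd))

## Split the columns by label, then sum each group row by row (per peak)
combined <- sapply(split.default(dd, grp), rowSums)
combined
```

The result is a numeric matrix with peaks as rows and one column per combined sample. Swapping `rowSums` for `rowMeans` would give per-peak averages instead of sums, if that is what is wanted.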
