Summarizing specific words in a data table in R

# Shorter columns are recycled to the length of Species (12 rows)
df <- data.frame(
  "Domain" = c("Euka"),
  "Kingdom" = c("An", "Plan"),
  "Division" = c("20181121", "20181128", "20181203"),
  "Species" = c("20181115_AG25_MAGH_50_A05_CGT.TXT", "20181122_AG25_MAGH_50_C05_CGT.ARR",
                "20181115_AG25_MAGH_50_G05_CGT.TXT", "20181124_AG25_MAGH_50_G45_CGT.TXT",
                "20181204_AG25_MAGH_50_G05_CGT.ARR", "20181205_AG25_MAGH_50_G45_CGT.TXT",
                "20181207_AG25_MAGH_50_T05_CGT.ARR", "20181215_AG25_MAGH_50_F45_CGT.TXT",
                "20181223_AG25_MAGH_50_R07_CGT.GGI", "20181225_TW77_MAGH_33_L06_CGT.ARR",
                "20181226_TW77_MAGH_33_S07_CGT.ARR", "20181227_TW77_MAGH_33_C06_CGT.TXT")
)

I would like to summarize this into a table with the following columns:

Division
Total_TXT
Total_ARR
Total_GGI

Here is a tidyverse option: we use count to get the total for each group, and then pivot_wider to reshape the result into wide format.

library(tidyverse)
df %>%
  group_by(gr = Division) %>%
  # Strip everything up to the last "." so only the file extension remains
  count(Division = str_replace_all(Species, '.*\\.', '')) %>%
  pivot_wider(names_from = "gr", values_from = "n", values_fill = 0) %>%
  mutate(Division = paste0("Total_", Division))

Output

Division  `20181121` `20181128` `20181203`
<chr>          <int>      <int>      <int>
1 Total_ARR          2          3          0
2 Total_TXT          2          1          3
3 Total_GGI          0          0          1

Alternatively, here is a data.table option:

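library(data.table)
library(stringr)  # provides str_replace_all
# Count rows per Division and file extension; stored under a new name
# so the original df is not overwritten by the summary
counts <-
  setDT(df)[, .N, by = .(cn = Division, Division = str_replace_all(Species, '.*\\.', ''))]
# Reshape to wide format, one column per original Division value
dcast(counts,
      paste0("Total_", Division) ~ cn,
      value.var = "N",
      fill = 0)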

We need to extract the last three characters of Species:

x <- nchar(df$Species)
rowlbl <- substr(df$Species, x-2, x)
table(rowlbl, df$Division)
# rowlbl 20181121 20181128 20181203
#    ARR        2        3        0
#    GGI        0        0        1
#    TXT        2        1        3
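
One caveat with this fixed-width approach: it assumes every extension is exactly three characters long, which holds for the data in the question. As a minimal sketch using hypothetical file names (not taken from the question), this is how it can diverge from taking everything after the last dot:

```r
# Hypothetical file names; only the first has a three-character extension
files <- c("20181115_A05_CGT.TXT", "20181122_C05_CGT.JSON", "20181203_G05_CGT.GZ")

# Fixed-width approach: always the last three characters
n <- nchar(files)
last3 <- substr(files, n - 2, n)

# Delimiter-based approach: everything after the last "."
after_dot <- sub(".*\\.", "", files)

# Compare the two results side by side
print(data.frame(files, last3, after_dot))
```

If the extensions in your data can vary in length, the delimiter-based version is probably the safer choice.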

A base R one-liner

table(sub('.*\\.', '', df$Species), df$Division)
#     20181121 20181128 20181203
#  ARR        2        3        0
#  GGI        0        0        1
#  TXT        2        1        3

Explanation:

sub removes everything up to and including the last ".", returning only the file extension:

sub('.*\\.', '', df$Species)
#[1] "TXT" "ARR" "TXT" "TXT" "ARR" "TXT" "ARR" "TXT" "GGI" "ARR" "ARR" "TXT"

The result is then passed to table together with the Division values.
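
The question appears to ask for one row per Division with Total_TXT, Total_ARR and Total_GGI as columns, which is the transpose of the table shown here. A minimal sketch of that layout in base R, using a small toy data frame (the `toy` values are illustrative, not the question's data):

```r
# Toy data in the same shape as the question (values are illustrative)
toy <- data.frame(
  Division = c("20181121", "20181121", "20181128", "20181203"),
  Species  = c("a_CGT.TXT", "b_CGT.ARR", "c_CGT.ARR", "d_CGT.GGI")
)

# File extension: everything after the last "."
ext <- sub(".*\\.", "", toy$Species)

# Passing Division first makes Divisions the rows and extensions the columns
tab <- table(Division = toy$Division, paste0("Total_", ext))

# Convert the contingency table to a plain data frame, one row per Division
wide <- as.data.frame.matrix(tab)
wide <- cbind(Division = rownames(wide), wide)
rownames(wide) <- NULL
print(wide)
```

Swapping the argument order of table is all that changes the orientation; the conversion to a data frame is optional and only matters if you need Division as a regular column.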

For a non-regex approach, sub can be replaced with tools::file_ext:

table(tools::file_ext(df$Species), df$Division)
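
The two approaches agree on the data in the question, but they can differ on names that contain no dot. A short sketch with hypothetical file names to illustrate the difference:

```r
# Hypothetical file names chosen to show edge cases
files <- c("sample_CGT.TXT", "archive.tar.gz", "no_extension")

# Regex approach: with no "." present, nothing matches and the input is returned unchanged
by_regex <- sub(".*\\.", "", files)

# file_ext approach: returns an empty string when there is no extension
by_file_ext <- tools::file_ext(files)

print(data.frame(files, by_regex, by_file_ext))
```

With sub, a name without an extension would show up in the table as its own category, whereas tools::file_ext would group such names under an empty label.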
