Remove part of the variable names in an R column



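I want to clean up a column in R so that it contains only the species name, i.e. remove everything from the second "_" onward.

Here is my table: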

col1 Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20 4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP 5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M 3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134 5
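
One option is sub with a capture group (note the doubled backslash in the replacement, which R needs for a backreference):

df$col1 <- sub("^([^_]+_[^_]+)_.*", "\\1", df$col1, perl = TRUE)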
df
col1 Col2
1    Pelagodinium_beii    4
2      Acanthoeca_10tr    5
3   Rhodosorus_marinus    5
4         Vannella_sp.    3
5 Florenciella_parvula    5

where df is:

df <- read.table(
text =
'col1   Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20  4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP   5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M    3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134  5
',
header = TRUE
)
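
To see what the pattern does, it can help to try it on a plain character vector first. The sketch below is only an illustration: the names x and species are made up here, and the "Homo_sapiens" entry is a hypothetical value added to show that strings without a second underscore are left unchanged.

```r
# Illustrative input modelled on the sample data; the last entry is hypothetical
x <- c(
  "Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20",
  "Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M",
  "Homo_sapiens"
)

# ^([^_]+_[^_]+) captures the first two underscore-separated fields,
# _.* consumes the second underscore and everything after it,
# and "\\1" writes back only the captured part.
species <- sub("^([^_]+_[^_]+)_.*", "\\1", x)
print(species)
```

If the pattern does not match (fewer than two underscores), sub returns the input as is, which is usually what you want for names that are already clean.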

An option with strsplit:

df$col1 <- sapply(df$col1, function(i) paste0(strsplit(i, "_")[[1]][1:2], collapse = '_'))

# col1 Col2
# 1    Pelagodinium_beii    4
# 2      Acanthoeca_10tr    5
# 3   Rhodosorus_marinus    5
# 4         Vannella_sp.    3
# 5 Florenciella_parvula    5
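
One thing to watch with the strsplit approach: indexing with [1:2] pads with NA when a value has only one field, so a name like "Homo" would likely come out as "Homo_NA". A possible guard, sketched here with a hypothetical helper called first_two (not part of the original answer), is to use head so that at most two pieces are kept:

```r
# Hypothetical helper: keep at most the first two sep-separated fields
first_two <- function(x, sep = "_") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  # head(p, 2) returns fewer than two pieces when fewer exist, so no NA padding
  vapply(parts, function(p) paste(head(p, 2), collapse = sep), character(1))
}

# Second input is a made-up single-field value
result <- first_two(c("Acanthoeca_10tr_SRR1294413_MMETSP0105", "Homo"))
print(result)
```

It would be applied the same way as above, e.g. df$col1 <- first_two(df$col1), assuming col1 is a character column.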

Another approach is to use word from the stringr package:

library(stringr)
df$col1 <- word(df$col1, 1, 2, sep = "_")
