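I want to clean a column of an R data frame so that it contains only the species names. That is, I want to remove everything from the second "_" onward.
Here is my table: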
col1 | Col2 |
---|---|
Pelagodinium_beii_RCC149_SRR1300503_MMETSP1338c20 | 4 |
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 | 5 |
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP | 5 |
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M | 3 |
Florenciella_parvula_CCMP247_SRR1294437_METSP134 | 5 |
# Keep the first two "_"-separated fields; "\\1" refers back to the captured group
df$col1 <- sub("^([^_]+_[^_]+)_.*", "\\1", df$col1, perl = TRUE)
df
# col1 Col2
# 1 Pelagodinium_beii 4
# 2 Acanthoeca_10tr 5
# 3 Rhodosorus_marinus 5
# 4 Vannella_sp. 3
# 5 Florenciella_parvula 5
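To make the pattern's behavior explicit, here is a minimal base-R sketch that wraps the same `sub()` call in a helper. The name `species_name()` and the extra test string `"Genus_only"` are illustrative assumptions, not part of the original answer. A string that has no second underscore does not match the pattern, so `sub()` returns it unchanged.

```r
# Hypothetical helper: keep only "Genus_species", i.e. the text before the 2nd "_".
species_name <- function(x) {
  # ^([^_]+_[^_]+) captures the first two underscore-separated fields;
  # _.* then matches the 2nd underscore and everything after it.
  # Strings without a 2nd underscore do not match and come back unchanged.
  sub("^([^_]+_[^_]+)_.*", "\\1", x)
}

ids <- c("Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20",
         "Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M",
         "Genus_only")
species_name(ids)
# returns c("Pelagodinium_beii", "Vannella_sp.", "Genus_only")
```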
Data used above:
df <- read.table(
text =
'col1 Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20 4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP 5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M 3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134 5
',
header = TRUE
)
An option using `strsplit`:
df$col1 <- sapply(df$col1, function(i) paste0(strsplit(i, "_")[[1]][1:2], collapse = '_'))
# col1 Col2
# 1 Pelagodinium_beii 4
# 2 Acanthoeca_10tr 5
# 3 Rhodosorus_marinus 5
# 4 Vannella_sp. 3
# 5 Florenciella_parvula 5
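One caveat with indexing `[1:2]`: if a name contains no underscore at all, the second element is `NA`, and `paste0()` turns it into a literal `"_NA"` suffix. The sketch below avoids this by taking `head(p, n)` instead, and uses `vapply()` so the result is an unnamed character vector. `first_fields()` and its `n`/`sep` arguments are hypothetical names chosen for illustration.

```r
# Hypothetical helper: first `n` fields split on a literal `sep`, rejoined with `sep`.
first_fields <- function(x, n = 2, sep = "_") {
  # fixed = TRUE treats `sep` as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  # head() returns fewer than n fields when fewer exist, so no NA is pasted in
  vapply(parts, function(p) paste(head(p, n), collapse = sep), character(1))
}

first_fields(c("Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1",
               "Genus_only",
               "Genusonly"))
# returns c("Acanthoeca_10tr", "Genus_only", "Genusonly")
```

With the original `[1:2]` indexing, the last input would instead become `"Genusonly_NA"`.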
Another approach is to use `word()` from the `stringr` package:
library(stringr)
# Words 1 through 2, using "_" as the separator
df$col1 <- word(df$col1, 1, 2, sep = "_")