Remove part of the names in an R column

I want to clean up a column in R so that it contains only the species name, i.e. remove everything from the second "_" onward.

Here is my table:

col1	Col2
Pelagodinium_beii_RCC149_SRR1300503_MMETSP1338c20	4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1	5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP	5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M	3
Florenciella_parvula_CCMP247_SRR1294437_METSP134	5

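# Keep the first two underscore-separated fields; in an R string the back-reference must be written "\\1"
df$col1 <- sub("^([^_]+_[^_]+)_.*", "\\1", df$col1, perl = TRUE)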
df

                  col1 Col2
1    Pelagodinium_beii    4
2      Acanthoeca_10tr    5
3   Rhodosorus_marinus    5
4         Vannella_sp.    3
5 Florenciella_parvula    5
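A minimal, self-contained sketch of the sub() approach on a few of the labels above. One detail worth stressing: the replacement has to be written "\\1" in an R string literal, because a single backslash "\1" is parsed as an octal escape (the control character \001), not as a back-reference. The vector name species_raw and the short label "Acanthoeca" are made up for illustration.

```r
# A few raw labels; "Acanthoeca" is a hypothetical label with no second field
species_raw <- c(
  "Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20",
  "Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M",
  "Acanthoeca"
)

# ^([^_]+_[^_]+)  capture group: non-underscores, "_", non-underscores
# _.*             the second underscore and everything after it (dropped)
species <- sub("^([^_]+_[^_]+)_.*", "\\1", species_raw)

# species is c("Pelagodinium_beii", "Vannella_sp.", "Acanthoeca")
print(species)
```

Because the pattern only matches when a second underscore exists, labels with fewer than two underscores are returned unchanged rather than being truncated or turned into NA.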

The df used above:

df <- read.table(
text =
'col1   Col2
Pelagodinium_beii_RCC1491_SRR1300503_MMETSP1338c20  4
Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1 5
Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP   5
Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M    3
Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134  5
',
header = TRUE
)

An option with strsplit:

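# Split each label on "_" and paste the first two pieces back together;
# USE.NAMES = FALSE keeps the original labels from becoming names of the new column
df$col1 <- sapply(df$col1, function(i) paste0(strsplit(i, "_")[[1]][1:2], collapse = '_'),
                  USE.NAMES = FALSE)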

#                   col1 Col2
# 1    Pelagodinium_beii    4
# 2      Acanthoeca_10tr    5
# 3   Rhodosorus_marinus    5
# 4         Vannella_sp.    3
# 5 Florenciella_parvula    5
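Since strsplit() is already vectorised, the per-element sapply() can be replaced by one split of the whole column followed by vapply(). This sketch also uses head(p, 2) instead of [1:2]: with [1:2], a label that contains no underscore (the hypothetical "Acanthoeca" below) would come back as "Acanthoeca_NA", because paste() turns the missing second piece into the string "NA". The variable names are illustrative only.

```r
# Illustrative labels; "Acanthoeca" is a hypothetical label with only one field
species_raw <- c(
  "Acanthoeca_10tr_SRR1294413_MMETSP0105_2c10003_g1_i1",
  "Rhodosorus_marinus_UTEX-LB-2760_SRR1296985_MMETSP",
  "Acanthoeca"
)

# One call splits every label; fixed = TRUE treats "_" literally, not as a regex
parts <- strsplit(species_raw, "_", fixed = TRUE)

# head(p, 2) keeps at most two fields, so short labels never pick up NA
species <- vapply(parts, function(p) paste(head(p, 2), collapse = "_"),
                  character(1))

# species is c("Acanthoeca_10tr", "Rhodosorus_marinus", "Acanthoeca")
print(species)
```

vapply() with character(1) also guarantees a plain character vector, which is safer than sapply() when the result is assigned back into a data frame column.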

Another option is word() from the stringr package:

library(stringr)
# Keep words 1 to 2, using "_" as the word separator
df$col1 <- word(df$col1, 1, 2, sep = "_")
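If you would rather avoid the stringr dependency, the same "cut at the second underscore" rule can be written in base R by locating the separator positions with gregexpr() and taking substr() up to just before the n-th one. This is only a sketch: the helper name cut_before_nth() and its n and sep arguments are hypothetical and not part of any package.

```r
# Hypothetical helper: keep the text before the n-th occurrence of `sep`.
# Labels with fewer than n separators (and NA labels) are returned unchanged.
cut_before_nth <- function(x, n = 2, sep = "_") {
  # gregexpr() gives, per label, the start positions of every literal match
  # (or -1 when there is no match at all)
  pos <- gregexpr(sep, x, fixed = TRUE)
  vapply(seq_along(x), function(i) {
    hits <- pos[[i]]
    if (is.na(x[i]) || length(hits) < n || hits[1] == -1) return(x[i])
    substr(x[i], 1, hits[n] - 1)
  }, character(1))
}

labels <- c(
  "Florenciella_parvula_CCMP2471_SRR1294437_MMETSP134",
  "Vannella_sp._CB-2014_DIVA3-518-3-11-1-6_SRR1296762_M",
  "Acanthoeca_10tr",   # only one underscore: left as-is
  "Acanthoeca"         # hypothetical label with no underscore
)
cleaned <- cut_before_nth(labels)

# cleaned is c("Florenciella_parvula", "Vannella_sp.", "Acanthoeca_10tr", "Acanthoeca")
print(cleaned)
```

On the df above, where every label has at least two underscores, df$col1 <- cut_before_nth(df$col1) should produce the same five species names as the other answers.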
