R - Collapse rows, keeping all unique variable values in one column and all values in another column



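I need to group this data by all matching values in column ESVId, while keeping every unique value in column match, and all values in column Form that are associated with each value in column match (there may be duplicates!).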

structure(list(ESVId = c("ESV_000001", "ESV_000004", "ESV_000004", 
"ESV_000004", "ESV_000004", "ESV_000004", "ESV_000004", "ESV_000004", 
"ESV_000004", "ESV_000005", "ESV_000005", "ESV_000005", "ESV_000005", 
"ESV_000005", "ESV_000005", "ESV_000005", "ESV_000006", "ESV_000006", 
"ESV_000006", "ESV_000007"), MT_species = c(1, 1, 1, 1, 1, 1, 
1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1), match = c("Pseudotsuga menziesii", 
"Artemisia dracunculus", "Achillea millefolium", "Artemisia absinthium", 
"Artemisia ludoviciana", "Artemisia frigida", "Artemisia campestris", 
"Artemisia tridentata", "Artemisia tilesii", "Rubus arcticus", 
"Fragaria vesca", "Rosa acicularis", "Fragaria virginiana", "Rosa woodsii", 
"Rosa arkansana", "Rubus ursinus", "Poa pratensis", "Vahlodea atropurpurea", 
"Alopecurus magellanicus", "Prunus virginiana"), Form = c("Conifer", 
NA, "Forb", NA, "Forb", "Sub-Shrub", "Forb", "Shrub", NA, NA, 
"Forb", "Shrub", "Forb", "Shrub", NA, NA, "Graminoid", NA, NA, 
"Shrub")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

When I try

MTTaxa_funct <- funct_esvs %>%
  group_by(ESVId) %>%
  summarise_all(funs(paste(unique(match, Form), collapse = " OR "))) %>%
  dplyr::select(ESVId, match, Form) %>%
  ungroup()

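it fills column Form with the same content as column match, which is not what I want. I also need to keep any NA values in column Form. Ideally, the output would look like this: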

structure(list(ESVId = c("ESV_000001", "ESV_000004", "ESV_000005", 
"ESV_000006", "ESV_000007"), match = c("Pseudotsuga menziesii", 
"Artemisia dracunculus OR Achillea millefolium OR Artemisia absinthium OR Artemisia ludoviciana OR Artemisia frigida OR Artemisia campestris OR Artemisia tridentata OR Artemisia tilesii", 
"Rubus arcticus OR Fragaria vesca OR Rosa acicularis OR Fragaria virginiana OR Rosa woodsii OR Rosa arkansana OR Rubus ursinus", 
"Poa pratensis OR Vahlodea atropurpurea OR Alopecurus magellanicus", 
"Prunus virginiana"
), Form = c("Conifer", "NA OR Forb OR NA OR Forb OR Sub-Shrub OR Forb OR Shrub OR NA", 
"NA OR Forb OR Shrub OR Forb OR Shrub OR NA OR NA", 
"Graminoid OR NA OR NA", 
"Shrub"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

I'm not sure this is what you need, since your expected output contains elements that are not present in the source data, but perhaps this?

library(dplyr)

# quux is the data frame from the question (called funct_esvs there)
quux %>%
  group_by(ESVId) %>%
  summarize(
    match = paste(unique(match), collapse = " OR "),
    Form = paste(Form, collapse = " OR ")
  )
# # A tibble: 5 × 3
#   ESVId      match                                                         Form 
#   <chr>      <chr>                                                         <chr>
# 1 ESV_000001 Pseudotsuga menziesii                                         Coni…
# 2 ESV_000004 Artemisia dracunculus OR Achillea millefolium OR Artemisia a… NA O…
# 3 ESV_000005 Rubus arcticus OR Fragaria vesca OR Rosa acicularis OR Fraga… NA O…
# 4 ESV_000006 Poa pratensis OR Vahlodea atropurpurea OR Alopecurus magella… Gram…
# 5 ESV_000007 Prunus virginiana                                             Shrub
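
Because the tibble printout above truncates the Form column, here is a minimal, self-contained sketch of the same approach that prints Form in full. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and uses a small hypothetical subset of the question's data; the name `collapsed` is only for illustration. Note that `paste()` turns a missing value into the literal string "NA", which is what keeps the NA entries visible in the collapsed column.

```r
library(dplyr)

# Hypothetical subset of the question's data: three ESVs, NA values included
funct_esvs <- tibble(
  ESVId = c("ESV_000001", "ESV_000006", "ESV_000006", "ESV_000006", "ESV_000007"),
  match = c("Pseudotsuga menziesii", "Poa pratensis", "Vahlodea atropurpurea",
            "Alopecurus magellanicus", "Prunus virginiana"),
  Form  = c("Conifer", "Graminoid", NA, NA, "Shrub")
)

collapsed <- funct_esvs %>%
  group_by(ESVId) %>%
  summarize(
    match = paste(unique(match), collapse = " OR "),  # unique taxa only
    Form  = paste(Form, collapse = " OR "),           # every value; NA becomes "NA"
    .groups = "drop"                                  # return an ungrouped tibble
  )

# Show the Form column in full instead of the truncated tibble view
collapsed$Form
```

In this sketch, the row for ESV_000006 should have Form equal to "Graminoid OR NA OR NA", matching the layout requested in the question.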
