I have a sample dataset:

Target Start sequence
A      y1    ccc
A      y2    cct
A      y3    aag
A      y3    act
B      y1    aaa
B      y4    aat

(Data updated.)
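For reproducibility, here is the sample data as an R data frame. This is a sketch: the name df and the column names Target, Start and sequence come from the answers' code. The two A/y3 rows are inferred from the answers' output, so treat them as an assumption. The first answer's output appears to match the data before the update, i.e. without the A/y3/act row.

```r
# Sample data as a data frame. Assumption: the A/y3 rows ("aag", "act") are
# inferred from the outputs shown in the answers, not copied from the question.
df <- data.frame(
  Target   = c("A", "A", "A", "A", "B", "B"),
  Start    = c("y1", "y2", "y3", "y3", "y1", "y4"),
  sequence = c("ccc", "cct", "aag", "act", "aaa", "aat"),
  stringsAsFactors = FALSE
)

# The pre-update version (single A/y3 row) would be df[df$sequence != "act", ]
str(df)
```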
You can do this by applying combn() within each Target group.
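It may help to see what combn() returns on its own. The vectors below are illustrative values for a single Target group. With FUN = toString each pair is collapsed into one string. When FUN returns a one-element list, combn() returns a list, and that is what lets the answer keep a named start/end pair per row.

```r
# Illustrative values for one Target group.
starts <- c("y1", "y2", "y3")
seqs   <- c("ccc", "cct", "aag")

combn(starts, 2)            # 2 x 3 character matrix, one pair per column
combn(seqs, 2, toString)    # one "a, b" string per pair

# FUN returns a one-element list, so combn() returns a list whose elements
# are named c(start = ..., end = ...) vectors.
pairs <- combn(starts, 2, function(x) list(setNames(x, c("start", "end"))))
str(pairs)
```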
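library(dplyr)
library(tidyr)

df %>%
  group_by(Target) %>%
  # every pair of Start values within a Target; FUN returns a one-element list
  # so each pair is stored as a named c(start, end) vector in a list column.
  # Assumes each Start occurs once per Target (repeated Starts get paired too).
  # dplyr >= 1.1.0 warns that multi-row summarise() is deprecated in favour of reframe().
  summarise(Start = combn(Start, 2, function(x)
              list(setNames(x, c('start', 'end')))),
            Sequence = combn(sequence, 2, toString), .groups = 'drop') %>%
  # spread the named start/end vectors into two columns
  unnest_wider(Start)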
# Target start end Sequence
# <chr> <chr> <chr> <chr>
#1 A y1 y2 ccc, cct
#2 A y1 y3 ccc, aag
#3 A y2 y3 cct, aag
#4 B y1 y4 aaa, aat
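For comparison, the same pairing can be sketched in base R without dplyr or tidyr, by splitting on Target and indexing rows with combn(). This is a hypothetical variant, not the answer's code. pairs_for_group is an illustrative helper name, and the data is assumed to be the pre-update version (one A/y3 row), which is what the output above corresponds to. Like the combn() answer, it needs at least two rows per Target.

```r
# Pre-update sample data (assumption: single A/y3 row).
df <- data.frame(
  Target   = c("A", "A", "A", "B", "B"),
  Start    = c("y1", "y2", "y3", "y1", "y4"),
  sequence = c("ccc", "cct", "aag", "aaa", "aat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all row pairs of one Target group.
pairs_for_group <- function(g) {
  idx <- combn(nrow(g), 2)   # column i holds the two row indices of pair i
  data.frame(
    Target   = g$Target[1],
    start    = g$Start[idx[1, ]],
    end      = g$Start[idx[2, ]],
    Sequence = paste(g$sequence[idx[1, ]], g$sequence[idx[2, ]], sep = ", "),
    stringsAsFactors = FALSE
  )
}

# Apply per Target and stack the results.
res <- do.call(rbind, lapply(split(df, df$Target), pairs_for_group))
rownames(res) <- NULL
res
```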
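Here is another tidyverse approach that does not use combn():
- group_by(Target, Start), so that all sequences sharing the same Target and Start can be collapsed into a single row
- drop Start from the grouping (via .groups = "drop_last"), leaving the data grouped by Target only
- convert Start to a number so that Start values can be compared directly
- create a Start2 column holding the Start values greater than the row's own, and store the matching sequence strings in a sequence2 column
- unnest the data frame on Start2 and sequence2 (each row can have several matches)
- group_by(Target, Start, Start2), so that sequence can be pasted together with sequence2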
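The numeric comparison in the third and fourth steps can be tried on a single Target group first. A minimal base-R sketch with illustrative values: the regex needs a doubled backslash ("\\d+") inside an R string, each value is compared directly (Start_num > x) rather than being used as an index, and lapply() is used because sapply() may simplify to a matrix when every element has the same length.

```r
# Illustrative values for one Target group after duplicate Starts are collapsed.
Start    <- c("y1", "y2", "y3")
sequence <- c("ccc", "cct", "aag,act")

# Numeric part of Start; "\\d+" needs the doubled backslash in an R string.
Start_num <- as.numeric(regmatches(Start, regexpr("\\d+", Start)))

# For each row, the Start values (and their sequences) larger than its own.
# lapply() always returns a list; the last element here is character(0).
Start2    <- lapply(Start_num, function(x) Start[Start_num > x])
sequence2 <- lapply(Start_num, function(x) sequence[Start_num > x])

str(Start2)
str(sequence2)
```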
library(tidyverse)

df %>%
  # collapse sequences that share the same Target and Start into one row
  group_by(Target, Start) %>%
  summarize(sequence = paste0(sequence, collapse = ","), .groups = "drop_last") %>%
  # still grouped by Target: number each Start and look up the larger ones
  mutate(Start_num = as.numeric(str_extract(Start, "\\d+")),
         Start2 = lapply(Start_num, function(x) Start[Start_num > x]),
         sequence2 = lapply(Start_num, function(x) sequence[Start_num > x])) %>%
  # one row per (Start, Start2) pair; rows with no larger Start are dropped
  unnest(cols = c(Start2, sequence2)) %>%
  group_by(Target, Start, Start2) %>%
  summarize(sequence = paste0(c(sequence, sequence2), collapse = ","), .groups = "drop")
# A tibble: 4 × 4
Target Start Start2 sequence
<chr> <chr> <chr> <chr>
1 A y1 y2 ccc,cct
2 A y1 y3 ccc,aag,act
3 A y2 y3 cct,aag,act
4 B y1 y4 aaa,aat
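Another way to get the same pairs, sketched here as an alternative rather than either answer's method, is a self-join: collapse duplicate Start values, join the table to itself on Target, and keep the rows where the second Start number is larger. This assumes dplyr >= 1.1.0 (for the relationship argument of inner_join()) and the updated data with two A/y3 rows, which are inferred from the output above.

```r
library(dplyr)

# Updated sample data (assumption: A/y3 rows inferred from the answers' output).
df <- data.frame(
  Target   = c("A", "A", "A", "A", "B", "B"),
  Start    = c("y1", "y2", "y3", "y3", "y1", "y4"),
  sequence = c("ccc", "cct", "aag", "act", "aaa", "aat"),
  stringsAsFactors = FALSE
)

# One row per Target/Start with duplicate sequences pasted together,
# plus the numeric part of Start for comparisons.
collapsed <- df %>%
  group_by(Target, Start) %>%
  summarise(sequence = paste(sequence, collapse = ","), .groups = "drop") %>%
  mutate(Start_num = as.numeric(sub("^[^0-9]*", "", Start)))

# Pair every Start with every Start of the same Target (shared columns get
# .x/.y suffixes), then keep only pairs where the second Start is larger.
res <- inner_join(collapsed, collapsed, by = "Target",
                  relationship = "many-to-many") %>%
  filter(Start_num.x < Start_num.y) %>%
  transmute(Target,
            Start    = Start.x,
            Start2   = Start.y,
            sequence = paste(sequence.x, sequence.y, sep = ","))

res
```

The join builds every Start combination within a Target at once, so it trades the per-row lookup for a larger intermediate table; for small groups like these the difference is negligible.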