I have a data frame as follows:
+------+-----+----------+
| from | to | priority |
+------+-----+----------+
| 1 | 8 | 1 |
| 2 | 6 | 1 |
| 3 | 4 | 1 |
| 4 | 5 | 3 |
| 5 | 6 | 4 |
| 6 | 2 | 5 |
| 7 | 8 | 2 |
| 4 | 3 | 5 |
| 2 | 1 | 1 |
| 6 | 6 | 4 |
| 1 | 7 | 5 |
| 8 | 4 | 6 |
| 9 | 5 | 3 |
+------+-----+----------+
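My goal is to group the "to" column by the "from" column, but if a value has already appeared in either column, I don't want to consider it any further. In addition, the total priority should be the sum of the priorities within each group.
So the resulting data frame would look like this: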
+------+------+----------------+
| from | to | Total Priority |
+------+------+----------------+
| 1 | 8, 7 | 6 |
| 2 | 6 | 1 |
| 3 | 4 | 1 |
| 9 | 5 | 3 |
+------+------+----------------+
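Also, I want to keep the same order as the original table when grouping.
I was able to collapse the from column using the splitstackshape package, as shown below: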
library(splitstackshape)
cSplit(df, 'to', sep = ',', direction = 'long')[
  , .(to = toString(unique(to))), by = from]
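However, this introduces duplicate values. I am wondering whether there is a way to get the desired result using any other package.
Using the DF shown reproducibly in the Note at the end, sort it by from to give DF2, then iterate over its rows, removing any row that is a duplicate, that is, a row whose to value already appears in the from or to of an earlier kept row, or whose from value already appears in an earlier to. We need a loop here because each removal depends on the previous removals. Finally, summarize the result.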
library(dplyr)
DF2 <- arrange(DF, from)
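i <- 1
while (i <= nrow(DF2)) {
  ix <- seq_len(i - 1)  # indices of the rows kept so far
  # row i is a duplicate if its `to` already appears in an earlier from/to,
  # or if its `from` already appears in an earlier `to`
  dup <- with(DF2, (to[i] %in% c(to[ix], from[ix])) | (from[i] %in% to[ix]))
  if (dup) DF2 <- DF2[-i, ] else i <- i + 1  # drop row i, or move to the next row
}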
DF2 %>%
  group_by(from) %>%
  summarize(to = toString(to), priority = sum(priority)) %>%
  ungroup()
giving:
# A tibble: 4 x 3
from to priority
<int> <chr> <int>
1 1 8, 7 6
2 2 6 1
3 3 4 1
4 9 5 3
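For comparison, here is a minimal base R sketch of the same rule written as a single pass with explicit "seen" vectors instead of deleting rows inside a while loop. The names seen_from, seen_to, keep and result are made up for this illustration, and the data is typed in directly so the block runs on its own. It relies on order() being stable, so rows with the same from keep their original relative order.

```r
# Same data as in the Note, typed in directly so the block is self-contained.
DF <- data.frame(
  from     = c(1, 2, 3, 4, 5, 6, 7, 4, 2, 6, 1, 8, 9),
  to       = c(8, 6, 4, 5, 6, 2, 8, 3, 1, 6, 7, 4, 5),
  priority = c(1, 1, 1, 3, 4, 5, 2, 5, 1, 4, 5, 6, 3)
)

# Stable sort by `from` (ties stay in their original order).
DF2 <- DF[order(DF$from), ]

keep      <- logical(nrow(DF2))  # TRUE for rows that survive
seen_from <- c()                 # `from` values of the rows kept so far
seen_to   <- c()                 # `to` values of the rows kept so far

for (i in seq_len(nrow(DF2))) {
  from_i <- DF2$from[i]
  to_i   <- DF2$to[i]
  # Same test as in the while loop above, against kept rows only.
  dup <- (to_i %in% c(seen_to, seen_from)) || (from_i %in% seen_to)
  if (!dup) {
    keep[i]   <- TRUE
    seen_from <- c(seen_from, from_i)
    seen_to   <- c(seen_to, to_i)
  }
}

kept <- DF2[keep, ]

# Collapse each surviving group: join `to` values, sum the priorities.
result <- do.call(rbind, lapply(split(kept, kept$from), function(g) {
  data.frame(from = g$from[1], to = toString(g$to),
             priority = sum(g$priority), stringsAsFactors = FALSE)
}))

print(result)
```

On the sample data this keeps the groups from = 1, 2, 3, 9 with to = "8, 7", "6", "4", "5" and priorities 6, 1, 1, 3, the same as the dplyr version. Growing seen_from and seen_to with c() is fine for a small table; for a large one, an environment or a preallocated vector would likely be a better choice.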
Note
Lines <- "from | to | priority
1 | 8 | 1
2 | 6 | 1
3 | 4 | 1
4 | 5 | 3
5 | 6 | 4
6 | 2 | 5
7 | 8 | 2
4 | 3 | 5
2 | 1 | 1
6 | 6 | 4
1 | 7 | 5
8 | 4 | 6
9 | 5 | 3"
DF <- read.table(text = Lines, header = TRUE, sep = "|", strip.white = TRUE)
It is not entirely clear how you are trying to create the groups, but this should at least get you into the right ballpark:
library(tidyverse)
df <- tribble(~from, ~to, ~priority,
1,8,1,
2,6,1,
3,4,1,
4,5,3,
5,6,4,
6,2,5,
7,8,2,
4,3,5,
2,1,1,
6,6,4,
1,7,5,
8,4,6,
9,5,3)
df %>%
  group_by(from) %>%
  summarise(to = toString(to),
            `Total Priority` = sum(priority, na.rm = TRUE))
Your result will be:
# A tibble: 9 x 3
from to `Total Priority`
<dbl> <chr> <dbl>
1 1 8, 7 6
2 2 6, 1 2
3 3 4 1
4 4 5, 3 8
5 5 6 4
6 6 2, 6 9
7 7 8 2
8 8 4 6
9 9 5 3
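The question also asks for the groups to stay in the order of the original table. group_by() followed by summarise() returns the groups sorted by from, which happens to match the first-appearance order in the sample data. Where the two orders differ, one possible approach is to group on a factor whose levels follow first appearance. The sketch below uses base R and a small made-up table (toy, not the data from the question) purely to illustrate the idea.

```r
# Hypothetical toy data, chosen so that first-appearance order (3, 1, 2)
# differs from sorted order (1, 2, 3).
toy <- data.frame(
  from     = c(3, 1, 3, 2),
  to       = c(4, 8, 5, 6),
  priority = c(1, 1, 3, 1)
)

# Factor levels in order of first appearance; split() follows level order.
grp <- factor(toy$from, levels = unique(toy$from))

out <- data.frame(
  from           = unique(toy$from),
  to             = unname(vapply(split(toy$to, grp), toString, character(1))),
  total_priority = unname(vapply(split(toy$priority, grp), sum, numeric(1))),
  stringsAsFactors = FALSE
)

print(out)
```

Here out lists from as 3, 1, 2 with to = "4, 5", "8", "6" and total_priority 4, 1, 1, so the groups come out in the order they first appear rather than sorted.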