I have a data frame consisting of 12 columns, each holding the participants who finished in the top 5 for that column. It looks like this:
> top_5
4 5 8 9 11 12 15 16 19 20 22 23
[1,] "Nia" "Hung" "Hanaaa" "Ramziyya" "Marissa" "Jaelyn" "Shyanne" "Jaabir" "Dionicio" "Nia" "Shyanne" "Roger"
[2,] "Razeena" "Husni" "Bradly" "Marissa" "Bradly" "Muhsin" "Razeena" "Dionicio" "Magnus" "Kelsey" "Nia" "Schyler"
[3,] "Shyanne" "Schyler" "Necko" "Johannah" "Tatiana" "Glenn" "Nia" "Jaelyn" "Shyanne" "Hanaaa" "Mildred" "German"
[4,] "Schyler" "German" "Hung" "Lubaaba" "Johannah" "Magnus" "Dionicio" "German" "German" "Razeena" "Dionicio" "Jaabir"
[5,] "Husni" "Necko" "Razeena" "Afeefa" "Schyler" "Dionicio" "Jaabir" "Roger" "Johannah" "Remy" "Jaabir" "Jaelyn"
(It can be recreated with this:)
structure(c("Nia", "Razeena", "Shyanne", "Schyler", "Husni",
"Hung", "Husni", "Schyler", "German", "Necko", "Hanaaa", "Bradly",
"Necko", "Hung", "Razeena", "Ramziyya", "Marissa", "Johannah",
"Lubaaba", "Afeefa", "Marissa", "Bradly", "Tatiana", "Johannah",
"Schyler", "Jaelyn", "Muhsin", "Glenn", "Magnus", "Dionicio",
"Shyanne", "Razeena", "Nia", "Dionicio", "Jaabir", "Jaabir",
"Dionicio", "Jaelyn", "German", "Roger", "Dionicio", "Magnus",
"Shyanne", "German", "Johannah", "Nia", "Kelsey", "Hanaaa", "Razeena",
"Remy", "Shyanne", "Nia", "Mildred", "Dionicio", "Jaabir", "Roger",
"Schyler", "German", "Jaabir", "Jaelyn"), .Dim = c(5L, 12L), .Dimnames = list(
NULL, c("4", "5", "8", "9", "11", "12", "15", "16", "19",
"20", "22", "23")))
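Now, if a participant is in the first row, it means they ranked first in that column (so for the first column, "Nia" is first, "Razeena" is second, and so on). First place is worth 5 points, second place 4 points, and so on down to 1 point for fifth place. I want to compute the total score of every participant in the matrix.
My goal is to get the overall top five. How can I do that?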
Here is a "reshape to long, then summarise by group" approach, similar to M--'s answer, but using data.table:
library(data.table)
df <- as.data.table(top_5)[, points := .N:1]
total_points <- melt(df, 'points')[, .(points = sum(points)), value]
setorder(total_points, -points)
head(total_points, 5)
# value points
# 1: Nia 17
# 2: Shyanne 16
# 3: Dionicio 14
# 4: Razeena 11
# 5: Schyler 10
Or an idea very similar to akrun's, just with tapply instead of sapply + split:
out <- sort(tapply(c(6 - row(top_5)), c(top_5), sum), decreasing = TRUE)
head(out, 5)
# Nia Shyanne Dionicio Razeena Schyler
# 17 16 14 11 10
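The constant 6 in the tapply call is the number of rows plus one, so it only works for a top 5. A minimal sketch of the same idea for any number of rows, using a small made-up matrix (`m`, `pts` and `totals` are illustrative names, not part of the question's data):

```r
# Hypothetical 3 x 2 ranking matrix: each column is one round,
# row 1 is first place. Filled column by column.
m <- matrix(c("Ann", "Bob", "Cy",
              "Bob", "Dee", "Ann"),
            nrow = 3,
            dimnames = list(NULL, c("r1", "r2")))

# row(m) holds the row index of every cell; subtracting it from
# nrow(m) + 1 gives first place the most points (here 3, 2, 1).
pts <- nrow(m) + 1 - row(m)

# Sum the points per name, then sort from highest to lowest.
totals <- sort(tapply(c(pts), c(m), sum), decreasing = TRUE)
totals
```

With a 5-row matrix such as top_5, `nrow(m) + 1 - row(m)` reduces to the `6 - row(top_5)` used above.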
One option is to split the reversed row index by the matrix values into a list, and then get the sum of each list element by looping over the list (sapply):
out <- sapply(split(row(top_5)[nrow(top_5):1, ], top_5), sum)
out
#Afeefa Bradly Dionicio German Glenn Hanaaa Hung Husni Jaabir Jaelyn Johannah Kelsey Lubaaba Magnus Marissa Mildred Muhsin
# 1 8 14 9 3 8 7 5 9 9 6 4 2 6 9 3 4
# Necko Nia Ramziyya Razeena Remy Roger Schyler Shyanne Tatiana
# 4 17 5 11 1 6 10 16 3
head(out[order(-out)], 5)
# Nia Shyanne Dionicio Razeena Schyler
# 17 16 14 11 10
Or another option is rowsum:
rowsum(c(row(top_5)[nrow(top_5):1, ]), group = c(top_5))
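rowsum returns a one-column matrix whose rows are ordered by group name, not by score, so one more step is needed to get a top five. A rough sketch on a small made-up matrix (the names `m`, `pts`, `rs` and `top` are assumptions for illustration):

```r
# Hypothetical 3 x 2 ranking matrix, row 1 is first place.
m <- matrix(c("Ann", "Bob", "Cy",
              "Bob", "Dee", "Ann"),
            nrow = 3)

# Reversed row index: first place gets nrow(m) points, last place 1.
pts <- row(m)[nrow(m):1, ]

# One-column matrix of summed points, with row names sorted alphabetically.
rs <- rowsum(c(pts), group = c(m))

# Reorder by descending score; drop = FALSE keeps the row names.
top <- head(rs[order(-rs[, 1]), , drop = FALSE], 5)
top
```

The same `order()` and `head()` step should apply unchanged to the rowsum result for top_5.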
Using tidyverse functions:
library(tidyr)
library(dplyr)
top_5 %>%
as.data.frame %>%
head(.,5) %>%
mutate(rank = nrow(.):1) %>%
pivot_longer(., -c(rank), values_to = "name", names_to = "col") %>%
group_by(name) %>%
summarise_at(vars(rank), list(points = sum))
#> # A tibble: 26 x 2
#> name points
#> <fct> <int>
#> 1 Husni 5
#> 2 Nia 17
#> 3 Razeena 11
#> 4 Schyler 10
#> 5 Shyanne 16
#> 6 German 9
#> 7 Hung 7
#> 8 Necko 4
#> 9 Bradly 8
#> 10 Hanaaa 8
#> # ... with 16 more rows
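The tibble above is not sorted, so it does not yet answer the "overall top five" part of the question. One possible way to finish the pipeline, sketched on a small made-up matrix and assuming dplyr 1.0.0 or later for slice_head (`m` and `top` are illustrative names):

```r
library(dplyr)
library(tidyr)

# Hypothetical 3 x 2 ranking matrix, row 1 is first place.
m <- matrix(c("Ann", "Bob", "Cy",
              "Bob", "Dee", "Ann"),
            nrow = 3,
            dimnames = list(NULL, c("r1", "r2")))

top <- m %>%
  as.data.frame() %>%
  mutate(rank = n():1) %>%                      # first row gets the most points
  pivot_longer(-rank, names_to = "col", values_to = "name") %>%
  group_by(name) %>%
  summarise(points = sum(rank)) %>%             # total points per participant
  arrange(desc(points)) %>%                     # highest score first
  slice_head(n = 5)                             # keep at most five rows
top
```

Ties are kept in whatever order arrange leaves them; if ties at the fifth place should all be retained, slice_max(points, n = 5) may be a better fit.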