R - How do I compute total scores based on position in a matrix?



I have a data frame consisting of 12 columns, each holding the participants who ranked in the top 5. It looks like this:

> top_5
     4         5         8         9          11         12         15         16         19         20        22         23       
[1,] "Nia"     "Hung"    "Hanaaa"  "Ramziyya" "Marissa"  "Jaelyn"   "Shyanne"  "Jaabir"   "Dionicio" "Nia"     "Shyanne"  "Roger"  
[2,] "Razeena" "Husni"   "Bradly"  "Marissa"  "Bradly"   "Muhsin"   "Razeena"  "Dionicio" "Magnus"   "Kelsey"  "Nia"      "Schyler"
[3,] "Shyanne" "Schyler" "Necko"   "Johannah" "Tatiana"  "Glenn"    "Nia"      "Jaelyn"   "Shyanne"  "Hanaaa"  "Mildred"  "German" 
[4,] "Schyler" "German"  "Hung"    "Lubaaba"  "Johannah" "Magnus"   "Dionicio" "German"   "German"   "Razeena" "Dionicio" "Jaabir" 
[5,] "Husni"   "Necko"   "Razeena" "Afeefa"   "Schyler"  "Dionicio" "Jaabir"   "Roger"    "Johannah" "Remy"    "Jaabir"   "Jaelyn" 

(and it can be recreated with this):

structure(c("Nia", "Razeena", "Shyanne", "Schyler", "Husni", 
"Hung", "Husni", "Schyler", "German", "Necko", "Hanaaa", "Bradly", 
"Necko", "Hung", "Razeena", "Ramziyya", "Marissa", "Johannah", 
"Lubaaba", "Afeefa", "Marissa", "Bradly", "Tatiana", "Johannah", 
"Schyler", "Jaelyn", "Muhsin", "Glenn", "Magnus", "Dionicio", 
"Shyanne", "Razeena", "Nia", "Dionicio", "Jaabir", "Jaabir", 
"Dionicio", "Jaelyn", "German", "Roger", "Dionicio", "Magnus", 
"Shyanne", "German", "Johannah", "Nia", "Kelsey", "Hanaaa", "Razeena", 
"Remy", "Shyanne", "Nia", "Mildred", "Dionicio", "Jaabir", "Roger", 
"Schyler", "German", "Jaabir", "Jaelyn"), .Dim = c(5L, 12L), .Dimnames = list(
    NULL, c("4", "5", "8", "9", "11", "12", "15", "16", "19", 
    "20", "22", "23")))

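Now, if a participant is in the first row, it means they ranked first in that column (so for the first column, "Nia" is first, "Razeena" is second, and so on). First place is worth 5 points, second place is worth 4 points, and so on. I would like to calculate the score of each participant in the matrix.
My goal is to get an overall top five. How can I do that?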

Here is a "reshape to long, then summarise by group" approach, similar to M--'s answer, but using data.table:

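library(data.table)
# Add a points column: row 1 gets .N (here 5) points, the last row gets 1
df <- as.data.table(top_5)[, points := .N:1]
# Reshape to long format, then sum the points per participant name
total_points <- melt(df, 'points')[, .(points = sum(points)), value]
# Sort by descending total and show the top five
setorder(total_points, -points)
head(total_points, 5)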
#       value points
# 1:      Nia     17
# 2:  Shyanne     16
# 3: Dionicio     14
# 4:  Razeena     11
# 5:  Schyler     10

Or an idea very similar to akrun's, just using tapply instead of sapply + split:

out <- sort(tapply(c(6 - row(top_5)), c(top_5), sum), decreasing = TRUE)
head(out, 5)
# Nia  Shyanne Dionicio  Razeena  Schyler 
#  17       16       14       11       10 
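
The hard-coded 6 in the line above relies on the matrix having exactly five rows. As a minimal sketch of how the points are derived, here is the same idea on a small toy matrix, with nrow() + 1 in place of the constant. The matrix `toy` and its names are made up for illustration and are not part of the question's data.

```r
# Toy 3 x 2 ranking matrix (hypothetical names, not the question's data)
toy <- matrix(c("Ann", "Bob", "Cy",
                "Ann", "Dee", "Bob"),
              nrow = 3,
              dimnames = list(NULL, c("a", "b")))

# row() returns a matrix of the same shape holding each cell's row index,
# so (nrow + 1) - row() gives rank 1 the most points and the last rank 1 point
pts <- (nrow(toy) + 1L) - row(toy)
pts

# c() flattens both matrices column by column, so points and names stay aligned;
# tapply() then sums the points per name
totals <- sort(tapply(c(pts), c(toy), sum), decreasing = TRUE)
totals
```

Here "Ann" is first in both columns and so collects 3 + 3 = 6 points, ahead of "Bob" with 2 + 1 = 3.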

One option is to split the reversed row indices by the matrix values into a list, then loop over the list with sapply to get the sum of each list element:

out <- sapply(split(row(top_5)[nrow(top_5):1, ], top_5), sum)
out
#Afeefa   Bradly Dionicio   German    Glenn   Hanaaa     Hung    Husni   Jaabir   Jaelyn Johannah   Kelsey  Lubaaba   Magnus  Marissa  Mildred   Muhsin 
#       1        8       14        9        3        8        7        5        9        9        6        4        2        6        9        3        4 
#   Necko      Nia Ramziyya  Razeena     Remy    Roger  Schyler  Shyanne  Tatiana 
#       4       17        5       11        1        6       10       16        3 

head(out[order(-out)], 5)
# Nia  Shyanne Dionicio  Razeena  Schyler 
#  17       16       14       11       10 

Or another option is rowsum:

rowsum(c(row(top_5)[nrow(top_5):1, ]), group = c(top_5))
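
rowsum returns a one-column matrix with one row per name, sorted by name rather than by score, so one more step is needed to reach the top five. A possible sketch, reusing the question's data; the variable names `res` and `top` are only illustrative.

```r
# Recreate the matrix from the question
top_5 <- structure(c("Nia", "Razeena", "Shyanne", "Schyler", "Husni", 
"Hung", "Husni", "Schyler", "German", "Necko", "Hanaaa", "Bradly", 
"Necko", "Hung", "Razeena", "Ramziyya", "Marissa", "Johannah", 
"Lubaaba", "Afeefa", "Marissa", "Bradly", "Tatiana", "Johannah", 
"Schyler", "Jaelyn", "Muhsin", "Glenn", "Magnus", "Dionicio", 
"Shyanne", "Razeena", "Nia", "Dionicio", "Jaabir", "Jaabir", 
"Dionicio", "Jaelyn", "German", "Roger", "Dionicio", "Magnus", 
"Shyanne", "German", "Johannah", "Nia", "Kelsey", "Hanaaa", "Razeena", 
"Remy", "Shyanne", "Nia", "Mildred", "Dionicio", "Jaabir", "Roger", 
"Schyler", "German", "Jaabir", "Jaelyn"), .Dim = c(5L, 12L), .Dimnames = list(
    NULL, c("4", "5", "8", "9", "11", "12", "15", "16", "19", 
    "20", "22", "23")))

# Sum the reversed row indices (5 points for row 1, 1 point for row 5) per name
res <- rowsum(c(row(top_5)[nrow(top_5):1, ]), group = c(top_5))

# Order the rows by descending total; drop = FALSE keeps the
# one-column matrix shape and therefore the row names
top <- res[order(-res[, 1]), , drop = FALSE]
head(top, 5)
```

On the question's data the first five rows are Nia, Shyanne, Dionicio, Razeena and Schyler, in agreement with the other answers.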

Using tidyverse functions:

library(tidyr)
library(dplyr)
top_5 %>% 
  as.data.frame %>% 
  head(.,5) %>%
  mutate(rank = nrow(.):1) %>% 
  pivot_longer(., -c(rank), values_to = "name", names_to = "col") %>% 
  group_by(name) %>% 
  summarise_at(vars(rank), list(points = sum))
#> # A tibble: 26 x 2
#>    name   points
#>    <fct>   <int>
#>  1 Husni       5
#>  2 Nia        17
#>  3 Razeena    11
#>  4 Schyler    10
#>  5 Shyanne    16
#>  6 German      9
#>  7 Hung        7
#>  8 Necko       4
#>  9 Bradly      8
#> 10 Hanaaa      8
#> # ... with 16 more rows
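
The tibble above lists every participant in no particular score order, so it does not yet give the overall top five. A hedged sketch of one way to finish the job is to sort with arrange(desc()) and keep the first rows with slice_head(), which assumes dplyr 1.0.0 or later. It is shown on a small toy matrix with made-up names, and uses summarise() where the answer uses summarise_at().

```r
library(dplyr)
library(tidyr)

# Toy 3 x 2 ranking matrix (hypothetical names, not the question's data)
toy <- matrix(c("Ann", "Bob", "Cy",
                "Ann", "Dee", "Bob"),
              nrow = 3,
              dimnames = list(NULL, c("a", "b")))

top_overall <- toy %>%
  as.data.frame() %>%
  mutate(rank = n():1) %>%                  # row 1 gets the most points
  pivot_longer(-rank, names_to = "col", values_to = "name") %>%
  group_by(name) %>%
  summarise(points = sum(rank)) %>%         # total points per participant
  arrange(desc(points)) %>%                 # highest total first
  slice_head(n = 2)                         # use n = 5 for a top five

top_overall
```

With the toy data this keeps "Ann" (6 points) and "Bob" (3 points).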
