I have two data frames. The first lists each school/team only once, like this:
classA <- data.frame(School=c("Omaha South", "Millard North", "Elkhorn"))
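The other data frame is a table of basketball scores for the whole season, in which a school/team can appear many times in the same column: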
scores <- data.frame('Away Score'=c(60,84,48,72),
'Away Team'=c("Omaha South", "Millard North", "Elkhorn","Elkhorn"),
'Home Score'=c(88,40,38,62),
'Home Team'=c("Elkhorn", "Omaha South", "Millard North","Omaha South"))
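My goal is to create a new column, classA$"Away PPG", that holds the average of all "Away Score" values for each school in the first data frame. So for Elkhorn, the new classA column would be 60, i.e. (48 + 72) / 2.
One place I'm stuck is that the columns to match on have different names in the two data frames, and I haven't worked out how to handle that.
I previously got help with a somewhat related question where I wanted a count rather than a mean, but I can't figure out how to adapt it to this problem. The solution to the counting question looked like this: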
df2 %>%
right_join(df1, by = c('Winner' = 'School')) %>%
na.omit() %>%
count(Winner, name = "wins") %>%
right_join(df1, c('Winner' = 'School')) %>%
mutate(wins = replace(wins, is.na(wins), 0))
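We can join classA with scores, then take the mean of Away.Score for each School. Note that data.frame() converts the name 'Away Score' to Away.Score, and the by argument maps the differently named key columns onto each other.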
library(dplyr)
classA %>%
left_join(scores, by = c('School' = 'Away.Team')) %>%
group_by(School) %>%
summarise(AwayScore = mean(Away.Score, na.rm = TRUE))
# A tibble: 3 x 2
# School AwayScore
# <fct> <dbl>
#1 Elkhorn 60
#2 Millard North 84
#3 Omaha South 60
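If the aim is to store the result as a column on classA itself, as the question asks, one possible sketch is to summarise scores first and then join the averages onto classA. The intermediate name away_ppg is made up for illustration, and the data are re-created so the snippet runs on its own:

```r
library(dplyr)

# Re-create the example data from the question
classA <- data.frame(School = c("Omaha South", "Millard North", "Elkhorn"))
scores <- data.frame(
  'Away Score' = c(60, 84, 48, 72),
  'Away Team'  = c("Omaha South", "Millard North", "Elkhorn", "Elkhorn"),
  'Home Score' = c(88, 40, 38, 62),
  'Home Team'  = c("Elkhorn", "Omaha South", "Millard North", "Omaha South")
)

# Average away score per team (away_ppg is a hypothetical helper name)
away_ppg <- scores %>%
  group_by(Away.Team) %>%
  summarise(`Away PPG` = mean(Away.Score, na.rm = TRUE))

# Attach the averages to classA; left_join keeps classA's rows and order,
# and a school with no away games would get NA
classA <- classA %>%
  left_join(away_ppg, by = c('School' = 'Away.Team'))

classA
```

Summarising before joining keeps classA at one row per school, so the new column lines up with the original row order (Omaha South, Millard North, Elkhorn).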
Similarly, in base R:
aggregate(Away.Score~School,
merge(classA, scores, by.x = 'School', by.y = 'Away.Team'),
mean, na.rm = TRUE)
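One thing to be aware of: merge() performs an inner join by default (all = FALSE), so a school with no away games would be dropped from the aggregate() result. A possible base R alternative, sketched here under the assumption that every school should keep its row, computes the per-team means with tapply() and looks them up with match(). The name ppg is hypothetical:

```r
# Re-create the example data from the question
classA <- data.frame(School = c("Omaha South", "Millard North", "Elkhorn"))
scores <- data.frame(
  'Away Score' = c(60, 84, 48, 72),
  'Away Team'  = c("Omaha South", "Millard North", "Elkhorn", "Elkhorn"),
  'Home Score' = c(88, 40, 38, 62),
  'Home Team'  = c("Elkhorn", "Omaha South", "Millard North", "Omaha South")
)

# Mean away score per team, as a named array (ppg is a hypothetical name)
ppg <- tapply(scores$Away.Score, scores$Away.Team, mean, na.rm = TRUE)

# Look up each school's mean by name; schools absent from scores get NA
idx <- match(as.character(classA$School), names(ppg))
classA$`Away PPG` <- as.numeric(ppg)[idx]

classA
```

Because the lookup is by name, the different column names in the two data frames (School and Away.Team) never need to be reconciled.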