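I have a large dataset containing many football matches. The matches are currently in wide format, and I want to compute win, draw and loss streaks by match and by team.
I have the following variables:
- Home_team: a string with the country name (e.g. England, Spain, etc.)
- Away_team: a string with the country name (e.g. France, Germany, etc.)
- Results: a string with three categories (HW: home win, AW: away win, D: draw)
As a simple example, my data looks like this: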
Home_team <- c("Peru","France","England","Senegal", "Chile", "Colombia","France","Spain","Colombia", "Angola", "Ecuador", "France",
"Peru")
Away_team <- c("Brasil","Germany","Togo","Egypt", "Ecuador", "Argentina","Netherlands","Burkina Faso","New Zealand", "Venezuela", "Portugal", "Canada",
"United States")
Results <- c("HW","HW","AW","D","AW","HW","HW","AW","HW","D","D","HW","D")
df_example <- data.frame(Home_team,Away_team,Results)
df_example
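So, in this example, the following happens:
- Peru (row 1) goes into its match against the United States on a 1-win streak and draws that match
- France goes into its match against Canada on a 2-win streak
- Colombia will also go into its next match on a 2-win streak
- Ecuador beat Chile away (a 1-win streak) and then drew with Portugal in its next match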
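I think an easier way would be to put everything in long format and count the streaks of "Win", "Loss" and "Draw". Every time a streak ends, the count starts again. But I'm not sure whether this is the best approach.
Overall, I want to know whether a streak (of wins, losses or even draws) has an effect on the result of the next match.
Any help would be appreciated.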
Here is one approach you could try.
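First, put the data in long format. Using case_when, you can determine the outcome of each match for each team (e.g. in an "HW" match the home team gets a "Win" and the away team gets a "Loss").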
For each team, you can use rleid from data.table to group the streaks. Every time the team's outcome changes, a new streak starts.
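To make the grouping step concrete, here is a minimal sketch of what rleid returns for a single team. The outcomes vector is made up for illustration and is not taken from the example data.

```r
library(data.table)

# Hypothetical outcomes for one team, in match order (illustration only)
outcomes <- c("Win", "Win", "Draw", "Loss", "Loss", "Win")

# rleid() gives each run of identical consecutive values its own id,
# so the id increases by one every time the outcome changes
run_id <- rleid(outcomes)
run_id
# 1 1 2 3 3 4
```

Rows that share a run id belong to the same streak, which is why the id can be used as a grouping variable.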
Then you can compute the length and the outcome of each team's streak. The length is the row_number() within a given team and streak.
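The same counter can also be sketched in base R. This is an alternative to the grouped row_number() used in this answer, not the same code: it relies on rle() and sequence(), and the vector and the name entering are again invented for illustration.

```r
# Hypothetical outcomes for one team, in match order (illustration only)
outcomes <- c("Win", "Win", "Draw", "Loss", "Loss", "Win")

# rle() returns the length of each run of identical consecutive outcomes
runs <- rle(outcomes)

# sequence() counts 1..n inside every run: the streak length after each match
streak <- sequence(runs$lengths)
streak
# 1 2 1 1 2 1

# Streak the team brings INTO each match: shift by one match, 0 before the first
entering <- c(0L, head(streak, -1))
entering
# 0 1 2 1 1 2
```

The shifted vector plays the same role as lag(Streak, default = 0) in the pipeline: it describes the team's form before the match is played.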
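Finally, if needed, you can put the data back into wide form. The new columns show the streak (number of matches and outcome) that the home team and the away team bring into the current match.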
library(tidyverse)
library(data.table)

df_example %>%
  # Number the matches so the long data can be widened again later
  mutate(Game = row_number()) %>%
  # One row per team per match
  pivot_longer(cols = c(Home_team, Away_team), names_to = "Location", values_to = "Team") %>%
  # Outcome of the match from each team's point of view
  mutate(Outcome = case_when(
    Results == "HW" & Location == "Home_team" ~ "Win",
    Results == "HW" & Location == "Away_team" ~ "Loss",
    Results == "AW" & Location == "Home_team" ~ "Loss",
    Results == "AW" & Location == "Away_team" ~ "Win",
    Results == "D" ~ "Draw",
    TRUE ~ NA_character_
  )) %>%
  group_by(Team) %>%
  # New run id every time a team's outcome changes
  mutate(Change = rleid(Outcome)) %>%
  group_by(Change, .add = TRUE) %>%
  # Number of matches in the current streak
  mutate(Streak = row_number()) %>%
  group_by(Team) %>%
  # Streak the team brings into the current match ("0 -" if it has no earlier match)
  mutate(Last = paste(lag(Streak, default = 0), lag(Outcome, default = "-"))) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(Game, Results), names_from = Location, values_from = c(Team, Last))
Output
    Game Results Team_Home_team Team_Away_team Last_Home_team Last_Away_team
   <int> <chr>   <chr>          <chr>          <chr>          <chr>
 1     1 HW      Peru           Brasil         0 -            0 -
 2     2 HW      France         Germany        0 -            0 -
 3     3 AW      England        Togo           0 -            0 -
 4     4 D       Senegal        Egypt          0 -            0 -
 5     5 AW      Chile          Ecuador        0 -            0 -
 6     6 HW      Colombia       Argentina      0 -            0 -
 7     7 HW      France         Netherlands    1 Win          0 -
 8     8 AW      Spain          Burkina Faso   0 -            0 -
 9     9 HW      Colombia       New Zealand    1 Win          0 -
10    10 D       Angola         Venezuela      0 -            0 -
11    11 D       Ecuador        Portugal       1 Win          0 -
12    12 HW      France         Canada         2 Win          0 -
13    13 D       Peru           United States  1 Win          0 -
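As for whether a streak affects the next result, one possible starting point is to stay in long format and tabulate the streak a team brings into a match against the outcome of that match. The sketch below is only an assumption about how that could be done: the names next_games, Prev_outcome and Prev_streak are invented here, and the streak counter uses base R rle() and sequence() instead of rleid.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df_example <- data.frame(
  Home_team = c("Peru", "France", "England", "Senegal", "Chile", "Colombia", "France",
                "Spain", "Colombia", "Angola", "Ecuador", "France", "Peru"),
  Away_team = c("Brasil", "Germany", "Togo", "Egypt", "Ecuador", "Argentina", "Netherlands",
                "Burkina Faso", "New Zealand", "Venezuela", "Portugal", "Canada", "United States"),
  Results = c("HW", "HW", "AW", "D", "AW", "HW", "HW", "AW", "HW", "D", "D", "HW", "D"),
  stringsAsFactors = FALSE
)

next_games <- df_example %>%
  mutate(Game = row_number()) %>%
  pivot_longer(c(Home_team, Away_team), names_to = "Location", values_to = "Team") %>%
  mutate(Outcome = case_when(
    Results == "D" ~ "Draw",
    Results == "HW" & Location == "Home_team" ~ "Win",
    Results == "AW" & Location == "Away_team" ~ "Win",
    TRUE ~ "Loss"
  )) %>%
  group_by(Team) %>%
  mutate(
    Streak = sequence(rle(Outcome)$lengths),  # length of the current streak
    Prev_outcome = lag(Outcome),              # type of streak brought into the match
    Prev_streak = lag(Streak)                 # length of that streak
  ) %>%
  ungroup() %>%
  # Keep only matches where the team has an earlier match on record
  filter(!is.na(Prev_outcome))

# How often each incoming streak is followed by each outcome
count(next_games, Prev_outcome, Prev_streak, Outcome)
```

With the 13 example matches this leaves only five team-matches, far too few to conclude anything. On the full dataset the same table could be fed into a chi-squared test or a model, depending on how the effect is meant to be measured.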