Counting consecutive occurrences of a string value in column A for each value in column B in R



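I have a large dataset containing many football matches. The matches are currently in wide format, and I would like to count winning, drawing and losing streaks by match and by team.

For this, I have the following variables:

  1. Home_team: a string with the name of a country (e.g. England, Spain, etc.)
  2. Away_team: a string with the name of a country (e.g. France, Germany, etc.)
  3. Results: a string with three categories (HW: home win, AW: away win, D: draw)

As a simple example, my data looks like this: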

Home_team <- c("Peru","France","England","Senegal", "Chile", "Colombia","France","Spain","Colombia", "Angola", "Ecuador", "France",
"Peru")
Away_team <- c("Brasil","Germany","Togo","Egypt", "Ecuador", "Argentina","Netherlands","Burkina Faso","New Zealand", "Venezuela", "Portugal", "Canada",
"United States")
Results <- c("HW","HW","AW","D","AW","HW","HW","AW","HW","D","D","HW","D")
df_example <- data.frame(Home_team,Away_team,Results)
df_example

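So, in this example, the following happened:

  • Peru (row 1) came into its match against the United States on a 1-game winning streak, and drew that match
  • France came into its match against Canada on a 2-game winning streak
  • Colombia would also go into its next match on a 2-game winning streak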
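  • Ecuador won away against Chile (a 1-game winning streak, since that match is recorded as AW), then drew its next match against Portugal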

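I think an easier approach would be to put everything in long format, with the outcomes "Win", "Loss" and "Draw". Each time a streak ends, the count would start over. But I am not sure this is the best approach.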

Overall, I want to know whether a streak (of wins, losses or even draws) has an effect on the result of the next match.

Any help would be appreciated.

Here is one approach you could try.

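First, put the data in long format, with one row per team per match. Using case_when, you can determine each team's outcome in each match (e.g., in an "HW" match the home team gets a "Win" and the away team gets a "Loss").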
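To make the intermediate long format concrete, here is a minimal sketch on just two matches (rows 1 and 3 of df_example). The names `games` and `long` are only illustrative, and the condensed `case_when` is one possible way of writing the same mapping.

```r
library(dplyr)
library(tidyr)

# Rows 1 and 3 of df_example, rebuilt here so the sketch is self-contained
games <- data.frame(
  Home_team = c("Peru", "England"),
  Away_team = c("Brasil", "Togo"),
  Results   = c("HW", "AW")
)

long <- games %>%
  mutate(Game = row_number()) %>%
  # One row per team per match
  pivot_longer(cols = c(Home_team, Away_team),
               names_to = "Location", values_to = "Team") %>%
  # A team wins when the winning side ("HW"/"AW") matches its own location
  mutate(Outcome = case_when(
    Results == "D" ~ "Draw",
    (Results == "HW") == (Location == "Home_team") ~ "Win",
    TRUE ~ "Loss"
  ))

long[, c("Game", "Team", "Outcome")]
# Peru: Win, Brasil: Loss, England: Loss, Togo: Win
```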

For each team, you can use rleid from data.table to group the streaks. Each time the outcome changes, a new streak begins.
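As a quick illustration of what rleid does, here is a minimal sketch on a made-up sequence of outcomes for a single team (the vector `outcomes` is hypothetical and not taken from the data above).

```r
library(data.table)

# Hypothetical outcomes for one team, in match order
outcomes <- c("Win", "Win", "Draw", "Win", "Win", "Win")

# rleid() gives each run of identical consecutive values its own id,
# so the id increases by one every time the outcome changes
streak_id <- rleid(outcomes)
streak_id
# [1] 1 1 2 3 3 3
```

Note that the second run of wins gets a new id (3) rather than reusing id 1, which is what lets the count restart after a streak is broken.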

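Then you can count how many matches a team's streak has lasted, along with its outcome. This is the row_number() within a given team, streak and outcome.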
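The within-streak count can also be sketched in base R for a single team. This uses rle() and sequence() as a stand-in for the grouped row_number() approach, and the vector `outcomes` is again hypothetical.

```r
# Hypothetical outcomes for one team, in match order
outcomes <- c("Win", "Win", "Draw", "Win", "Win", "Win")

# Lengths of each run of identical consecutive outcomes
runs <- rle(outcomes)

# Count 1, 2, ... within each run, restarting whenever the outcome changes
streak <- sequence(runs$lengths)
streak
# [1] 1 2 1 1 2 3

# Streak length the team carries into each match (0 before its first match)
incoming <- c(0L, head(streak, -1))
incoming
# [1] 0 1 2 1 1 2
```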

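Finally, if needed, you can put the data back into wide format. The new columns show the streak (number of matches and outcome) that the home team and the away team each carry into the current match.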

library(tidyverse)
library(data.table)

df_example %>%
  # Number the matches so each one can be identified after reshaping
  mutate(Game = row_number()) %>%
  # Long format: one row per team per match
  pivot_longer(cols = c(Home_team, Away_team), names_to = "Location", values_to = "Team") %>%
  # Translate the match result into each team's own outcome
  mutate(Outcome = case_when(
    Results == "HW" & Location == "Home_team" ~ "Win",
    Results == "HW" & Location == "Away_team" ~ "Loss",
    Results == "AW" & Location == "Home_team" ~ "Loss",
    Results == "AW" & Location == "Away_team" ~ "Win",
    Results == "D" ~ "Draw",
    TRUE ~ NA_character_
  )) %>%
  # A new streak id starts whenever a team's outcome changes
  group_by(Team) %>%
  mutate(Change = rleid(Outcome)) %>%
  # Position of each match within its streak
  group_by(Change, .add = TRUE) %>%
  mutate(Streak = row_number()) %>%
  # Streak (length and outcome) that each team carries into the match
  group_by(Team) %>%
  mutate(Last = paste(lag(Streak, default = 0), lag(Outcome, default = "-"))) %>%
  # Back to wide format: one row per match
  pivot_wider(id_cols = c(Game, Results), names_from = Location, values_from = c(Team, Last))

Output

    Game Results Team_Home_team Team_Away_team Last_Home_team Last_Away_team
   <int> <chr>   <chr>          <chr>          <chr>          <chr>
 1     1 HW      Peru           Brasil         0 -            0 -
 2     2 HW      France         Germany        0 -            0 -
 3     3 AW      England        Togo           0 -            0 -
 4     4 D       Senegal        Egypt          0 -            0 -
 5     5 AW      Chile          Ecuador        0 -            0 -
 6     6 HW      Colombia       Argentina      0 -            0 -
 7     7 HW      France         Netherlands    1 Win          0 -
 8     8 AW      Spain          Burkina Faso   0 -            0 -
 9     9 HW      Colombia       New Zealand    1 Win          0 -
10    10 D       Angola         Venezuela      0 -            0 -
11    11 D       Ecuador        Portugal       1 Win          0 -
12    12 HW      France         Canada         2 Win          0 -
13    13 D       Peru           United States  1 Win          0 -
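As a possible next step toward the original goal (checking whether a streak relates to the next result), one option is to cross-tabulate the incoming streak against the match result. The sketch below is only an assumption about how that could be done. It rebuilds the two relevant columns from the output above into a hypothetical data frame called `streaks`, and 13 matches are far too few to draw any conclusion.

```r
# Columns copied from the output above (hypothetical object name: streaks)
streaks <- data.frame(
  Results = c("HW", "HW", "AW", "D", "AW", "HW", "HW",
              "AW", "HW", "D", "D", "HW", "D"),
  Last_Home_team = c("0 -", "0 -", "0 -", "0 -", "0 -", "0 -", "1 Win",
                     "0 -", "1 Win", "0 -", "1 Win", "2 Win", "1 Win")
)

# Rows: streak the home team brings into the match; columns: match result
tab <- table(streaks$Last_Home_team, streaks$Results)
tab

# Share of each result within each incoming streak (rows sum to 1)
prop.table(tab, margin = 1)
```

With a real dataset, the same table could feed a test of association such as chisq.test(tab), provided the cell counts are large enough.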
