

Team               Year
012 Hortney        2017
012 Hortney        2018
013 James          2017
013 James          2018
014 Ilωus hero     2017
014 Ilωus hero     2018
015 Hortna         2017
015 Hortna         2018
016 Exclus race    2017
#with 25 000 more rows


code    name         Year
012   Hortney        2017
012   Hortney        2018
013   James          2017
013   James          2018
014   Ilωus hero     2017
014   Ilωus hero     2018
015   Hortna         2017
015   Hortna         2018
016   Exclus race    2017
#with 25 000 more rows

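I tried this code, separate(Team, c("code", "name")), but it mangles the names. In particular, everything after the Greek letter ω disappears, and I need to keep ω intact for later coding. The last part of the name also disappears, as in Exclus. Like this (what I am looking for is in parentheses):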

code   name          Year
012   Hortney        2017
012   Hortney        2018
013   James          2018
014   Il             2017   (Ilωus hero)
014   Il             2018   (Ilωus hero)
015   Hortna         2017
015   Hortna         2018
016   Exclus         2017   (Exclus race)
#With 25 000 more rows




library(tidyr)

# Split only at the space that follows a digit and precedes a word character
df |>
  separate(Team, into = c("code", "name"), sep = "(?<=\\d) (?=\\w)")


# A tibble: 9 × 3
  code  name         Year
  <chr> <chr>       <dbl>
1 012   Hortney      2017
2 012   Hortney      2018
3 013   James        2017
4 013   James        2018
5 014   Ilωus hero   2017
6 014   Ilωus hero   2018
7 015   Hortna       2017
8 015   Hortna       2018
9 016   Exclus race  2017


df <- read_csv("Team,               Year
012 Hortney,        2017
012 Hortney,        2018
013 James,          2017
013 James,          2018
014 Ilωus hero,     2017
014 Ilωus hero,     2018
015 Hortna,         2017
015 Hortna,         2018
016 Exclus race,    2017")
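
If the code and the name are always separated by a single space, the split can also be done with no lookarounds at all, by cutting at the first space only. Below is a minimal base-R sketch under that assumption. The three-row data frame is a hypothetical sample mirroring the question's layout, not the real 25 000-row data.

```r
# Hypothetical sample data mirroring the layout in the question
df <- data.frame(
  Team = c("012 Hortney", "014 Ilωus hero", "016 Exclus race"),
  Year = c(2017, 2018, 2017)
)

# Everything before the first space is the code
df$code <- sub(" .*$", "", df$Team)

# Everything after the first space is the name (later spaces are kept)
df$name <- sub("^[^ ]* ", "", df$Team)

result <- df[, c("code", "name", "Year")]
print(result)
```

Neither pattern depends on which characters count as letters, so ω and the second word of a name are left untouched. In tidyr, separate() also accepts extra = "merge", which together with sep = " " should give the same split.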


library(dplyr)

Teams |>
  mutate(code = gsub("\\D", "", Team),            # keep only the digits
         name = trimws(gsub("\\d", "", Team))) |> # drop the digits, trim spaces
  select(code, name, Year)
  • Output
  code        name Year
1  012     Hortney 2017
2  012     Hortney 2018
3  013       James 2017
4  013       James 2018
5  014  Ilωus hero 2017
6  014  Ilωus hero 2018
7  015      Hortna 2017
8  015      Hortna 2018
9  016 Exclus race 2017


library(dplyr)
library(stringr)
# The data for a reproducible example
Teams <- data.frame(
  Team = c("012 Hortney", " 013  James ", " 018 Alain Philippe have a very long name"),
  Year = c(2017, 2018, 2017)
)
Teams %>%
  mutate(
    # Remove the unwanted spaces in the variable Team
    Team = str_squish(Team),
    # Extract the first run of digits in the variable Team
    code = str_extract(Team, "[0-9]*"),
    # Extract the first run of letters, possibly with spaces between the words
    name = str_extract(Team, "[:alpha:]+[ ?[:alpha:]]*")
  ) %>%
  dplyr::select(code, name, Year)


 code                                 name Year
1  012                              Hortney 2017
2  013                                James 2018
3  018 Alain Philippe have a very long name 2017
