R: compare lists and identify individual changes



Here is a data frame for an ice cream shop, built in R, that shows the flavors the shop offered each month.

df <- data.frame(date = as.Date(c(rep("2022-01-01", 3),
                                  rep("2022-02-01", 3),
                                  rep("2022-03-01", 4))),
                 flavor = c("Almond", "Apple", "Apricot",
                            "Almond", "Maple", "Mint",
                            "Almond", "Maple", "Mint", "Pumpkin"))
#>         date  flavor
#> 1  2022-01-01  Almond
#> 2  2022-01-01   Apple
#> 3  2022-01-01 Apricot
#> 4  2022-02-01  Almond
#> 5  2022-02-01   Maple
#> 6  2022-02-01    Mint
#> 7  2022-03-01  Almond
#> 8  2022-03-01   Maple
#> 9  2022-03-01    Mint
#> 10 2022-03-01 Pumpkin

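I wrote a script that shows which ice cream flavors were added in any given month. The removed flavors are also visible, as a list column. Note that no flavors were removed in March (for 2022-03-01, flavors.removed is <chr [0]>).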

library(dplyr)
library(tidyr)
df %>%
  group_by(date) %>%
  summarize(flavors = list(flavor)) %>%
  mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
         flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>%
  ungroup() %>%
  select(-flavors) %>%
  unnest_longer(flavors.added)
#> # A tibble: 6 x 3
#>   date       flavors.added flavors.removed
#>   <date>     <chr>         <list>         
#> 1 2022-01-01 Almond        <NULL>         
#> 2 2022-01-01 Apple         <NULL>         
#> 3 2022-01-01 Apricot       <NULL>         
#> 4 2022-02-01 Maple         <chr [2]>      
#> 5 2022-02-01 Mint          <chr [2]>      
#> 6 2022-03-01 Pumpkin       <chr [0]>  

When I try to expand the removed flavors as well by calling unnest_longer(flavors.removed), I inadvertently filter out every row for 2022-03-01, because the flavors.removed list is empty (<chr [0]>) for that date.

library(dplyr)
library(tidyr)
df %>%
  group_by(date) %>%
  summarize(flavors = list(flavor)) %>%
  mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
         flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>%
  ungroup() %>%
  select(-flavors) %>%
  unnest_longer(flavors.added) %>%
  unnest_longer(flavors.removed) %>%
  pivot_longer(-date, names_to = "type", values_to = "flavor") %>%
  arrange(date, type) %>%
  unique()
#> # A tibble: 8 x 3
#>   date       type            flavor 
#>   <date>     <chr>           <chr>  
#> 1 2022-01-01 flavors.added   Almond 
#> 2 2022-01-01 flavors.added   Apple  
#> 3 2022-01-01 flavors.added   Apricot
#> 4 2022-01-01 flavors.removed NA     
#> 5 2022-02-01 flavors.added   Maple  
#> 6 2022-02-01 flavors.added   Mint   
#> 7 2022-02-01 flavors.removed Apple  
#> 8 2022-02-01 flavors.removed Apricot
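
The row loss can be reproduced on a much smaller table. The sketch below uses a hypothetical two-row tibble, `toy` (not part of the shop data), with two list columns. Unnesting the column that holds an empty vector drops the entire row, including the value sitting in the other column.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: id 2 has one added value but an empty `removed` vector.
toy <- tibble(
  id      = 1:2,
  added   = list("x", "y"),
  removed = list("p", character(0))
)

out <- toy %>%
  unnest_longer(added) %>%
  unnest_longer(removed)  # the size-0 element takes the whole id 2 row with it

out
```

The added value "y" is gone from `out`, which is the same effect that removes Pumpkin for 2022-03-01.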

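Is there a better way to identify, one row per flavor, what was added and removed each month? I need to get back the ninth row shown below, which my flawed approach filters out.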

#> # A tibble: 9 x 3
#>   date       type            flavor 
#>   <date>     <chr>           <chr>  
#> 1 2022-01-01 flavors.added   Almond 
#> 2 2022-01-01 flavors.added   Apple  
#> 3 2022-01-01 flavors.added   Apricot
#> 4 2022-01-01 flavors.removed NA     
#> 5 2022-02-01 flavors.added   Maple  
#> 6 2022-02-01 flavors.added   Mint   
#> 7 2022-02-01 flavors.removed Apple  
#> 8 2022-02-01 flavors.removed Apricot
#> 9 2022-03-01 flavors.added   Pumpkin

If you don't need the NA on row 4, you can pivot to long format first and unnest afterwards:

library(dplyr)
library(tidyr)
df %>%
  group_by(date) %>%
  summarize(flavors = list(flavor)) %>%
  mutate(flavors.added = mapply(setdiff, flavors, lag(flavors)),
         flavors.removed = mapply(setdiff, lag(flavors), flavors)) %>%
  ungroup() %>%
  select(-flavors) %>%
  # one row per date and type, so an empty list only drops its own row
  pivot_longer(-date, names_to = "type", values_to = "flavor") %>%
  unnest(flavor)
#> # A tibble: 8 × 3
#>   date       type            flavor 
#>   <date>     <chr>           <chr>  
#> 1 2022-01-01 flavors.added   Almond 
#> 2 2022-01-01 flavors.added   Apple  
#> 3 2022-01-01 flavors.added   Apricot
#> 4 2022-02-01 flavors.added   Maple  
#> 5 2022-02-01 flavors.added   Mint   
#> 6 2022-02-01 flavors.removed Apple  
#> 7 2022-02-01 flavors.removed Apricot
#> 8 2022-03-01 flavors.added   Pumpkin
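
If the empty cases should be kept rather than dropped, `unnest()` accepts `keep_empty = TRUE`, which turns each size-0 element into a single `NA` row. The sketch below reruns the same pipeline with that argument. It swaps `mapply()` for `Map()` so the result is always a list, and calls `base::setdiff()` explicitly; both are choices made for this illustration and not part of the pipeline above. The name `changes` is also just illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  date = as.Date(c(rep("2022-01-01", 3),
                   rep("2022-02-01", 3),
                   rep("2022-03-01", 4))),
  flavor = c("Almond", "Apple", "Apricot",
             "Almond", "Maple", "Mint",
             "Almond", "Maple", "Mint", "Pumpkin")
)

changes <- df %>%
  group_by(date) %>%
  summarize(flavors = list(flavor)) %>%
  mutate(
    # Map() never simplifies, so both columns stay list columns
    flavors.added   = Map(base::setdiff, flavors, lag(flavors)),
    flavors.removed = Map(base::setdiff, lag(flavors), flavors)
  ) %>%
  select(-flavors) %>%
  pivot_longer(-date, names_to = "type", values_to = "flavor") %>%
  # keep_empty = TRUE keeps dates with nothing removed as an NA row
  unnest(flavor, keep_empty = TRUE)

changes
```

With this variant the "nothing removed" rows for 2022-01-01 and 2022-03-01 both appear with `NA` in `flavor`; they can be filtered afterwards if only one of them is wanted.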

I find it more straightforward to compute the added and removed flavors separately and then, if needed, bind them together at the end.

In that case, you can use tidyr::unchop(keep_empty = TRUE) to avoid dropping the rows whose list element is empty.
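
As a minimal illustration of what `keep_empty` changes, here is a sketch on a hypothetical three-row tibble (`toy`, `dropped` and `kept` are illustrative names, not part of the shop data):

```r
library(tidyr)

# Hypothetical toy data: id 2 holds an empty character vector.
toy <- tibble(
  id   = 1:3,
  vals = list(c("a", "b"), character(0), "c")
)

dropped <- unchop(toy, vals)                     # default: the id 2 row disappears
kept    <- unchop(toy, vals, keep_empty = TRUE)  # id 2 survives with a missing value

dropped
kept
```

The full example on the shop data follows.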

library(tidyverse)
df <- tibble(
  date = as.Date(c(
    rep("2022-01-01", 3),
    rep("2022-02-01", 3),
    rep("2022-03-01", 4)
  )),
  flavor = c(
    "Almond", "Apple", "Apricot",
    "Almond", "Maple", "Mint",
    "Almond", "Maple", "Mint", "Pumpkin"
  )
)
df
#> # A tibble: 10 × 2
#>    date       flavor 
#>    <date>     <chr>  
#>  1 2022-01-01 Almond 
#>  2 2022-01-01 Apple  
#>  3 2022-01-01 Apricot
#>  4 2022-02-01 Almond 
#>  5 2022-02-01 Maple  
#>  6 2022-02-01 Mint   
#>  7 2022-03-01 Almond 
#>  8 2022-03-01 Maple  
#>  9 2022-03-01 Mint   
#> 10 2022-03-01 Pumpkin
flavors <- df %>%
  group_by(date) %>%
  summarize(flavors = list(flavor)) %>%
  ungroup()
flavors
#> # A tibble: 3 × 2
#>   date       flavors  
#>   <date>     <list>   
#> 1 2022-01-01 <chr [3]>
#> 2 2022-02-01 <chr [3]>
#> 3 2022-03-01 <chr [4]>
# Find added flavors
added <- flavors %>%
  mutate(added = mapply(setdiff, flavors, lag(flavors)), .keep = "unused") %>%
  unchop(added, keep_empty = TRUE) %>%
  pivot_longer(added, names_to = "type", values_to = "flavor")
# Find removed flavors
removed <- flavors %>%
  mutate(removed = mapply(setdiff, lag(flavors), flavors), .keep = "unused") %>%
  unchop(removed, keep_empty = TRUE) %>%
  pivot_longer(removed, names_to = "type", values_to = "flavor")
added
#> # A tibble: 6 × 3
#>   date       type  flavor 
#>   <date>     <chr> <chr>  
#> 1 2022-01-01 added Almond 
#> 2 2022-01-01 added Apple  
#> 3 2022-01-01 added Apricot
#> 4 2022-02-01 added Maple  
#> 5 2022-02-01 added Mint   
#> 6 2022-03-01 added Pumpkin
removed
#> # A tibble: 4 × 3
#>   date       type    flavor 
#>   <date>     <chr>   <chr>  
#> 1 2022-01-01 removed <NA>   
#> 2 2022-02-01 removed Apple  
#> 3 2022-02-01 removed Apricot
#> 4 2022-03-01 removed <NA>
bind_rows(added, removed) %>%
  arrange(date, type)
#> # A tibble: 10 × 3
#>    date       type    flavor 
#>    <date>     <chr>   <chr>  
#>  1 2022-01-01 added   Almond 
#>  2 2022-01-01 added   Apple  
#>  3 2022-01-01 added   Apricot
#>  4 2022-01-01 removed <NA>   
#>  5 2022-02-01 added   Maple  
#>  6 2022-02-01 added   Mint   
#>  7 2022-02-01 removed Apple  
#>  8 2022-02-01 removed Apricot
#>  9 2022-03-01 added   Pumpkin
#> 10 2022-03-01 removed <NA>

Created on 2022-06-02 by the reprex package (v2.0.1)
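
One caveat about the `mapply(setdiff, ...)` calls used throughout this thread: `mapply()` simplifies its result by default, so it only returns a list when the per-month results differ in length. If every month happened to add exactly one flavor, the result would come back as a plain character vector instead of a list column. The sketch below uses hypothetical menus (`menus` and `previous` are illustrative names) to show the difference and one way around it, `Map()`; `mapply(..., SIMPLIFY = FALSE)` behaves the same way.

```r
# Hypothetical menus where each month adds exactly one flavor.
menus <- list("Almond", c("Almond", "Maple"), c("Almond", "Maple", "Mint"))

# Previous month's menu; the first month has no predecessor.
previous <- c(list(NULL), menus[-length(menus)])

simplified <- mapply(setdiff, menus, previous)  # collapses to a character vector
as_list    <- Map(setdiff, menus, previous)     # always a list, one element per month

is.list(simplified)
is.list(as_list)
```

On the data in the question the lengths differ from month to month, so the original code does return lists; the non-simplifying form only matters when the data could change.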
