R: summarise() in dplyr 1.0.2 behaves like mutate()



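Given a tibble listing users, products, and product features, I am trying to compute, for each product, the fraction of its distinct users who have a given product feature: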

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- tribble(
~users, ~product, ~feature,
"bob","iPhone","screen",
"bob","iPhone","camera",
"bob","iPhone","facial recognition",
"sally","Android","screen",
"sally","Android","camera",
"sally","Android","facial recognition",
"joe","Huawei","screen",
"joe","Huawei","camera",
"joe","Huawei","facial recognition",
"rachel","iPhone","screen",
"rachel","iPhone","camera",
"rachel","iPhone","fingerprint sensor"
)
# Get count of distinct users by product
df <- df %>%
  group_by(product) %>%
  mutate(n_users = n_distinct(users)) %>%
  ungroup()
df
#> # A tibble: 12 x 4
#>    users  product feature            n_users
#>    <chr>  <chr>   <chr>                <int>
#>  1 bob    iPhone  screen                   2
#>  2 bob    iPhone  camera                   2
#>  3 bob    iPhone  facial recognition       2
#>  4 sally  Android screen                   1
#>  5 sally  Android camera                   1
#>  6 sally  Android facial recognition       1
#>  7 joe    Huawei  screen                   1
#>  8 joe    Huawei  camera                   1
#>  9 joe    Huawei  facial recognition       1
#> 10 rachel iPhone  screen                   2
#> 11 rachel iPhone  camera                   2
#> 12 rachel iPhone  fingerprint sensor       2
# Count the fraction of distinct users with given product feature
df <- df %>%
  group_by(product, feature) %>%
  summarise(feature_fraction = n()/n_users,
            .groups = "drop_last")
df
#> # A tibble: 12 x 3
#> # Groups:   product [3]
#>    product feature            feature_fraction
#>    <chr>   <chr>                         <dbl>
#>  1 Android camera                          1  
#>  2 Android facial recognition              1  
#>  3 Android screen                          1  
#>  4 Huawei  camera                          1  
#>  5 Huawei  facial recognition              1  
#>  6 Huawei  screen                          1  
#>  7 iPhone  camera                          1  
#>  8 iPhone  camera                          1  
#>  9 iPhone  facial recognition              0.5
#> 10 iPhone  fingerprint sensor              0.5
#> 11 iPhone  screen                          1  
#> 12 iPhone  screen                          1
Created on 2020-10-23 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2020-10-23                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  backports     1.1.10  2020-09-15 [1] CRAN (R 4.0.2)
#>  callr         3.4.4   2020-09-07 [1] CRAN (R 4.0.2)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.2)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.2)
#>  devtools      2.3.1   2020-07-21 [1] CRAN (R 4.0.2)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.2)
#>  dplyr       * 1.0.2   2020-08-18 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.2)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
#>  knitr         1.29    2020-06-23 [1] CRAN (R 4.0.2)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.2)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.2)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.2)
#>  pillar        1.4.6   2020-07-10 [1] CRAN (R 4.0.2)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.2)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.2)
#>  processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.2)
#>  ps            1.3.4   2020-08-11 [1] CRAN (R 4.0.2)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 4.0.2)
#>  tibble        3.0.3   2020-07-10 [1] CRAN (R 4.0.2)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 4.0.2)
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 4.0.2)
#>  vctrs         0.3.4   2020-08-29 [1] CRAN (R 4.0.2)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)
#>  xfun          0.16    2020-07-24 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)

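As you can see, the final tibble contains multiple rows for the same (product, feature) group key, each with the same summary value. As far as I know, this is unexpected behavior for summarise(); the result looks almost identical to what mutate() would return. Given this open GitHub issue, it seems the new version of summarise() has not had all of its kinks worked out yet. It is also possible that I am simply missing something obvious, so if anyone can help me get back on track, I would appreciate it!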

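The problem is that each group contains multiple n_users values, one per row, so n()/n_users is a vector with one element per row rather than a single number. Recent versions of dplyr (1.0.0 and later) allow summarise() to return multiple rows per group when the summary expression returns more than one value.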
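To make the cause concrete, here is a minimal base R sketch of what happens inside a single group. It assumes the (iPhone, screen) group from the question, which has two rows; the variable names `n_rows` and `per_row` are illustrative only and do not appear in the original code.

```r
# n_users values for the two rows of the (iPhone, screen) group
n_users <- c(2L, 2L)

# What n() would return for this group: the number of rows
n_rows <- length(n_users)

# Dividing a scalar by a length-2 vector yields a length-2 vector,
# so summarise() receives two values for this group
per_row <- n_rows / n_users
length(per_row)

# Taking only the first element yields a single value instead
single <- n_rows / n_users[1]
length(single)
```

The first division produces one value per row, which is why the group appears twice in the output; the second produces exactly one value.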

If you are willing to assume that all values of n_users are the same within each group, you can use

df %>%
  group_by(product, feature) %>%
  summarise(feature_fraction = n()/first(n_users),
            .groups = "drop_last")

This ensures that only one value is returned per group.
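Because first() silently relies on n_users being constant within each group, it may be worth checking that assumption. The following is only a sketch: it uses a smaller, made-up subset of the data that already carries the n_users column, and the `n_user_values` column is a hypothetical helper added for the check, not part of the original answer.

```r
library(dplyr)

# Small illustrative data set with the per-product user count already attached
df <- tribble(
  ~users,   ~product,  ~feature,             ~n_users,
  "bob",    "iPhone",  "screen",             2L,
  "bob",    "iPhone",  "facial recognition", 2L,
  "rachel", "iPhone",  "screen",             2L,
  "rachel", "iPhone",  "fingerprint sensor", 2L,
  "sally",  "Android", "screen",             1L
)

result <- df %>%
  group_by(product, feature) %>%
  summarise(
    # Number of distinct n_users values in the group; 1 means the assumption holds
    n_user_values = n_distinct(n_users),
    # One value per group, because first() returns a single element
    feature_fraction = n() / first(n_users),
    .groups = "drop"
  )

# Fail loudly if any group had more than one n_users value
stopifnot(all(result$n_user_values == 1))

result
```

If the check passes, the helper column can be dropped with select(-n_user_values) before further use.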
