Performing a t-test within dplyr groups using summarise_all [R]



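Suppose I want to compare the price of apples with the price of oranges within each country, in two different currencies: USA (price_USA) and BTC (price_BTC).

USA ~ fruit, for each country
BTC ~ fruit, for each country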

library(tidyverse)

# Example data: 3 apples and 3 oranges per country, priced in two currencies
prices <- tibble(
  country = c(rep("USA", 6), rep("Spain", 6), rep("Korea", 6)),
  fruit = rep(c("apples", "apples", "apples", "oranges", "oranges", "oranges"), 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)

# One t-test per country and per currency, naming each column explicitly
prices %>% 
  group_by(country) %>% 
  summarise(
    pval_USA = t.test(price_USA ~ fruit)$p.value,
    pval_BTC = t.test(price_BTC ~ fruit)$p.value
  )

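Now suppose there are many columns, and I would like to use summarise_all instead of naming each one. Is there a way to use dplyr::summarise_all to run a t-test within each group (country) and for each column (price_USA, price_BTC)? Everything I have tried so far gives me an error.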

prices %>% 
  group_by(country) %>% 
  summarise_at(
    c("price_USA", "price_BTC"),
    function(x) {t.test(x ~ .$fruit)$p.value}
  )
> Error in model.frame.default(formula = x ~ .$fruit) : 
  variable lengths differ (found for '.$fruit')
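
A likely explanation for this error, sketched below: inside the anonymous function, `x` is the slice of one column for the current group (6 values here), whereas `.` is the pipe placeholder and still refers to the whole grouped data frame, so `.$fruit` has all 18 values. The `set.seed()` call and the names `grouped` and `chunk_lengths` are assumptions added only for this illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
prices <- tibble(
  country = rep(c("USA", "Spain", "Korea"), each = 6),
  fruit = rep(rep(c("apples", "oranges"), each = 3), 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)

grouped <- prices %>% group_by(country)

# Length of the per-group chunk that the function receives as `x`
chunk_lengths <- grouped %>%
  summarise_at(c("price_USA", "price_BTC"), function(x) length(x))
chunk_lengths

# Length of the fruit column in the full data frame that `.` points to
length(grouped$fruit)
```

Because the two sides of the formula `x ~ .$fruit` have different lengths, `model.frame()` cannot line them up, which matches the "variable lengths differ" message.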

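You can do this by reshaping the data from wide to long format, so that the currency becomes a grouping variable. Here is a tidyverse solution, using tidyr::pivot_longer() followed by a grouped dplyr::summarise():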

library(tidyverse)

prices <- tibble(
  country = c(rep("USA", 6), rep("Spain", 6), rep("Korea", 6)),
  fruit = rep(c("apples", "apples", "apples", "oranges", "oranges", "oranges"), 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)

prices %>% 
  # Stack every price_* column into name/price pairs (name is "USA" or "BTC")
  pivot_longer(cols = starts_with("price"), names_to = "name",
               values_to = "price", names_prefix = "price_") %>%
  # One t-test per country and currency
  group_by(country, name) %>%
  summarise(pval = t.test(price ~ fruit)$p.value)
#> # A tibble: 6 x 3
#> # Groups:   country [3]
#>   country name   pval
#>   <chr>   <chr> <dbl>
#> 1 Korea   BTC   0.458
#> 2 Korea   USA   0.721
#> 3 Spain   BTC   0.732
#> 4 Spain   USA   0.526
#> 5 USA     BTC   0.916
#> 6 USA     USA   0.679
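
If you would rather keep the data in wide format, a possible alternative is dplyr::across(), which superseded the summarise_all / summarise_at family. The following is a minimal sketch, assuming dplyr 1.0.0 or later. The `set.seed()` call, the name `pvals_wide`, and the `pval_` prefix are choices made here for illustration, not part of the original question.

```r
library(dplyr)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
prices <- tibble(
  country = rep(c("USA", "Spain", "Korea"), each = 6),
  fruit = rep(rep(c("apples", "oranges"), each = 3), 3),
  price_USA = rnorm(18),
  price_BTC = rnorm(18)
)

# For each country, run t.test(<column> ~ fruit) on every price_* column.
# Inside across(), `.x` is the current column restricted to the current group,
# and `fruit` is looked up in the same group, so both have matching lengths.
pvals_wide <- prices %>%
  group_by(country) %>%
  summarise(
    across(
      starts_with("price_"),
      ~ t.test(.x ~ fruit)$p.value,
      .names = "pval_{.col}"
    )
  )

pvals_wide
```

This returns one row per country with one p-value column per price column (pval_price_USA, pval_price_BTC), whereas the long-format version returns one row per country and currency.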
