R - Error in weighted mean: why is the column not found?



I have a data frame that looks like this:

df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))

I am running code that groups by Region and then computes the plain mean and the mean weighted by "Emp" for every "Sales" column:

Result <- df %>% group_by(Region) %>% 
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, .data[[Emp]], na.rm = T),
weightedsales19 = weighted.mean(Sales19, .data[[Emp]], na.rm = T))

However, I get the following error:

Error in splice(dot_call(capture_dots, frame_env = frame_env, named = named,  : 
object 'Emp' not found

I am not sure what I am doing wrong.

One option could be:

library(tidyverse)
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))

df %>%
group_by(Region) %>%
summarise(across(
.cols = starts_with("Sales"),
.fns = list(w_mean = ~ weighted.mean(.x, w = Emp), mean = ~ mean(.x)), 
.names = "{.col}_{.fn}")
)
#> # A tibble: 3 x 5
#>   Region Sales18_w_mean Sales18_mean Sales19_w_mean Sales19_mean
#>   <chr>           <dbl>        <dbl>          <dbl>        <dbl>
#> 1 Africa          1140         1140           1605         1605 
#> 2 Asia           10360         8410          13435        11200 
#> 3 Europe         12224.       10872.          3407.        3938.

Created on 2021-05-25 by the reprex package (v2.0.0)

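This works. Data masking is already in effect inside summarise(), so column names such as Emp resolve to columns of the data frame and you do not need the .data pronoun.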

library(tidyverse)
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
Result <- df %>% group_by(Region) %>% 
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, Emp, na.rm = T),
weightedsales19 = weighted.mean(Sales19, Emp, na.rm = T))
Result
#> # A tibble: 3 x 5
#>   Region sales18 sales19 weightedsales18 weightedsales19
#>   <chr>    <dbl>   <dbl>           <dbl>           <dbl>
#> 1 Africa   1140    1605            1140            1605 
#> 2 Asia     8410   11200           10360           13435 
#> 3 Europe  10872.   3938.          12224.           3407.

Created on 2021-05-25 by the reprex package (v2.0.0)
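
As a cross-check on the Asia row above, the weighted mean can be recomputed by hand in base R. This is only a small sketch using the two Asia rows from the question's data; the variable names by_hand and built_in are illustrative.

```r
# Asia rows from the question's data frame
sales18 <- c(12310, 4510)
emp <- c(120, 40)

# Weighted mean by definition: sum(x * w) / sum(w)
by_hand <- sum(sales18 * emp) / sum(emp)

# Same calculation via stats::weighted.mean
built_in <- weighted.mean(sales18, w = emp)

by_hand
built_in
```

Both values come out to 10360, matching weightedsales18 for Asia in the output above.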

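An unquoted Emp inside [[ tells R to look for a variable named Emp that holds a string, and that string is expected to be the name of the column containing the weights, like this: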

df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
x = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
Emp <- 'x'
df %>% group_by(Region) %>% 
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, .data[[Emp]], na.rm = T),
weightedsales19 = weighted.mean(Sales19, .data[[Emp]], na.rm = T))
# A tibble: 3 x 5
Region sales18 sales19 weightedsales18 weightedsales19
<chr>    <dbl>   <dbl>           <dbl>           <dbl>
1 Africa   1140    1605            1140            1605 
2 Asia     8410   11200           10360           13435 
3 Europe  10872.   3938.          12224.           3407.

Since no such variable Emp exists in your code, R throws an error.

What to do? Just quote Emp inside [[:

df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))

df %>% group_by(Region) %>% 
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, .data[['Emp']], na.rm = T),
weightedsales19 = weighted.mean(Sales19, .data[['Emp']], na.rm = T))
# A tibble: 3 x 5
Region sales18 sales19 weightedsales18 weightedsales19
<chr>    <dbl>   <dbl>           <dbl>           <dbl>
1 Africa   1140    1605            1140            1605 
2 Asia     8410   11200           10360           13435 
3 Europe  10872.   3938.          12224.           3407.
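
The .data[[...]] form is mostly useful when the column name arrives as a string, for example as a function argument. Below is a minimal sketch, assuming dplyr is installed; weighted_by and weight_col are hypothetical names chosen for illustration and are not part of dplyr.

```r
library(dplyr)

df <- data.frame(Region = c("Asia", "Asia", "Africa", "Europe", "Europe"),
                 Emp = c(120, 40, 10, 67, 110),
                 Sales18 = c(12310, 4510, 1140, 5310, 16435),
                 Sales19 = c(15670, 6730, 1605, 6120, 1755))

# Hypothetical helper: weight_col is a string naming the weights column
weighted_by <- function(data, weight_col) {
  data %>%
    group_by(Region) %>%
    summarise(
      # .data[[weight_col]] looks up the column whose name is stored in weight_col
      weightedsales18 = weighted.mean(Sales18, .data[[weight_col]], na.rm = TRUE),
      weightedsales19 = weighted.mean(Sales19, .data[[weight_col]], na.rm = TRUE)
    )
}

res <- weighted_by(df, "Emp")
res
```

Here weight_col plays the role that the string variable Emp played in the earlier example: it lives in the function's environment, not in the data frame, which is exactly what .data[[...]] expects.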
