Calculate the mean excluding the maximum value



I have the following data frame df:

ARTNR = article number (the same article number appears in several rows)

ARTNR   AMOUNT
20      10
12      10
12      10
20      10
12      100
20      200
...     ...       

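I want to create the data frame df_delta with:

sum_1 = sum of AMOUNT for each ARTNR (one row per article number, no duplicates)

sum_minus_max = sum_1 - maximum AMOUNT of the ARTNR

average = sum_minus_max / (n - 1), where n is the number of rows for the ARTNR

delta = average - maximum AMOUNT of the ARTNR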

ARTNR   sum_1      sum_minus_max   average   delta
20       220        20              10        -190
12       120        20              10        -90
...      ...        ...             ...       ...
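Worked through by hand for ARTNR 20, whose rows in the sample shown have AMOUNT 10, 10 and 200 (so n = 3), the definitions give the first row of the table above:

```latex
\begin{aligned}
\text{sum\_1}         &= 10 + 10 + 200 = 220 \\
\text{sum\_minus\_max} &= 220 - 200 = 20 \\
\text{average}       &= \frac{20}{3 - 1} = 10 \\
\text{delta}         &= 10 - 200 = -190
\end{aligned}
```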

Can anyone help me? I would be very grateful!

Thank you very much!

You can use aggregate as follows:

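# Assumes df holds the ARTNR/AMOUNT data from the question
newDataFrameName <- do.call(cbind, aggregate(AMOUNT ~ ARTNR, df, function(x) {
  sumx <- sum(x)
  maxx <- max(x)
  # Subtract the maximum once, so a tied maximum is only removed one time
  sum_minus_max <- sumx - maxx
  meanx <- sum_minus_max / (length(x) - 1)
  c(sum_1 = sumx, sum_minus_max = sum_minus_max, average = meanx, delta = meanx - maxx)
}))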
newDataFrameName
#    ARTNR sum_1 sum_minus_max average delta
#[1,]    12   120            20      10   -90
#[2,]    20   220            20      10  -190
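Excluding every element equal to the maximum and subtracting the maximum once give the same result on the question's data, but they differ when the maximum is tied within an ARTNR. A small base R sketch using a hypothetical vector (not taken from the question's data):

```r
# Hypothetical AMOUNT values for one ARTNR, with the maximum (200) tied
x <- c(10, 200, 200)

# Excluding every element equal to the maximum drops both 200s
mean_drop_all <- mean(x[x != max(x)])                # mean(10) = 10

# Subtracting the maximum once, as in the question's definition
avg_drop_one <- (sum(x) - max(x)) / (length(x) - 1)  # (410 - 200) / 2 = 105

# which.max() returns only the first position of the maximum,
# so indexing with it also removes a single copy
avg_which_max <- mean(x[-which.max(x)])              # mean(c(10, 200)) = 105

cat("drop all maxima:", mean_drop_all, "\n")
cat("drop one maximum:", avg_drop_one, avg_which_max, "\n")
```

Which behaviour is right depends on what the question's "minus the maximum" is meant to do with ties; the formula sum_1 - max subtracts it once.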

You can do this with dplyr:

library(dplyr)
df <- data.frame(ARTNR  = c(20, 12, 12, 20, 12, 20),
                 AMOUNT = c(10, 10, 10, 10, 100, 200))
df %>%
  group_by(ARTNR) %>%
  summarize(
    sum_1         = sum(AMOUNT),
    sum_minus_max = sum_1 - max(AMOUNT),          # reuses sum_1 from the line above
    average       = sum_minus_max / (n() - 1),    # n() = number of rows in the group
    delta         = average - max(AMOUNT)
  )

This gives:

# A tibble: 2 x 5
ARTNR sum_1 sum_minus_max average delta
<dbl> <dbl>         <dbl>   <dbl> <dbl>
1    12   120            20      10   -90
2    20   220            20      10  -190
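The dplyr answer divides by n() - 1, so an ARTNR that occurs only once gives 0/0, which R evaluates to NaN. A minimal base R sketch of one way to handle that case, assuming NA is the preferred result for such groups; the helper name group_stats is hypothetical:

```r
# Hypothetical helper applying the question's formulas to one ARTNR group
group_stats <- function(amount) {
  n <- length(amount)
  s <- sum(amount)
  m <- max(amount)
  # For a single row, n - 1 is 0 and (s - m) / (n - 1) would be 0/0 = NaN;
  # this sketch returns NA instead (an assumption about the desired result)
  avg <- if (n > 1) (s - m) / (n - 1) else NA_real_
  c(sum_1 = s, sum_minus_max = s - m, average = avg, delta = avg - m)
}

group_stats(c(10, 10, 200))  # sum_1 220, sum_minus_max 20, average 10, delta -190
group_stats(5)               # average and delta are NA
```

Inside summarize() the same guard could be written as if (n() > 1) ... else NA_real_, since n() is a single number per group.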

Try the following script:

library(dplyr)
remove_max <- function(vector) {
  # Keep a single-element vector as is, so it is not emptied
  if (length(vector) == 1) return(vector)
  # which.max() returns only the first position of the maximum,
  # so a tied maximum is removed once rather than every time it occurs
  vector[-which.max(vector)]
}
# df is the data frame from the question
df %>%
  group_by(ARTNR) %>%
  summarize(
    sum_1 = sum(AMOUNT),
    sum_minus_max = sum_1 - max(AMOUNT),
    average = mean(remove_max(AMOUNT)),
    delta = average - max(AMOUNT)
  )

Hope this helps.
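For comparison, a base R sketch that builds df_delta without dplyr, using split() to get one AMOUNT vector per ARTNR. It uses only the six rows shown in the question and assumes every ARTNR occurs at least twice; a single-row group would give NaN for average and delta.

```r
df <- data.frame(ARTNR  = c(20, 12, 12, 20, 12, 20),
                 AMOUNT = c(10, 10, 10, 10, 100, 200))

# split() returns a list with one AMOUNT vector per ARTNR, in sorted order ("12", "20")
rows <- lapply(split(df$AMOUNT, df$ARTNR), function(x) {
  s <- sum(x)
  m <- max(x)
  avg <- (s - m) / (length(x) - 1)  # subtract the maximum once, divide by n - 1
  c(sum_1 = s, sum_minus_max = s - m, average = avg, delta = avg - m)
})

# Stack the per-group vectors into one row per ARTNR
df_delta <- data.frame(ARTNR = as.numeric(names(rows)),
                       do.call(rbind, rows),
                       row.names = NULL)
print(df_delta)
```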
