替换 R 中按组分隔的类别的中位数



在我的数据集中

mydat=structure(list(code = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("25480МСК", "25481МСК"), class = "factor"), 
item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(1L, 
2L, 15L, 1L, 4L, 3L, 3L, 3L, 15L, 4L, 4L, 4L), action = c(0L, 
0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), .Names = c("code", 
"item", "sales", "action"), class = "data.frame", row.names = c(NA, 
-12L))

我有 2 组变量代码 + 项目。这里有两组:

25481МСК    13163
25480МСК    13164

我也有操作列。它只能有两个值 0 (0( 或 one(1(。 我需要通过 action=0 计算销售额的中位数,然后用这个中位数用动作替换所有 1(1(。 必须为每个组单独完成。

即期望的输出

code    item    sales   action  output
25481МСК    13163   1   0        1
25481МСК    13163   2   0        2
25481МСК    13163   15  1        2
25481МСК    13163   1   0        1
25481МСК    13163   4   0        4
25481МСК    13163   3   0        3
25480МСК    13164   3   0        3
25480МСК    13164   3   0        3
25480МСК    13164   15  1        4
25480МСК    13164   4   0        4
25480МСК    13164   4   0        4
25480МСК    13164   4   0        4

25481МСК 13163组中销售操作的中位数为零 = 2,操作 1=15,因此我们将操作 1=15 替换为 2。

请注意,action=0 的销售列值也应位于输出列中。 如何执行?

librar(dplyr)
mydat %>% group_by(code,item) %>% 
mutate(output=ifelse(action==0,sales,median(sales[action==0],na.rm = TRUE))) 

# A tibble: 12 x 5
# Groups:   code, item [2]
code      item sales action output
<fct>    <int> <int>  <int>  <int>
1 25481МСК 13163     1      0      1
2 25481МСК 13163     2      0      2
3 25481МСК 13163    15      1      2
4 25481МСК 13163     1      0      1
5 25481МСК 13163     4      0      4
6 25481МСК 13163     3      0      3
7 25480МСК 13164     3      0      3
8 25480МСК 13164     3      0      3
9 25480МСК 13164    15      1      4
10 25480МСК 13164     4      0      4
11 25480МСК 13164     4      0      4
12 25480МСК 13164     4      0      4
library(data.table)
setDT(mydat)
mydat[, 
output := ifelse(action, median(sales[!action]), sales), 
by = .(code, item)]
code  item sales action output
1: 25481MCK 13163     1      0      1
2: 25481MCK 13163     2      0      2
3: 25481MCK 13163    15      1      2
4: 25481MCK 13163     1      0      1
5: 25481MCK 13163     4      0      4
6: 25481MCK 13163     3      0      3
7: 25480MCK 13164     3      0      3
8: 25480MCK 13164     3      0      3
9: 25480MCK 13164    15      1      4
10: 25480MCK 13164     4      0      4
11: 25480MCK 13164     4      0      4
12: 25480MCK 13164     4      0      4

为了完整起见,以下是另一种使用更新连接的方法:

library(data.table)
# compute medians for each group
med <- setDT(mydat)[action == 0L, median(sales), by = .(code, item)][
# append column to pick only rows with action == 1L in join
, action := 1L]
mydat[
# copy sales to output column, thereby coercing to double to match value of median()
, output := as.numeric(sales)][
# join and update selectively
med, on = .(code, item, action), output := V1]
mydat[]
code  item sales action output
1: 25481MCK 13163     1      0      1
2: 25481MCK 13163     2      0      2
3: 25481MCK 13163    15      1      2
4: 25481MCK 13163     1      0      1
5: 25481MCK 13163     4      0      4
6: 25481MCK 13163     3      0      3
7: 25480MCK 13164     3      0      3
8: 25480MCK 13164     3      0      3
9: 25480MCK 13164    15      1      4
10: 25480MCK 13164     4      0      4
11: 25480MCK 13164     4      0      4
12: 25480MCK 13164     4      0      4

最新更新