R - How to use "with" and "tapply" to calculate a new variable based on multiple factors



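I am trying to get the mean ("ctrlmeans") of phone handle time ("Handle") for a single group ("ACTRL"), broken down by another variable, "Period". Then I want to create a new variable, "Difference", by subtracting that mean from each person's "Handle" in the data frame.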

This is how I did it:

> ttp1<-read.csv("ttp1.csv")
> dput(head(ttp1,12))
structure(list(NUID = structure(c(4L, 6L, 7L, 8L, 11L, 12L, 9L, 
10L, 1L, 2L, 3L, 5L), .Label = c("A000904", "A024324", "A047744", 
"A063828", "A071164", "C833344", "C833345", "C833346", "E254607", 
"Y950092", "Z952754", "Z993876"), class = "factor"), Period = c(201415L, 
201415L, 201415L, 201415L, 201415L, 201415L, 201416L, 201416L, 
201416L, 201416L, 201416L, 201416L), Queue = c(1L, 2L, 1L, 1L, 
2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L), Group = structure(c(2L, 4L, 
3L, 3L, 3L, 3L, 1L, 4L, 3L, 3L, 3L, 3L), .Label = c("A", "A ", 
"ACTRL", "B"), class = "factor"), Handle = c(1013L, 699L, 425L, 
450L, 444L, 681L, 532L, 716L, 388L, 307L, 430L, 380L)), .Names = c("NUID", 
"Period", "Queue", "Group", "Handle"), row.names = c(NA, 12L), class = "data.frame")

My commands:

> ctrlmeans <- with(subset(ttp1, Group=="ACTRL"), tapply(Handle, Period, mean))
> ctrlmeans

201415 201416 
500.00 376.25 
> Difference <- ttp1$Handle-ctrlmeans[ttp1$Period]
> Difference

<NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 
  NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA 

Why am I getting NA?

And how would I do this if I included an additional grouping variable, "Queue", in the tapply command?

If you want to compute the mean of Handle by Period and Queue groups, here is an example of how the dplyr package can do it:

library(dplyr)
ctrlmeans <-                               # data.frame to store your results
  ttp1 %>%                                 # data.frame to use for the analysis
  group_by(Period, Queue) %>%              # grouping variables (add/remove Queue as needed)
  filter(Group == "ACTRL") %>%             # keep only rows where Group == "ACTRL"
  summarize(mean.Handle = mean(Handle))    # summary column with the mean Handle of each group
ttp1 <- inner_join(ttp1, ctrlmeans, by = c("Period", "Queue"))  # join ctrlmeans onto the ttp1 data frame
ttp1["Diff"] <- with(ttp1, Handle - mean.Handle)                # add a column for the differences
#>ttp1
#      NUID Period Queue Group Handle mean.Handle   Diff
#1  A063828 201415     1    A    1013       437.5  575.5
#2  C833345 201415     1 ACTRL    425       437.5  -12.5
#3  C833346 201415     1 ACTRL    450       437.5   12.5
#4  C833344 201415     2     B    699       562.5  136.5
#5  Z952754 201415     2 ACTRL    444       562.5 -118.5
#6  Z993876 201415     2 ACTRL    681       562.5  118.5
#7  E254607 201416     1     A    532       347.5  184.5
#8  A000904 201416     1 ACTRL    388       347.5   40.5
#9  A024324 201416     1 ACTRL    307       347.5  -40.5
#10 Y950092 201416     2     B    716       405.0  311.0
#11 A047744 201416     2 ACTRL    430       405.0   25.0
#12 A071164 201416     2 ACTRL    380       405.0  -25.0 

If you only want to group by Period, just remove Queue from the group_by() call and from the by argument of inner_join().

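Looking up ctrlmeans by Period this way only works if Period is a character vector. Right now it is numeric, so R treats 201415 as a position in ctrlmeans (which has only two elements) and returns NA. A factor would not help either, because indexing by a factor uses its integer codes rather than its labels. So you can change it to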

Difference <- ttp1$Handle-ctrlmeans[as.character(ttp1$Period)]
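A minimal sketch of the difference between the two lookups, assuming the 12 sample rows from the question's dput() (rebuilt here as a plain data.frame, with NUID and Group stored as character instead of factor, which does not change the comparison with "ACTRL"). A number asks for that position in the named vector, while a string is matched against its names:

```r
# Sample rows from the question's dput(), rebuilt as a plain data.frame
ttp1 <- data.frame(
  NUID   = c("A063828", "C833344", "C833345", "C833346", "Z952754", "Z993876",
             "E254607", "Y950092", "A000904", "A024324", "A047744", "A071164"),
  Period = rep(c(201415L, 201416L), each = 6),
  Queue  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L),
  Group  = c("A ", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL",
             "A", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL"),
  Handle = c(1013L, 699L, 425L, 450L, 444L, 681L, 532L, 716L, 388L, 307L, 430L, 380L)
)

# Mean Handle per Period for the ACTRL rows; tapply() names the result by Period
ctrlmeans <- with(subset(ttp1, Group == "ACTRL"), tapply(Handle, Period, mean))
names(ctrlmeans)   # "201415" "201416"

# A number is treated as a position: there is no element 201415, so this is NA
ctrlmeans[201415]

# A string is matched against the names, which is the lookup we actually want
Difference <- ttp1$Handle - ctrlmeans[as.character(ttp1$Period)]
Difference
```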

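This approach also only works with one grouping variable. If you have more than one, you may want to aggregate into a new data set to get the group summaries, then merge that back into the larger data.frame and perform whatever transformations you need. Or you could look at packages for more advanced data frame manipulation, such as plyr.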
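A base R sketch of that aggregate-then-merge route, grouping by both Period and Queue and assuming the same 12 sample rows as the question. The names ctrl, merged, ctrlmat and idx are only illustrative. As an aside it also shows that tapply() accepts a list of factors: it then returns a Period x Queue matrix, which can be looked up with a two-column character matrix of names instead of a named vector.

```r
# Sample rows from the question's dput(), rebuilt as a plain data.frame
ttp1 <- data.frame(
  NUID   = c("A063828", "C833344", "C833345", "C833346", "Z952754", "Z993876",
             "E254607", "Y950092", "A000904", "A024324", "A047744", "A071164"),
  Period = rep(c(201415L, 201416L), each = 6),
  Queue  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L),
  Group  = c("A ", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL",
             "A", "B", "ACTRL", "ACTRL", "ACTRL", "ACTRL"),
  Handle = c(1013L, 699L, 425L, 450L, 444L, 681L, 532L, 716L, 388L, 307L, 430L, 380L)
)

# 1. Aggregate: mean Handle of the ACTRL rows for each Period/Queue pair
ctrl <- aggregate(Handle ~ Period + Queue,
                  data = subset(ttp1, Group == "ACTRL"), FUN = mean)
names(ctrl)[names(ctrl) == "Handle"] <- "mean.Handle"

# 2. Merge the group means back onto every row with the same Period and Queue
merged <- merge(ttp1, ctrl, by = c("Period", "Queue"))

# 3. Transform: difference from the ACTRL mean of the row's own group
merged$Diff <- merged$Handle - merged$mean.Handle

# Aside: tapply() with a list of factors returns a Period x Queue matrix
ctrlmat <- with(subset(ttp1, Group == "ACTRL"),
                tapply(Handle, list(Period, Queue), mean))
# Look up each row's mean with a two-column character matrix of dimnames
idx <- cbind(as.character(ttp1$Period), as.character(ttp1$Queue))
Difference <- ttp1$Handle - ctrlmat[idx]
```

Note that merge() sorts its result by the by columns by default, so the rows of merged are not in the original order, whereas the tapply() lookup keeps the rows of ttp1 as they are.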
