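I'm new to this and not sure why Rate1 comes out the same for every batter. Once I have Rate1 for each batter, I need the mean of those means and the standard deviation, but I haven't gotten that far yet...
Here is a subset of the data frame: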
BAT_ID DP_FL
2 hanim001 FALSE
18 hereg002 FALSE
40 pujoa001 TRUE
50 espid001 TRUE
97 troum001 FALSE
131 calhk001 FALSE
136 hanim001 FALSE
148 hanim001 FALSE
165 mottt001 FALSE
215 calhk001 TRUE
238 calhk001 FALSE
255 napom001 FALSE
264 gomec002 FALSE
267 maybc001 TRUE
271 napom001 FALSE
279 rua-r001 FALSE
283 simma001 TRUE
286 mazan001 FALSE
318 martj007 FALSE
322 choos001 TRUE
356 gomec002 FALSE
#Percent groundball double play
library(plyr)
mean1<-ddply(all_data_gnd, .(BAT_ID), summarize, Rate1=
(sum(as.numeric(which(all_data_gnd$DP_FL==1))) /
(sum(as.numeric(which(all_data_gnd$DP_FL==0))) +
sum(as.numeric(which(all_data_gnd$DP_FL==1))))))
head(mean1)
> head(mean1)
BAT_ID Rate1
1 abrej003 0.1741862
2 adamc001 0.1741862
3 adaml001 0.1741862
4 adamm002 0.1741862
5 adduj002 0.1741862
6 adlet001 0.1741862
Your sample doesn't contain enough data, so I'll generate some fake data:
n <- 1e4
set.seed(2)
fakedata <- data.frame(
  bat_id = sample(letters[1:5], size = n, replace = TRUE),
  dp_fl = sample(c(TRUE, FALSE), size = n, replace = TRUE),
  stringsAsFactors = FALSE
)
head(fakedata)
# bat_id dp_fl
# 1 a TRUE
# 2 d TRUE
# 3 c TRUE
# 4 a FALSE
# 5 e TRUE
# 6 e FALSE
You don't need as.numeric, and your ==1/(==0 + ==1) calculation is really just the mean of a logical vector. You can summarize by group in several ways:
stack(by(fakedata$dp_fl, fakedata$bat_id, mean))
stack(tapply(fakedata$dp_fl, fakedata$bat_id, mean))
Each of these gives:
# values ind
# 1 0.4935000 a
# 2 0.5015322 b
# 3 0.4869432 c
# 4 0.5223735 d
# 5 0.5041810 e
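As a quick check of the claim that the ratio reduces to a mean, here is a minimal sketch on a tiny hand-made vector (the names x and ratio are illustrative and not from the question). It also hints at why the question's version went wrong: which() returns positions, so summing it adds up row indices rather than counting rows, and all_data_gnd$DP_FL inside ddply refers to the full column rather than to the current batter's rows, so every group gets the same number.

```r
# Illustrative logical vector: 2 double plays out of 5 ground balls
x <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# Count-based ratio, which is what the question was aiming for
ratio <- sum(x == 1) / (sum(x == 0) + sum(x == 1))

# mean() of a logical vector treats TRUE as 1 and FALSE as 0
all.equal(ratio, mean(x))   # TRUE, both are 2/5

# which() returns positions, so sum(which(...)) adds indices, not counts
sum(which(x == 1))          # 1 + 4 = 5, not 2
```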
Calling colnames on the result would be useful.
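A minimal sketch of that renaming, assuming you want the column names from the question (Rate1 and BAT_ID). The toy data frame is made up for illustration and is not the asker's data:

```r
# Small illustrative data set (hypothetical, not the asker's data)
toy <- data.frame(
  bat_id = c("a", "a", "b", "b", "b"),
  dp_fl = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Per-batter double-play rate, stacked into a two-column data frame
rates <- stack(tapply(toy$dp_fl, toy$bat_id, mean))

# stack() names its columns "values" and "ind"; swap in descriptive names
colnames(rates) <- c("Rate1", "BAT_ID")
rates
```

Note that stack() puts the values first and the group label second, so the new names have to be given in that order.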
You can also use dplyr:
library(dplyr)
fakedata %>%
  group_by(bat_id) %>%
  summarize(dp_fl = mean(dp_fl))
# # A tibble: 5 × 2
# bat_id dp_fl
# <chr> <dbl>
# 1 a 0.4935000
# 2 b 0.5015322
# 3 c 0.4869432
# 4 d 0.5223735
# 5 e 0.5041810
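For the follow-up step mentioned in the question (a mean of the per-batter rates and their standard deviation), one possible sketch in base R is below. The toy data and the names per_batter, mean_of_rates and sd_of_rates are assumptions made for illustration.

```r
# Small illustrative data set (hypothetical, not the asker's data)
toy <- data.frame(
  bat_id = c("a", "a", "b", "b", "b", "c"),
  dp_fl = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# One rate per batter: a = 1/2, b = 2/3, c = 0
per_batter <- tapply(toy$dp_fl, toy$bat_id, mean)

# Unweighted mean and sample standard deviation across batters
mean_of_rates <- mean(per_batter)
sd_of_rates <- sd(per_batter)
```

This gives every batter equal weight regardless of how many ground balls they hit, so mean_of_rates will generally differ from mean(toy$dp_fl) taken over all rows. Which of the two you want depends on the analysis.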