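In the example below I have a function (`meansd`) that returns the mean of a vector and, optionally, the standard deviation as well. When it is used with `[.data.table` and the `.`/`list` syntax in `j` together with `by`, the result is long on the metric (the mean and sd returned by the function end up in separate rows). The result I am after, shown by the `dcast` call below, is wide on the metric.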
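Is it possible to get the result of the `dcast` call directly from a call to `[.data.table` along the lines of `a[, .(meansd(var1, T), meansd(var1, T)), by = bvar]` (and so skip the `dcast` step)? Or does the error in the last example (combining a transpose with `by`) mean that I am out of luck?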
library('data.table')
a = data.table(var1 = c(1,2,3,10,123,12,31,4,6,2), bvar = 1:2)
meansd = function(x, sd = TRUE){
  if(!sd) return(mean(x))
  return(list(mean = mean(x), sd = sd(x)))
}
#Presumable syntax:
a[, .(meansd(var1, T), meansd(var1, T)), by = bvar] #data are long on metric which is not ideal
#> bvar V1 V2
#> 1: 1 32.8 32.8
#> 2: 1 51.8575 51.8575
#> 3: 2 6 6
#> 4: 2 4.690416 4.690416
#What I was hoping the output would be (dcast call results):
ideal = a[, .(meansd(var1, T), meansd(var1, T)), by = bvar]
ideal[, metric := rep(c('mean', 'sd'),2)]
#ideally the output would look something like this, without the need for the dcast:
dcast(ideal,bvar~metric, value.var = c('V1','V2')) #names are nice, but unimportant
#> bvar V1_mean V1_sd V2_mean V2_sd
#> 1: 1 32.8 51.8575 32.8 51.8575
#> 2: 2 6 4.690416 6 4.690416
I have tried a few related things:
#some success when only operating on one column:
a[, .(meansd(var1, F))] #returned vector becomes a data.table per `.()` syntax
#> V1
#> 1: 19.4
a[, .(meansd(var1, T))] #why is this long? Is it possible to be wide? Why does the inclusion of `.()` change it?
#> V1
#> 1: 19.4
#> 2: 37.47651
a[, .(t(meansd(var1, T)))] #transpose works alright
#> mean sd
#> 1: 19.4 37.47651
a[, .(t(meansd(var1, T))), by = bvar] #by does not like a transpose
#> Error in `[.data.table`(a, , .(t(meansd(var1, T))), by = bvar): All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
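If you concatenate the two calls to `meansd` into a single vector with `c()` and then transpose it, you get 4 separate columns.
After that, you only need to clean up by renaming the columns: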
library('data.table')
a = data.table(var1 = c(1,2,3,10,123,12,31,4,6,2), bvar = 1:2)
meansd = function(x, sd = TRUE){
  if(!sd) return(mean(x))
  return(list(mean = mean(x), sd = sd(x)))
}
a = a[, t(c(meansd(var1, TRUE), meansd(var1, TRUE))), by=bvar]
colnames(a) = c("bvar", "V1_mean", "V1_sd", "V2_mean", "V2_sd")
a
#> bvar V1_mean V1_sd V2_mean V2_sd
#> 1: 1 32.8 51.857497 32.8 51.857497
#> 2: 2 6.0 4.690416 6.0 4.690416
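As a possible variation that skips the transpose, here is a minimal sketch that relies on base R's `c()` prefixing element names when its arguments are named lists. The `V1`/`V2` prefixes and the `wide` variable are names chosen here for illustration, and `bvar` is written out as `rep(1:2, times = 5)`, which I am assuming matches the recycled values used above.

```r
library(data.table)

# Same data as above; bvar is spelled out explicitly (assumption: this is
# equivalent to the recycled 1:2 in the original example).
a <- data.table(var1 = c(1, 2, 3, 10, 123, 12, 31, 4, 6, 2),
                bvar = rep(1:2, times = 5))

meansd <- function(x, sd = TRUE) {
  if (!sd) return(mean(x))
  list(mean = mean(x), sd = stats::sd(x))
}

# c() on named lists yields one flat list named V1.mean, V1.sd, V2.mean, V2.sd.
# Passing that list to j directly (no .() wrapper) gives one column per element.
wide <- a[, c(V1 = meansd(var1), V2 = meansd(var1)), by = bvar]

# Optional: switch the "." separator to "_" to match the dcast-style names.
setnames(wide, gsub(".", "_", names(wide), fixed = TRUE))
wide
```

Note that the call goes into `j` directly rather than inside `.()`. Wrapping a list-returning call in `.()` turns the whole returned list into a single list column, which is why the earlier attempts came out long.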