r - Return multiple columns from a data.table when aggregating with .()/list and by



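In the example below, I have a function (meansd) that returns the mean of a vector and, optionally, its standard deviation. When it is combined with [.data.table using the ./list syntax in j together with by, the result is long on the metric (i.e., the mean and sd returned by the function) but wide on the measures. The ideal result is the one shown by the dcast output, which is wide on both.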

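Is it possible to get the dcast result directly from a call to [.data.table similar to a[, .(meansd(var1, T), meansd(var1, T)), by = bvar] (and so skip the dcast step)? Or does the error in the last example below (combining a transpose with by) mean I'm out of luck?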

library('data.table')
a = data.table(var1 = c(1,2,3,10,123,12,31,4,6,2), bvar = 1:2)
meansd = function(x, sd = TRUE){
  if(!sd) return(mean(x))
  return(list(mean = mean(x), sd = sd(x)))
}
# Presumed syntax:
a[, .(meansd(var1, T), meansd(var1, T)), by = bvar] #data are long on metric which is not ideal
#>    bvar       V1       V2
#> 1:    1     32.8     32.8
#> 2:    1  51.8575  51.8575
#> 3:    2        6        6
#> 4:    2 4.690416 4.690416
#What I was hoping the output would be (dcast call results):
ideal = a[, .(meansd(var1, T), meansd(var1, T)), by = bvar]
ideal[, metric := rep(c('mean', 'sd'),2)]
#ideally the output would look something like this, without the need for the dcast:
dcast(ideal,bvar~metric, value.var = c('V1','V2')) #names are nice, but unimportant
#>    bvar V1_mean    V1_sd V2_mean    V2_sd
#> 1:    1    32.8  51.8575    32.8  51.8575
#> 2:    2       6 4.690416       6 4.690416

Some related things I have tried:

#some success when only operating on one column:
a[, .(meansd(var1, F))] #returned vector becomes a data.table per `.()` syntax
#>      V1
#> 1: 19.4
a[, .(meansd(var1, T))] #why is this long? Is it possible to be wide? Why does the inclusion of `.()` change it?
#>          V1
#> 1:     19.4
#> 2: 37.47651
a[, .(t(meansd(var1, T)))] #transpose works alright
#>    mean       sd
#> 1: 19.4 37.47651
a[, .(t(meansd(var1, T))), by = bvar] #by does not like a transpose
#> Error in `[.data.table`(a, , .(t(meansd(var1, T))), by = bvar): All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
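On the question in the comments above about why .() makes the result long: meansd() already returns a named list, and a list returned from j is turned into one column per element. Wrapping the call in .() (an alias for list()) nests that list one level deeper, so j becomes a list holding a single list of length 2, which data.table stores as one list column with two rows. The sketch below (reusing the question's a and meansd; the names wide and long are only illustrative) compares the two forms with by.

```r
library(data.table)

a <- data.table(var1 = c(1, 2, 3, 10, 123, 12, 31, 4, 6, 2), bvar = 1:2)
meansd <- function(x, sd = TRUE) {
  if (!sd) return(mean(x))
  list(mean = mean(x), sd = sd(x))
}

# Bare call: j is the named list itself, so each element (mean, sd)
# becomes its own column and each group gets one row.
wide <- a[, meansd(var1, TRUE), by = bvar]
print(wide)

# .() wraps that list inside another list: j now has a single element,
# a length-2 list, which becomes one list column with 2 rows per group.
long <- a[, .(meansd(var1, TRUE)), by = bvar]
print(long)
```

So the transpose is not needed for the single-call case; dropping .() is enough to get one row per group.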

If you concatenate the two calls to meansd into a single vector and then transpose it, you get four separate columns.

Then you just need to clean up by renaming the columns:

library('data.table')
a = data.table(var1 = c(1,2,3,10,123,12,31,4,6,2), bvar = 1:2)
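meansd = function(x, sd = TRUE){
  if(!sd) return(mean(x))
  return(list(mean = mean(x), sd = sd(x)))
}
# c() joins the two 2-element lists into one 4-element list; t() turns it into a 1 x 4 row
a = a[, t(c(meansd(var1, TRUE), meansd(var1, TRUE))), by=bvar]
# rename by reference (data.table's idiomatic alternative to colnames<-)
setnames(a, c("bvar", "V1_mean", "V1_sd", "V2_mean", "V2_sd"))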
a
#>    bvar V1_mean     V1_sd V2_mean     V2_sd
#> 1:    1    32.8 51.857497    32.8 51.857497
#> 2:    2     6.0  4.690416     6.0  4.690416
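A possible variation, sketched under the assumption that meansd returns a named list as above: c() on two lists already yields one flat list, and a flat list returned from j becomes one column per element, so the t() step may not be needed at all. Naming the arguments to c() (V1 and V2 here are arbitrary labels) makes R build the element names as V1.mean, V1.sd, and so on; the setnames() line only swaps the "." for "_" to match the dcast-style names from the question. The result object res is a hypothetical name.

```r
library(data.table)

a <- data.table(var1 = c(1, 2, 3, 10, 123, 12, 31, 4, 6, 2), bvar = 1:2)
meansd <- function(x, sd = TRUE) {
  if (!sd) return(mean(x))
  list(mean = mean(x), sd = sd(x))
}

# c() flattens the two named lists into one list of length 4; the argument
# names become prefixes: "V1.mean", "V1.sd", "V2.mean", "V2.sd".
res <- a[, c(V1 = meansd(var1, TRUE), V2 = meansd(var1, TRUE)), by = bvar]

# Optional: replace the first "." in each name with "_" to mirror dcast().
setnames(res, sub(".", "_", names(res), fixed = TRUE))
print(res)
```

This keeps the naming inside the j call, so the column names follow the labels you pass to c() rather than a hard-coded vector.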
