R中的多维数组乘法



我想执行一些复杂的多维数组乘法,其中我在数组的特定空白上相乘。

考虑这个例子,我有一个分组特征(a和B)的流行率,在人口的一些边缘:

# setup data
random=runif(4)
group.prevalence <- aperm (array(c(random,1-random),
                  dim=c(2,2,2), 
                  dimnames=list(age=c("young","old"),
                                gender=c("male","female"),
                                group=c("A","B"))) , c(3,1,2) )
group.prevalence 
# A + B = 1

假设现在我有一个感兴趣的总体…

population <- round(array(runif(4, min=100,max=200) %o% c(1,1*(1+random[1]),1*(1+random[1])^2), 
                          dim=c(2,2,3), dimnames=list(age=c("young","old"),
                                                      gender=c("male","female"),
                                                      year=c("year1","year2","year3"))))
population
我想计算"A"one_answers"B"的流行率。

糟糕的解决方案是在一个循环中填充它们:

# bad solution
grouped.population <- array(NA, dim=c(2,2,2,3), 
                            dimnames=list(group=c("A","B"),
                                          age=c("young","old"),
                                          gender=c("male","female"),
                                          year=c("year1","year2","year3")))
for (group in c("A","B"))
  for(gender in c("male","female"))
    for (age in c("young","old")) 
      grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]

但是我想某种应用可能会派上用场,可能是plyr的应用,因为结果的维度应该被保留。我试过了:

library(plyr)
aaply(population, c(1,2), function(x) x * group.prevalence)
# too many dimensions

我欢迎任何建议。

对于您的特殊情况,您可以计算:

out <- rep(group.prevalence, times=last(dim(population))) * 
       rep(population, each=first(dim(group.prevalence)))

然后你可以设置这个array的尺寸:

array(out, dim=c(2,2,2,3), 
      dimnames=list(group=c("A","B"),
                    age=c("young","old"),
                    gender=c("male","female"),
                    year=c("year1","year2","year3")))

关键是要对齐两个数组的维度,通过调换维度和扩展/复制来填充另一个数组中缺失的维度。一般来说,过程是:

  1. 确定相交尺寸。这里是(age,gender)
  2. 对于乘法group.prevalence的左边参数,排列维度(使用aperm),使所有不相交的维度(即group)在前面。然后,复制该数组N次(使用times),其中N是右侧参数population的非相交维度(即year)的大小。
  3. 对于乘法右边的参数population,排列维度,使所有不相交的维度(即year)排在最后。然后,复制数组M的每个元素多次(使用each),其中M是左侧参数group.prevalence的非相交维度(即group)的大小。
  4. 然后(数组)相乘,这是矢量化和快速。
  5. 结果的关节维数就是左边参数的不相交维数,然后是相交维数,然后是右边参数的不相交维数(即(group, age, gender, year))。然后,您可以根据需要在输出中排列这些维度以获得您想要的结果。

作为检查:

# bad solution
grouped.population <- array(NA, dim=c(2,2,2,3), 
                            dimnames=list(group=c("A","B"),
                                          age=c("young","old"),
                                          gender=c("male","female"),
                                          year=c("year1","year2","year3")))
for (group in c("A","B"))
  for(gender in c("male","female"))
    for (age in c("young","old")) 
      grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]
# another approach
grouped.population2 <- array(rep(group.prevalence, times=last(dim(population))) * 
                             rep(population, each=first(dim(group.prevalence))), 
                             dim=c(2,2,2,3), 
                             dimnames=list(group=c("A","B"),
                                           age=c("young","old"),
                                           gender=c("male","female"),
                                           year=c("year1","year2","year3")))
# check
all.equal(grouped.population,grouped.population2)
##[1] TRUE

更新基准:

library(microbenchmark)
f1 <- function(group.prevalence, population) {
  grouped.population <- array(NA, dim=c(2,2,2,3), 
                              dimnames=list(group=c("A","B"),
                                            age=c("young","old"),
                                            gender=c("male","female"),
                                            year=c("year1","year2","year3")))
  for (group in c("A","B")) {
    for(gender in c("male","female")) {
      for (age in c("young","old")) {
        grouped.population[group,age,gender,] <- group.prevalence[group,age,gender] * population[age,gender,]}}}
}
f2 <- function(group.prevalence, population) {
  grouped.population2 <- array(rep(group.prevalence, times=last(dim(population))) * 
                               rep(population, each=first(dim(group.prevalence))), 
                               dim=c(2,2,2,3), 
                               dimnames=list(group=c("A","B"),
                                             age=c("young","old"),
                                             gender=c("male","female"),
                                             year=c("year1","year2","year3")))
}
print(microbenchmark(f1(group.prevalence, population)))
##Unit: microseconds
##                             expr     min      lq     mean   median      uq     max neval
## f1(group.prevalence, population) 101.473 103.998 149.2562 106.8865 115.372 1185.32   100
print(microbenchmark(f2(group.prevalence, population)))
##Unit: microseconds
##                             expr    min     lq     mean median      uq     max neval
## f2(group.prevalence, population) 66.392 67.672 70.19873 68.454 69.4205 173.284   100

我相信随着维度的数量和每个维度的大小的增加,性能的差异会更大。

相关内容

  • 没有找到相关文章