Applying the pvclust R function to a precomputed dist object

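I am using R to perform hierarchical clustering. As a first approach, I used hclust and went through the following steps:

I imported the distance matrix
I converted it to a dist object with the as.dist function
I ran hclust on the dist object

Here is the R code: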

distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)
# "ward" was renamed "ward.D" in R 3.1.0
hclust(d, method = "ward.D")

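At this point, I would like to do something similar with the pvclust function; however, I cannot, because it does not accept a precomputed dist object. Given that the distance I am using is not among those provided by R's dist function, how can I proceed?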

I have tested Vincent's suggestion, and you can do the following (my dataset is a dissimilarity matrix):

library(pvclust)
# Import your data
distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)
# Compute the eigenvalues
x <- cmdscale(d, 1, eig = TRUE)
# Plot the eigenvalues and choose the number of dimensions to keep
# (dimensions whose eigenvalues are close to 0 can be dropped)
plot(x$eig,
   type = "h", lwd = 5, las = 1,
   xlab = "Number of dimensions",
   ylab = "Eigenvalues")
# Recover the coordinates that give the same distance matrix with the chosen number of dimensions
# (nb_dimensions is the value you picked from the plot)
x <- cmdscale(d, nb_dimensions)
# As mentioned by Stéphane, pvclust() clusters columns
# (its default method.dist is "correlation"; pass method.dist = "euclidean" to cluster on the recovered distances)
pvclust(t(x))
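
As a complement to reading the number of dimensions off the plot, here is a minimal sketch of picking it programmatically. It uses base R only; the toy data, the tolerance, and the variable names other than nb_dimensions are assumptions made for illustration and are not part of the answer above.

```r
# Toy dissimilarity matrix standing in for distMatrix.csv (hypothetical data):
# 20 points generated in 3 dimensions
set.seed(1)
pts <- matrix(rnorm(20 * 3), ncol = 3)
d <- dist(pts)

# With eig = TRUE, cmdscale() returns all eigenvalues, whatever k is
eig <- cmdscale(d, k = 1, eig = TRUE)$eig

# Keep the dimensions whose eigenvalue is clearly above zero;
# the relative tolerance is an arbitrary choice for this sketch
tol <- 1e-8 * max(eig)
nb_dimensions <- sum(eig > tol)

# Clearly negative eigenvalues would mean the distances cannot be
# reproduced exactly by Euclidean coordinates
has_negative <- any(eig < -tol)

# Recover coordinates and measure how well they reproduce d
x <- cmdscale(d, k = nb_dimensions)
max_error <- max(abs(dist(x) - d))
```

If has_negative turns out TRUE for your own matrix, the recovered coordinates only approximate the original distances, so it is worth checking max_error before passing t(x) to pvclust.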

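If the dataset is not too large, you can embed your n points in a space of dimension n-1 that gives the same distance matrix (provided the distances can be realized in a Euclidean space).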

# Sample distance matrix
n <- 100
k <- 1000
d <- dist( matrix( rnorm(k*n), ncol=k ), method="manhattan" )
# Recover some coordinates that give the same distance matrix
x <- cmdscale(d, n-1)
stopifnot( sum(abs(dist(x) - d)) < 1e-6 )
# You can then indifferently use x or d
r1 <- hclust(d)
r2 <- hclust(dist(x)) # identical to r1
library(pvclust)
# pvclust() clusters columns and defaults to correlation distance,
# so transpose x and ask for Euclidean distance
r3 <- pvclust(t(x), method.dist="euclidean")

If the dataset is large, you may have to look at how pvclust is implemented.

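It is not clear to me whether you only have a distance matrix, or whether you computed it yourself beforehand. In the former case, as @Vincent already suggested, it is not too difficult to tweak the R code of pvclust itself (using fix() or whatever; I provided some hints on another question on CrossValidated). In the latter case, the authors of pvclust provide an example of how to use a custom distance function, although that means you will have to install their "unofficial version".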
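
For the second case, a custom distance function might look like the sketch below. It assumes an installed pvclust release whose method.dist argument accepts a function that returns a dist object computed over the columns of the data; if your version does not, the "unofficial version" route still applies. The function name spearman_dist, the toy data, and the choice of Spearman correlation are all hypothetical and only serve as an illustration.

```r
# Hypothetical custom distance: 1 - Spearman correlation between columns.
# The function receives the data matrix and must return a dist object
# over its columns, because pvclust clusters columns.
spearman_dist <- function(x) {
  res <- as.dist(1 - cor(x, method = "spearman", use = "pairwise.complete.obs"))
  attr(res, "method") <- "spearman"
  res
}

# Toy data: 50 observations (rows) of 6 items (columns)
set.seed(1)
dat <- matrix(rnorm(50 * 6), ncol = 6,
              dimnames = list(NULL, paste0("item", 1:6)))

# The function can be checked on its own before handing it to pvclust
dd <- spearman_dist(dat)

# Run the bootstrap only if pvclust is available
# (assumes method.dist accepts a function in the installed version)
if (requireNamespace("pvclust", quietly = TRUE)) {
  fit <- pvclust::pvclust(dat, method.hclust = "average",
                          method.dist = spearman_dist, nboot = 100)
  print(fit)
}
```

Note that this only helps when you still have the raw data: pvclust resamples the rows and recomputes the distances at every bootstrap replicate, which is why a fixed, precomputed matrix cannot simply be plugged in.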
