sklearn.mixture.DPGMM not working correctly



I'm having trouble with sklearn.mixture.DPGMM. The main problem is that it does not return the correct covariances for synthetic data (two well-separated 2D Gaussians), which should be an easy case. In particular, when I call dpgmm._get_covars(), the diagonal elements of the covariance matrices are always exactly 1.0 too large, regardless of the input data distribution. This looks like a bug, since GMM works fine (when restricted to the known, exact number of components).

Another problem is that dpgmm.weights_ make no sense: they sum to one, but the values look meaningless.

Can anyone fix this, or point out an obvious mistake in my example?

Here is the exact script I'm running:

import itertools
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl
import pdb
from sklearn import mixture
# Generate 2D random sample, two gaussians each with 10000 points
rsamp1 = np.random.multivariate_normal(np.array([5.0,5.0]),np.array([[1.0,-0.2],[-0.2,1.0]]),10000)
rsamp2 = np.random.multivariate_normal(np.array([0.0,0.0]),np.array([[0.2,-0.0],[-0.0,3.0]]),10000)
X = np.concatenate((rsamp1,rsamp2),axis=0)
# Fit a mixture of Gaussians with EM using 2
gmm = mixture.GMM(n_components=2, covariance_type='full',n_iter=10000)
gmm.fit(X)
# Fit a Dirichlet process mixture of Gaussians using 10 components
dpgmm = mixture.DPGMM(n_components=10, covariance_type='full', min_covar=0.5, tol=0.00001, n_iter=1000000)
dpgmm.fit(X)
print("Groups With data in them")
print(np.unique(dpgmm.predict(X)))
# Print the input and output covariances as an example; they should be very similar
correct_c0 = np.array([[1.0,-0.2],[-0.2,1.0]])
print("Input covar")
print(correct_c0)
covars = dpgmm._get_covars()
c0 = np.round(covars[0],decimals=1)
print("Output covar")
print(c0)
print("Output Variances Too Big by 1.0")

According to the DPGMM documentation, this class was deprecated in version 0.18 and will be removed in version 0.20.

You should use the BayesianGaussianMixture class instead, with the parameter weight_concentration_prior_type set to the option "dirichlet_process".

Hope this helps.

Instead of writing

from sklearn.mixture import GMM
gmm = GMM(2, covariance_type='full', random_state=0)

you should write:

from sklearn.mixture import BayesianGaussianMixture
gmm = BayesianGaussianMixture(2, covariance_type='full', random_state=0)
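Applied to the synthetic data from the question, the replacement might look like the sketch below (a minimal example, not a definitive fix; the data setup mirrors the original script, and `np.random.default_rng` is used instead of the legacy `np.random` functions). Note that with the new API the covariances are exposed directly as the `covariances_` attribute, so there is no need for a private `_get_covars()` call, and extra components under the Dirichlet-process prior should receive weights near zero:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic data as in the question: two separated 2D Gaussians, 10000 points each
rng = np.random.default_rng(0)
rsamp1 = rng.multivariate_normal([5.0, 5.0], [[1.0, -0.2], [-0.2, 1.0]], 10000)
rsamp2 = rng.multivariate_normal([0.0, 0.0], [[0.2, 0.0], [0.0, 3.0]], 10000)
X = np.concatenate((rsamp1, rsamp2), axis=0)

# Dirichlet-process behaviour: ask for 10 components, let the prior
# shrink the weights of the unused ones toward zero
dpgmm = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    weight_concentration_prior_type='dirichlet_process',
    max_iter=1000,
    random_state=0,
)
dpgmm.fit(X)

# Covariances are a public attribute now; no _get_covars() needed
print("Components with data in them:", np.unique(dpgmm.predict(X)))
print("Weights (sum to 1):", np.round(dpgmm.weights_, 3))
print("Covariance of component 0:\n", np.round(dpgmm.covariances_[0], 1))
```

The diagonal elements of `covariances_` should now match the input covariances, without the constant +1.0 offset described in the question.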
