Python sklearn.mixture.GMM not robust to scaling?

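I'm using sklearn.mixture.GMM in Python, and the results seem to depend on the scaling of the data. In the following code example I change the overall scaling, but I do not change the relative scaling of the dimensions. Yet under the three different scaling settings I get completely different results: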

from sklearn.mixture import GMM
from numpy import array, shape
from numpy.random import randn
from random import choice
# centroids will be normally-distributed around zero:
truelumps = randn(20, 5) * 10
# data randomly sampled from the centroids:
data = array([choice(truelumps) + randn(5) for _ in xrange(1000)])
for scaler in [0.01, 1, 100]:
    scdata = data * scaler
    thegmm = GMM(n_components=10)
    thegmm.fit(scdata, n_iter=1000)
    ll = thegmm.score(scdata)
    print sum(ll)

This is the output I get:

GMM(cvtype='diag', n_components=10)
7094.87886779
GMM(cvtype='diag', n_components=10)
-14681.566456
GMM(cvtype='diag', n_components=10)
-37576.4496656

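In principle, I don't think the overall scaling of the data should matter, and the total log-likelihood should be similar each time. But maybe there's an implementation issue I'm overlooking?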

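I've had an answer via the scikit-learn mailing list: in my code example, the log-likelihood should indeed vary with scale, by a factor related to log(scale), because we are evaluating point likelihoods (densities), not integrals. So I think my code example in fact shows GMM giving correct results.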
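To make the log(scale) argument concrete, here is a minimal sketch using only the standard library, not the GMM code from the question. It assumes a single diagonal-covariance Gaussian whose parameters are rescaled exactly along with the data; the helper names (`diag_gauss_logpdf`, `total_loglik`) are made up for illustration. Multiplying the data by s divides every density value by s**d, so the total log-likelihood of n points in d dimensions shifts by -n * d * log(s). The same factor applies to each component of a mixture, so the shift carries over to a GMM whose fit rescales with the data.

```python
import math
import random

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def total_loglik(points, mean, var):
    """Sum of per-point log-densities."""
    return sum(diag_gauss_logpdf(p, mean, var) for p in points)

random.seed(0)
n, d = 1000, 5                      # same sizes as in the question
mean = [0.0] * d
var = [1.0] * d
points = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

base = total_loglik(points, mean, var)
results = {}
for s in (0.01, 1, 100):
    # Rescale the data and the model parameters together.
    scaled_points = [[s * xi for xi in p] for p in points]
    scaled_mean = [s * m for m in mean]
    scaled_var = [s * s * v for v in var]
    ll = total_loglik(scaled_points, scaled_mean, scaled_var)
    predicted = base - n * d * math.log(s)   # expected shift from the scale factor
    results[s] = (ll, predicted)
    print(s, round(ll, 2), round(predicted, 2))
```

With n = 1000 and d = 5, a factor of 100 corresponds to a shift of 5000 * log(100), roughly 23,026, which is the same order as the gaps between the three totals printed in the question (about 21,800 and 22,900). An exact match is not expected there, since the three fitted models need not be exact rescaled copies of one another.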

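I think GMM is scale-dependent (like k-means, for example), so it is advisable to standardize the inputs as described in the preprocessing chapter of the documentation.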
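As a rough illustration of why standardizing helps, the sketch below rescales each feature to zero mean and unit variance using only the standard library; the `standardize` helper is hypothetical, and scikit-learn's `StandardScaler` performs a comparable per-feature transformation. After standardizing, a global scale factor drops out, so a model fitted afterwards sees the same input whatever the original units were. Note that this works per feature, so unlike the global scaling in the question it also changes the relative scaling of the dimensions.

```python
import random
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / sd for x, m, sd in zip(row, means, stds)] for row in rows]

random.seed(0)
# Synthetic stand-in for the question's data: 1000 points in 5 dimensions.
data = [[random.gauss(0, 10) for _ in range(5)] for _ in range(1000)]
reference = standardize(data)

max_diff = 0.0
for scaler in (0.01, 1, 100):
    scdata = [[scaler * x for x in row] for row in data]
    z = standardize(scdata)
    # Largest elementwise difference from the standardized unscaled data.
    diff = max(abs(a - b) for ra, rb in zip(z, reference) for a, b in zip(ra, rb))
    max_diff = max(max_diff, diff)

print(max_diff)  # tiny: only floating-point round-off remains
```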
