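I'm using org.apache.mahout.common.distance.MahalanobisDistanceMeasure to calculate the distance between each row of a matrix and the mean vector, but it sometimes returns NaN. I tried to debug it, and it looks like a NullPointerException is thrown in the Object class. For the other rows, though, everything works fine. I'd appreciate it if anyone could give me some guidance.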
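For reference, the standard Mahalanobis distance of a row $x$ from the mean vector $\mu$, given the covariance matrix $S$, is defined below. It is a real number only when the quadratic form under the square root is non-negative, which is guaranteed when $S^{-1}$ is positive (semi-)definite.

$$
D_M(x) = \sqrt{(x - \mu)^{\mathsf{T}} S^{-1} (x - \mu)}
$$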
```java
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.commons.math.MathException;
import org.apache.commons.math.distribution.ChiSquaredDistributionImpl;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.stat.correlation.Covariance;
import org.apache.mahout.common.distance.MahalanobisDistanceMeasure;
import org.apache.mahout.math.*;

public class FindMultivariateOutliers {

    public static void main(String[] args) {
        String url = "VIC_20160401_201606301.csv";
        double[][] data = extractRealData(readCSV(url), 3);

        // Use rows 30 to 59 of the data as a 30-row sample.
        double[][] dataSet = new double[30][49];
        for (int i = 30; i < 60; i++) {
            dataSet[i - 30] = data[i];
        }

        double[] mean = calculateMeanVector(dataSet);
        Vector meanVector = new DenseVector(mean);
        Matrix covarianceMatrix = covarianceMatrix(dataSet);

        MahalanobisDistanceMeasure measure = new MahalanobisDistanceMeasure();
        measure.setMeanVector(meanVector);
        measure.setCovarianceMatrix(covarianceMatrix);

        // Print the distance of every row from the mean vector.
        for (int i = 0; i < dataSet.length; i++) {
            DenseVector ve = new DenseVector(dataSet[i]);
            // The first argument fills the centroidLengthSquare parameter of
            // DistanceMeasure.distance(double, Vector, Vector).
            double x = measure.distance(
                    dataSet[centroid(dataSet)[0]][centroid(dataSet)[1]], meanVector, ve);
            System.out.println(i + " " + x);
        }
    }

    // readCSV, extractRealData, calculateMeanVector, covarianceMatrix and
    // centroid are the asker's own helper methods and are not shown in the post.
}
```
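To illustrate how a Mahalanobis-style calculation can return NaN for some rows but ordinary values for others, here is a minimal plain-Java sketch that does not use Mahout. `MahalanobisNaNDemo`, `mahalanobis` and `INV_COV` are hypothetical names, and `INV_COV` is deliberately indefinite (eigenvalues 1/3 and -1), which a true inverse covariance matrix never is. Whether the matrix Mahout builds internally ends up in a similar state is an assumption, not something the post shows; one hint in that direction is that `dataSet` has 30 rows and 49 columns, so its sample covariance matrix has rank at most 29 and is singular, and numerically inverting a singular or near-singular matrix may produce an inverse that is not positive definite.

```java
public class MahalanobisNaNDemo {

    // Hypothetical inverse covariance matrix, deliberately indefinite
    // (eigenvalues 1/3 and -1), so it is NOT a valid inverse covariance.
    static final double[][] INV_COV = {
        {-1.0 / 3.0, 2.0 / 3.0},
        {2.0 / 3.0, -1.0 / 3.0}
    };

    // Computes sqrt((x - mean)^T * invCov * (x - mean)).
    static double mahalanobis(double[] x, double[] mean, double[][] invCov) {
        int n = x.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++) {
            d[i] = x[i] - mean[i];
        }
        double q = 0.0; // quadratic form d^T * invCov * d
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                q += d[i] * invCov[i][j] * d[j];
            }
        }
        return Math.sqrt(q); // Math.sqrt returns NaN whenever q < 0
    }

    public static void main(String[] args) {
        double[] mean = {0.0, 0.0};
        double[][] rows = {{1.0, 0.0}, {1.0, 1.0}};
        for (int i = 0; i < rows.length; i++) {
            System.out.println(i + " " + mahalanobis(rows[i], mean, INV_COV));
        }
    }
}
```

Running `main` prints `0 NaN` for the first row, because its quadratic form is -1/3 and the square root of a negative number is NaN, and a finite value of about 0.8165 (the square root of 2/3) for the second row.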
Output:
```
0 NaN
1 NaN
2 1.3382137932701006
3 5.140281428741069
4 5.448118335171329
5 4.658774790167001
6 3.055235041048766
7 5.577659807980593
8 2.9899726295069784
9 6.095988936666251
10 5.188517209151716
11 3.2929774499538014
12 5.090550175124932
13 5.801822265633947
14 4.714239296215186
15 5.02905587450129
16 4.981122780626051
17 5.195044166268684
18 5.325097238194922
19 4.7899888250142375
20 5.506442897174045
21 5.266585564849615
22 5.403384368592266
23 4.110229775894713
24 5.960687924915147
25 4.5745629099807745
26 5.0580441561885205
27 5.146058878694013
28 5.1375323540721425
29 3.7919178679466015
```
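centroid() is a method that computes the centroid of the matrix and returns an int[2] (the first element is the x coordinate, the second is the y coordinate). dataSet is the matrix I'm working with.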
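"NaN" stands for "Not a Number". A floating-point operation (on double or float) produces NaN when its input parameters make the result undefined. For example, 0.0 divided by 0.0 is arithmetically undefined, and so is taking the square root of a negative number.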
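The two undefined operations mentioned above can be checked directly in Java. This small sketch (the class name `NaNDemo` is only illustrative) also shows that NaN is not equal to itself, so `Double.isNaN` is the reliable way to detect it.

```java
public class NaNDemo {
    public static void main(String[] args) {
        double zeroOverZero = 0.0 / 0.0;       // arithmetically undefined, so NaN
        double sqrtNegative = Math.sqrt(-1.0); // square root of a negative number, so NaN

        System.out.println(zeroOverZero);                 // prints NaN
        System.out.println(sqrtNegative);                 // prints NaN

        // NaN compares unequal to everything, including itself,
        // so test for it with Double.isNaN rather than ==.
        System.out.println(zeroOverZero == zeroOverZero); // prints false
        System.out.println(Double.isNaN(sqrtNegative));   // prints true
    }
}
```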
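NaN is a static constant of the Double and Float wrapper classes (Double.NaN and Float.NaN). It does not represent a numeric value, so in your case the distance calculation between the two coordinates returns Double.NaN. Note, though, that Double.NaN is an ordinary, non-null double value: converting it to a primitive does not throw. Unboxing throws a NullPointerException only when the Double reference itself is null, so the NullPointerException you saw while debugging most likely comes from somewhere other than the NaN value itself.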
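A minimal sketch contrasting unboxing a `Double` that holds NaN with unboxing a `null` reference; the class `UnboxingDemo` and its `unbox` method are hypothetical names used only for illustration.

```java
public class UnboxingDemo {

    // Converts a Double wrapper to a primitive double.
    static double unbox(Double boxed) {
        return boxed; // auto-unboxing calls boxed.doubleValue()
    }

    public static void main(String[] args) {
        Double nan = Double.NaN; // a real, non-null Double object holding NaN
        System.out.println(unbox(nan)); // prints NaN, no exception

        Double missing = null;   // no Double object at all
        try {
            unbox(missing);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException when unboxing null");
        }
    }
}
```

Only the null case throws; a NaN value passes through unboxing unchanged.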