How do I compute the Spearman correlation coefficient with Spark? I cannot reproduce an example from a statistics book



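To practice using Spark for classical statistical analysis, I am trying to reproduce examples from a few books (general statistics books, not ones dedicated to computing or Spark).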

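One example in the book shows how to compute the Spearman correlation coefficient between the scores two judges gave to ten athletes: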

|Judge 1|8.3|7.6|9.1|9.5|8.4|6.9|9.2|7.8|8.6|8.2
|Judge 2|7.9|7.4|9.1|9.3|8.4|7.5|9.0|7.2|8.2|8.1

It first builds an intermediate table of ranks (1 = lowest score):
|Judge 1|5|2|8|10|6|1|9|3|7|4
|Judge 2|4|2|9|10|7|3|8|1|6|5
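The ranking step above can be sketched in plain Java. This is only an illustration: the class name `RankScores` and the method `ranks` are made up for this post, and the code assumes all scores of a judge are distinct (it does no tie averaging, which a full Spearman implementation would need).

```java
import java.util.Arrays;

public class RankScores {
    // Returns the rank of each score, where rank 1 is the lowest score.
    // Assumption: no two scores are equal (ties are not averaged here).
    static int[] ranks(double[] scores) {
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        int[] result = new int[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Position of the score in the sorted copy, shifted to start at 1.
            result[i] = Arrays.binarySearch(sorted, scores[i]) + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] judge1 = {8.3, 7.6, 9.1, 9.5, 8.4, 6.9, 9.2, 7.8, 8.6, 8.2};
        double[] judge2 = {7.9, 7.4, 9.1, 9.3, 8.4, 7.5, 9.0, 7.2, 8.2, 8.1};
        System.out.println(Arrays.toString(ranks(judge1))); // [5, 2, 8, 10, 6, 1, 9, 3, 7, 4]
        System.out.println(Arrays.toString(ranks(judge2))); // [4, 2, 9, 10, 7, 3, 8, 1, 6, 5]
    }
}
```

Both printed arrays match the rank rows from the book.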

The book's example ends with the result

r=0.915
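As a sanity check on the book's value, here is a small sketch that applies the usual no-ties shortcut formula, r = 1 - 6 Σd² / (n(n² - 1)), to the two rank rows. The book excerpt does not say which formula it uses, so treating it as this one is my assumption; the class name `SpearmanFromRanks` and the method `rho` are likewise invented for illustration.

```java
import java.util.Locale;

public class SpearmanFromRanks {
    // Spearman coefficient from two rank arrays of equal length.
    // Assumption: no ties, so the shortcut formula is valid.
    static double rho(int[] a, int[] b) {
        int n = a.length;
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < n; i++) {
            double d = a[i] - b[i];
            sumSquaredDiff += d * d;
        }
        return 1.0 - 6.0 * sumSquaredDiff / (n * ((double) n * n - 1.0));
    }

    public static void main(String[] args) {
        int[] judge1 = {5, 2, 8, 10, 6, 1, 9, 3, 7, 4};
        int[] judge2 = {4, 2, 9, 10, 7, 3, 8, 1, 6, 5};
        // Sum of squared rank differences is 14, so r = 1 - 84/990.
        System.out.println(String.format(Locale.ROOT, "%.3f", rho(judge1, judge2))); // 0.915
    }
}
```

The squared rank differences add up to 14, which gives 1 - 84/990 ≈ 0.915, in agreement with the book.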

Following the API documentation for Correlation, I tried to implement it with Spark like this:

List<Row> data = Arrays.asList(
RowFactory.create(Vectors.dense(8.3, 7.6, 9.1, 9.5, 8.4, 6.9, 9.2, 7.8, 8.6, 8.2)),
RowFactory.create(Vectors.dense(7.9, 7.4, 9.1, 9.3, 8.4, 7.5, 9.0, 7.2, 8.2, 8.1))
);
StructType schema = new StructType(new StructField[]{
new StructField("features", new VectorUDT(), false, Metadata.empty()),
});
Dataset<Row> df = this.session.createDataFrame(data, schema);
Row r2 = Correlation.corr(df, "features", "spearman").head();
System.out.println("Spearman correlation matrix:\n" + r2.get(0).toString());

But it does not give me a single coefficient. Instead, I get a matrix that looks strange to me:

Spearman correlation matrix:
1.0                  0.9999999999999998   NaN  ... (10 total)
0.9999999999999998   1.0                  NaN  ...
NaN                  NaN                  1.0  ...
0.9999999999999998   0.9999999999999998   NaN  ...
NaN                  NaN                  NaN  ...
-0.9999999999999998  -0.9999999999999998  NaN  ...
0.9999999999999998   0.9999999999999998   NaN  ...
0.9999999999999998   0.9999999999999998   NaN  ...
0.9999999999999998   0.9999999999999998   NaN  ...
0.9999999999999998   0.9999999999999998   NaN  ...

I am new to MLlib and not very strong in statistics. Clearly I am doing something wrong.

What am I seeing here instead of what I expected,
and how can I get the result I am after?

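Part of the solution:
I simply had the vectors oriented the wrong way. Correlation.corr treats each row as one observation and each position in the vector as one variable, so two rows of ten values yield a 10×10 matrix computed from only two observations. With two observations every entry can only be ±1, or NaN where a variable has the same value in both rows (as 9.1 and 8.4 do here). Supplying one row per athlete, holding the two judges' scores, corrects it: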

List<Row> data = Arrays.asList(
RowFactory.create(Vectors.dense(8.3, 7.9)),
RowFactory.create(Vectors.dense(7.6, 7.4)),
RowFactory.create(Vectors.dense(9.1, 9.1)),
RowFactory.create(Vectors.dense(9.5, 9.3)),
RowFactory.create(Vectors.dense(8.4, 8.4)),
RowFactory.create(Vectors.dense(6.9, 7.5)),
RowFactory.create(Vectors.dense(9.2, 9.0)),
RowFactory.create(Vectors.dense(7.8, 7.2)),
RowFactory.create(Vectors.dense(8.6, 8.2)),
RowFactory.create(Vectors.dense(8.2, 8.1))
);
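If the scores are already stored one array per judge, as in the book's table, the rows in the corrected layout above do not have to be typed by hand. The following is a minimal sketch of the transposition in plain Java, with no Spark dependency; the class name `TransposeScores` and the method `toRows` are hypothetical, and it assumes all input arrays have the same length.

```java
import java.util.Arrays;

public class TransposeScores {
    // Turns one array per variable (judge) into one array per observation (athlete).
    // Assumption: every input array has the same length.
    static double[][] toRows(double[]... variables) {
        int observations = variables[0].length;
        double[][] rows = new double[observations][variables.length];
        for (int v = 0; v < variables.length; v++) {
            for (int i = 0; i < observations; i++) {
                rows[i][v] = variables[v][i];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] judge1 = {8.3, 7.6, 9.1, 9.5, 8.4, 6.9, 9.2, 7.8, 8.6, 8.2};
        double[] judge2 = {7.9, 7.4, 9.1, 9.3, 8.4, 7.5, 9.0, 7.2, 8.2, 8.1};
        double[][] rows = toRows(judge1, judge2);
        System.out.println(rows.length);              // 10
        System.out.println(Arrays.toString(rows[0])); // [8.3, 7.9]
    }
}
```

Each resulting row could then be passed to `Vectors.dense(row)` and wrapped with `RowFactory.create`, giving ten observations of two variables. The coefficient of interest is the off-diagonal entry of the resulting 2×2 matrix.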

Correlation between the two judges' scores:
1.0 0.91515151515151515153
0.9151515151515151153 1.0
