kmeans-pyspark org.apache.spark.SparkException: Job aborted due to stage failure



I want to run k-means on my DataFrame base (6.7 million rows and 22 variables):

base.dtypes

[('anonimisation2', 'double'),
('anonimisation3', 'double'),
('anonimisation4', 'double'),
('anonimisation5', 'double'),
('anonimisation6', 'double'),
('anonimisation7', 'double'),
('anonimisation8', 'double'),
('anonimisation9', 'double'),
('anonimisation10', 'double'),
('anonimisation11', 'double'),
('anonimisation12', 'double'),
('anonimisation13', 'double'),
('anonimisation14', 'double'),
('anonimisation15', 'double'),
('anonimisation16', 'double'),
('anonimisation17', 'double'),
('anonimisation18', 'double'),
('anonimisation19', 'double'),
('anonimisation20', 'double'),
('anonimisation21', 'double'),
('anonimisation22', 'double')]

I read that I should use this code:

def transData(base):
    # pack every column of each row except the last into a single 'features' vector column
    return base.rdd.map(lambda r: [Vectors.dense(r[:-1])]).toDF(['features'])

transformed = transData(base)
transformed.show(5, False)

Then I wrote this:

from pyspark.ml.clustering import KMeans

kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(transformed)

And I got this error:

IllegalArgumentException: 'requirement failed: Column features must be of type equal to one of the following types: [struct<type:tinyint,size:int,indices:array<int>,values:array<double>>, array<double>, array<float>] but was actually of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'

I don't know what to do. If you need more information, just ask. Thanks!

I also tried using Python with Pandas, but I ran into problems there too.

Use from pyspark.ml.linalg import Vectors instead of from pyspark.mllib.linalg import Vectors.
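The error comes from mixing the two vector types: pyspark.ml.clustering.KMeans expects the VectorUDT from pyspark.ml.linalg, not the one from pyspark.mllib.linalg, even though both print as the same struct in the error message. As a minimal sketch of the corrected pipeline (assuming base holds only the double columns shown above; the clusterCenters() call at the end is just added here for illustration):

from pyspark.ml.linalg import Vectors        # ml, not mllib: matches the UDT pyspark.ml expects
from pyspark.ml.clustering import KMeans

def transData(base):
    # build a one-column DataFrame of DenseVectors from every column except the last
    return base.rdd.map(lambda r: [Vectors.dense(r[:-1])]).toDF(['features'])

transformed = transData(base)

kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(transformed)
model.clusterCenters()                       # inspect the two fitted centroids

On a DataFrame this size, VectorAssembler from pyspark.ml.feature is the more idiomatic way to build the features column, since it stays in the DataFrame API instead of going through the RDD.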
