I am new to Spark. I am trying to sort a MAP-type column in a Spark DataFrame with a UDF, and after that I save the data to Hive, as follows:
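import org.apache.spark.ml.linalg.SparseVector // assumed: spark.ml vectors (the RDD-based API uses org.apache.spark.mllib.linalg)
import org.apache.spark.sql.functions.udf
import scala.collection.immutable.ListMap
import scala.collection.mutable

// threshold is assumed to be a Double defined earlier; it is not shown in this snippet.
// For each active entry, keep (word -> TF*IDF) when the score reaches the threshold and the word is non-empty,
// then return the entries sorted by score, highest first, in an insertion-ordered ListMap.
val vectorHead = udf { (z: SparseVector, x: SparseVector, y: mutable.WrappedArray[String]) =>
  var map2 = Map.empty[String, Double]
  for (i <- x.values.indices) {
    if (x.values(i) * z.values(i) >= threshold && y(i) != "") {
      map2 += (y(i) -> x.values(i) * z.values(i))
    }
  }
  ListMap(map2.toSeq.sortBy(-_._2): _*)
}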
val rescaledDataNew = dataFrame
  .withColumn("words_with_tf*idf",
    vectorHead(dataFrame("TFFeatures"), dataFrame("IDFFeatures"), dataFrame("new_words")))
  .drop("words", "TFFeatures", "IDFFeatures")
println("This is the new data after drop low TF*IDF")
rescaledDataNew.show()
rescaledDataNew.createTempView("TEST")
rescaledDataNew.sqlContext.sql("DROP TABLE IF EXISTS " + dataSavePath)
rescaledDataNew.sqlContext.sql("CREATE TABLE " + dataSavePath + " AS SELECT * FROM TEST")
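The sorting logic itself can be checked without Spark. Below is a minimal sketch that mirrors the body of vectorHead on plain arrays; the names TopTermsSketch and topTerms, the sample values, and passing threshold as a parameter are all hypothetical, since the original snippet does not show where threshold comes from.

```scala
import scala.collection.immutable.ListMap

object TopTermsSketch {
  // Hypothetical pure-Scala mirror of the vectorHead UDF body:
  // tf and idf stand in for x.values and z.values, words for the new_words column.
  def topTerms(tf: Array[Double], idf: Array[Double], words: Seq[String], threshold: Double): ListMap[String, Double] = {
    // Keep (word, tf*idf) pairs whose score reaches the threshold and whose word is non-empty.
    val scored = tf.indices.collect {
      case i if tf(i) * idf(i) >= threshold && words(i) != "" => words(i) -> tf(i) * idf(i)
    }
    // A later duplicate of the same word overwrites an earlier one, as map2 += does in the UDF.
    val deduped = scored.toMap
    // ListMap keeps insertion order, so inserting in descending score order yields a sorted map.
    ListMap(deduped.toSeq.sortBy(-_._2): _*)
  }

  def main(args: Array[String]): Unit = {
    val tf = Array(1.0, 2.0, 3.0, 0.1)
    val idf = Array(2.0, 2.0, 1.0, 1.0)
    val words = Seq("a", "b", "c", "d")
    println(topTerms(tf, idf, words, threshold = 1.0))
  }
}
```

If this prints the entries highest score first, the ListMap built by the UDF is already sorted, which suggests the order is lost later, when the value becomes a MAP column.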
After running it there are no errors and no warnings, and the result is:
{"美食":6.978342,"游艇":8.91278,"翠园":6.1228666,"花桥镇":10.032949,"青咖喱鸡":6.914152}
What I want is:
{"花桥镇":10.032949,"游艇":8.91278,"美食":6.978342,"青咖喱鸡":6.914152,"翠园":6.1228666}
When I change the last line of the UDF to
ListMap(map2.toSeq.sortBy(-_._2): _*).toString
the result is:
Map{"花桥镇"->10.032949,"游艇"->8.91278,"美食"->6.978342,"青咖喱鸡"->6.914152,"翠园"->6.1228666}
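Since the toString output already has the desired order, one possible workaround, assuming a string column in Hive is acceptable, is to have the UDF return the entries rendered as text in their ListMap order instead of a MAP. This is a hypothetical sketch: OrderedJsonSketch and toOrderedJson are illustrative names, and only backslashes and double quotes in keys are escaped.

```scala
import scala.collection.immutable.ListMap

object OrderedJsonSketch {
  // Escape backslashes and double quotes so the key stays a valid JSON string
  // (other control characters are not handled in this sketch).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render entries in the map's own iteration order; for a ListMap that is insertion order.
  def toOrderedJson(m: ListMap[String, Double]): String =
    m.iterator.map { case (k, v) => quote(k) + ":" + v.toString }.mkString("{", ",", "}")
}
```

Inside the UDF, the last line would become something like toOrderedJson(ListMap(map2.toSeq.sortBy(-_._2): _*)). The trade-off is that the column is then a plain string rather than a MAP, so individual keys can no longer be looked up in Hive.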
So, can anyone tell me what I should do to get what I want?
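This appears to be an issue with the show() method. Try writing the DataFrame to a file; the entries should then be sorted the way you want.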