Spark SQL: getting an index number for a Dataset



Suppose I have a case class like this:

case class Person(name: String = null, id: Integer = null, rank: Integer = null)

I have a dataset: Dataset[Person]

Suppose the dataset contains 5 Person objects:

Dataset[  Person(name = "Jack",id = 100, rank = null), 
Person(name = "Mary",id = 400, rank = null),
Person(name = "Tom",id = 199, rank = null), 
Person(name = "Linda", id = 55, rank = null),
Person(name = "Wendy", id = 30, rank = null)]

I want to sort the dataset by id and then fill in the rank field, in Scala, so that the dataset becomes:

Dataset[  Person(name = "Wendy", id = 30, rank = 1), 
Person(name = "Linda", id = 55, rank = 2),
Person(name = "Jack", id = 100, rank = 3), 
Person(name = "Tom", id = 199, rank = 4),
Person(name = "Mary", id = 400, rank = 5)]

Thanks in advance!

If you have a Dataset, you can add a rank column with the row_number window function:

ds.withColumn("rank", row_number().over(Window.orderBy($"id")))

Or use the rank function instead:

ds.withColumn("rank", rank().over(Window.orderBy("id")))
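The two functions only differ when ids repeat: row_number always yields 1, 2, 3, ..., while rank gives tied rows the same value and then skips ahead. Here is a minimal sketch of that difference, assuming a local SparkSession; the object name TiesExample, the helper compare, and the sample ids are made up for illustration and are not part of the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank, row_number}

// Hypothetical helper object, used only to compare the two window functions.
object TiesExample {
  // Returns (id, row_number, rank) triples ordered by row_number.
  def compare(spark: SparkSession, ids: Seq[Int]): Seq[(Int, Int, Int)] = {
    import spark.implicits._
    val byId = Window.orderBy(col("id"))
    ids.toDF("id")
      .withColumn("row_number", row_number().over(byId))
      .withColumn("rank", rank().over(byId))
      .orderBy(col("row_number"))
      .as[(Int, Int, Int)] // tuple columns are mapped by position
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ties-example")
      .getOrCreate()
    // The id 55 appears twice on purpose to show the tie behaviour.
    compare(spark, Seq(30, 55, 55, 100)).foreach(println)
    spark.stop()
  }
}
```

For the ids 30, 55, 55, 100 the row_number column is 1, 2, 3, 4 and the rank column is 1, 2, 2, 4. If the ids in your data are unique, as in the question, both give the same result.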

def row_number(): Column

Window function: returns a sequential number starting at 1 within a window partition.
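Putting it together for the data in the question, a sketch could look like the following. It assumes the case class also has an id field (the sample data uses one) and runs against a local SparkSession; the object name RankExample and the helper addRank are made up for illustration. Two things to keep in mind: withColumn returns a DataFrame, so you need .as[Person] to get a Dataset[Person] back, and a window with orderBy but no partitionBy makes Spark move all rows into a single partition, which can be slow on large data.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Assumed shape of the case class: the id field is inferred from the sample data.
case class Person(name: String = null, id: Integer = null, rank: Integer = null)

// Hypothetical helper object for this illustration.
object RankExample {
  // Overwrites the existing "rank" column with the 1-based position by id.
  def addRank(ds: Dataset[Person]): Dataset[Person] = {
    // No partitionBy: every row ends up in one partition for numbering.
    val byId = Window.orderBy(col("id"))
    ds.withColumn("rank", row_number().over(byId))
      .as[Person](Encoders.product[Person]) // DataFrame back to Dataset[Person]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rank-example")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(
      Person("Jack", 100), Person("Mary", 400), Person("Tom", 199),
      Person("Linda", 55), Person("Wendy", 30)
    ).toDS()

    addRank(ds).orderBy(col("rank")).show()
    spark.stop()
  }
}
```

Because a column named rank already exists, withColumn replaces it rather than adding a second one, so the result still matches the Person schema.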

Hope this helps!
