Suppose I have a case class like this:
case class Person(name: String = null, id: Integer = null, rank: Integer = null)
and a dataset: Dataset[Person]
Say the dataset holds 5 Person objects:
Dataset[ Person(name = "Jack", id = 100, rank = null),
Person(name = "Mary",id = 400, rank = null),
Person(name = "Tom",id = 199, rank = null),
Person(name = "Linda", id = 55, rank = null),
Person(name = "Wendy", id = 30, rank = null)]
In Scala, I want to sort the dataset by id and then fill in the rank field, so that the dataset becomes:
Dataset[ Person(name = "Wendy", id = 30, rank = 1),
Person(name = "Linda", id = 55, rank = 2),
Person(name = "Jack", id = 100, rank = 3),
Person(name = "Tom", id = 199, rank = 4),
Person(name = "Mary", id = 400, rank = 5)]
Thanks in advance!
If you have a Dataset, you can add the rank column with the row_number window function:
ds.withColumn("rank", row_number().over(Window.orderBy($"id")))
Or, alternatively, with the rank function (tied ids share a rank, and the ranks that follow are skipped):
ds.withColumn("rank", rank().over(Window.orderBy("id")))
def row_number(): Column
Window function: returns a sequential number starting at 1 within a window partition.
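To put the pieces together, here is a minimal end-to-end sketch. It assumes a local SparkSession and a Person case class that includes the id field used in the sample data; the object name RankExample and the app name are made up for illustration. Note that withColumn returns an untyped DataFrame, so the sketch converts back with .as[Person] to get a Dataset[Person] again.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Assumption: id is part of the case class, since the sample data uses it.
// java.lang.Integer keeps id and rank nullable.
case class Person(name: String = null, id: Integer = null, rank: Integer = null)

object RankExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in a real job the session usually already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rank-example")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder for Person and .toDS()

    // The five people from the question; rank is left at its default (null).
    val ds: Dataset[Person] = Seq(
      Person("Jack", 100),
      Person("Mary", 400),
      Person("Tom", 199),
      Person("Linda", 55),
      Person("Wendy", 30)
    ).toDS()

    // Overwrite the existing rank column with a 1-based row number ordered by id,
    // then map the resulting DataFrame back to the typed Dataset[Person].
    val ranked: Dataset[Person] = ds
      .withColumn("rank", row_number().over(Window.orderBy(col("id"))))
      .as[Person]

    ranked.show()
    spark.stop()
  }
}
```

With the sample data, the ranks should come out in the order shown in the question (Wendy first, Mary last). One caveat: a window with orderBy but no partitionBy makes Spark move all rows into a single partition, which is fine for small data but may become a bottleneck on large datasets.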
Hope this helps!