
How to add an incremental ID column to a table in Spark SQL

Keywords: ID, add, Spark SQL, apache-spark, apache-spark-sql, apache-spark-mllib
Updated: 2023-08-21
English title: How to add an incremental column ID for a table in Spark SQL

I am working on a Spark MLlib algorithm. The dataset I have is in this form:

"Company":"XXXX","CurrentTitle":"XYZ","Edu_Title":"ABC","Exp_mnth":. (there are more values similar to these)

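I am trying to encode the raw string values as numeric values, so I tried using zipWithUniqueId to get a unique value for each string value. For some reason I am not able to save the modified dataset to disk. Is there any way to do this with Spark SQL, or is there a better approach?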

Scala

import org.apache.spark.sql.functions.monotonically_increasing_id
val dataFrame1 = dataFrame0.withColumn("index",monotonically_increasing_id())

Java

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
Dataset<Row> dataFrame1 = dataFrame0.withColumn("index", functions.monotonically_increasing_id());
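
One thing worth knowing before relying on this column: the IDs from monotonically_increasing_id() are unique and increasing, but not consecutive. The Spark documentation describes the current implementation as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The plain-Java sketch below only illustrates that layout and does not call Spark; the class name MonotonicIdLayout and the helpers composeId, partitionOf and recordOf are hypothetical names made up for this example.

```java
// Plain-Java illustration (no Spark dependency) of the ID layout that the
// Spark documentation describes for monotonically_increasing_id().
// All names here are hypothetical and exist only for this sketch.
final class MonotonicIdLayout {
    // Number of low bits reserved for the record number within a partition.
    static final int RECORD_BITS = 33;

    // Combine a partition index and a per-partition record number into one ID.
    static long composeId(int partitionId, long recordNumber) {
        return ((long) partitionId << RECORD_BITS) + recordNumber;
    }

    // Recover the partition index from the upper bits of an ID.
    static int partitionOf(long id) {
        return (int) (id >>> RECORD_BITS);
    }

    // Recover the record number from the lower 33 bits of an ID.
    static long recordOf(long id) {
        return id & ((1L << RECORD_BITS) - 1);
    }

    public static void main(String[] args) {
        // Assume two partitions holding three rows each.
        for (int p = 0; p < 2; p++) {
            for (long r = 0; r < 3; r++) {
                System.out.println("partition " + p + ", row " + r + " -> " + composeId(p, r));
            }
        }
        // Partition 0 yields 0, 1, 2; partition 1 starts at 8589934592 (1L << 33).
    }
}
```

So with more than one partition the "index" column will contain large gaps. If strictly consecutive IDs are needed, commonly suggested alternatives are RDD.zipWithIndex or functions.row_number() over a window, at the cost of extra work across partitions.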
