>I have this DataFrame, and it does contain values:
val cabArticleLocal = spark.read.format("jdbc").options(Map("url" -> url, "dbtable" -> "cabarticle")).load()
cabArticleLocal.show
root
|-- is_enabled: boolean (nullable = true)
|-- cab_article: long (nullable = true)
|-- article_id: long (nullable = true)
+----------+-----------+----------+
|is_enabled|cab_article|article_id|
+----------+-----------+----------+
+----------+-----------+----------+
It will be inserted into a PostgreSQL table with this structure:
id
is_enabled
cab_article
article_id
How can I generate an id column in the DataFrame, so that an auto-generated id is added to the existing DataFrame? Thanks.
+----------+-----------+----------+---+
|is_enabled|cab_article|article_id| id|
+----------+-----------+----------+---+
+----------+-----------+----------+---+
You can use the monotonically_increasing_id function:
import org.apache.spark.sql.functions._
cabArticleLocal.withColumn("id", monotonically_increasing_id())
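Note that monotonically_increasing_id guarantees unique, monotonically increasing ids, but not consecutive ones (the partition id is encoded in the high bits). If you need consecutive ids starting at 1, a sketch using zipWithIndex on the underlying RDD, assuming the same cabArticleLocal DataFrame from the question:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField}

// Sketch: build consecutive 1-based ids by zipping each row with its index,
// then re-create a DataFrame with an extra "id" column appended to the schema.
val withId = spark.createDataFrame(
  cabArticleLocal.rdd.zipWithIndex.map { case (row, idx) =>
    Row.fromSeq(row.toSeq :+ (idx + 1L)) // consecutive id, starting at 1
  },
  cabArticleLocal.schema.add(StructField("id", LongType, nullable = false))
)
```

This costs an extra pass over the data, so prefer monotonically_increasing_id when gaps in the ids are acceptable.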
Or you can use the row_number function over a Window, as:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
cabArticleLocal.withColumn("id", row_number().over(Window.orderBy("article_id")))
Alternatively, you can use the query language, as:
cabArticleLocal.createOrReplaceTempView("tempTable")
spark.sql("select row_number() over (order by article_id) as id, is_enabled, cab_article, article_id from tempTable")
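Since the goal is to insert the result into PostgreSQL, the DataFrame (with its generated id column) can then be appended through the standard DataFrameWriter jdbc API. A sketch, assuming `url` is the same JDBC URL as in the question and that the PostgreSQL driver is on the classpath; the target table name reuses "cabarticle" from the question:

```scala
import java.util.Properties
import org.apache.spark.sql.functions._

// Connection properties for the JDBC writer; the driver class is the
// standard PostgreSQL one (an assumption about your setup).
val props = new Properties()
props.setProperty("driver", "org.postgresql.Driver")

cabArticleLocal
  .withColumn("id", monotonically_increasing_id()) // generated id column
  .write
  .mode("append")                                  // insert into existing table
  .jdbc(url, "cabarticle", props)
```

If the PostgreSQL id column is a serial/identity column, another option is to omit id from the DataFrame entirely and let the database assign it on insert.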