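How to pass a value from the same row to the Scala Spark substring function?

I have the following DataFrame with a fnamelname column that I want to transform: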

+---+---------------+---+----------+--------+
| id|     fnamelname|age|       job|ageLimit|
+---+---------------+---+----------+--------+
|  1|    xxxxx xxxxx| 28|   teacher|      18|
|  2|  xxxx xxxxxxxx| 30|programmer|       0|
|  3|    xxxxx xxxxx| 28|   teacher|      18|
|  8|xxxxxxx xxxxxxx| 12|programmer|       0|
|  9| xxxxx xxxxxxxx| 45|programmer|       0|
+---+---------------+---+----------+--------+
only showing top 5 rows
root
|-- id: string (nullable = true)
|-- fnamelname: string (nullable = true)
|-- age: integer (nullable = false)
|-- job: string (nullable = true)
|-- ageLimit: integer (nullable = false)

I want to use ageLimit as the len argument of the substring function, but for some reason .cast("Int") does not give me the value for that row.

val readyDF: Dataset[Row] = peopleWithJobsAndAgeLimitsDF.withColumn("fnamelname",
  substring(col("fnamelname"), 0, col("ageLimit").cast("Int")))

What I get is:

found   : org.apache.spark.sql.Column
required: Int
col("fnamelname"),0, col("ageLimit").cast("Int")))

How can I supply the value of another column as a variable inside .withColumn()?

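The substring function takes an Int parameter for the substring length. col("ageLimit").cast("Int") is not an Int; it is another Column object, an expression that yields the integer values of the ageLimit column when it is evaluated for each row.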

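Use the substr method of Column. It has an overload that takes two Columns, one for the position and one for the substring length. To pass the literal 0 as the position column, use lit(0):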

import org.apache.spark.sql.functions.{col, lit}

val readyDF = peopleWithJobsAndAgeLimitsDF.withColumn("fnamelname",
  col("fnamelname").substr(lit(0), col("ageLimit")))

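The substring function (and any other function with a similar signature) cannot take a Column for these arguments directly. You can use expr instead, so the solution looks like this: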

import org.apache.spark.sql.functions.expr

peopleWithJobsAndAgeLimitsDF
  .withColumn(
    "fnamelname",
    expr("substring(fnamelname, 0, ageLimit)")
  )
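
For anyone who wants to try both suggestions side by side, here is a minimal sketch. The sample rows, the object name SubstringByColumn, and the helper names viaSubstr and viaExpr are made up for illustration and are not from the question. It starts at position 1 rather than 0, on the assumption that you want the documented 1-based indexing of Spark's substring.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit}

object SubstringByColumn {
  // Column.substr has an overload taking Columns for both start position and length.
  def viaSubstr(df: DataFrame): DataFrame =
    df.withColumn("fnamelname", col("fnamelname").substr(lit(1), col("ageLimit")))

  // expr parses a SQL fragment, so a column name can be used as the length argument.
  def viaExpr(df: DataFrame): DataFrame =
    df.withColumn("fnamelname", expr("substring(fnamelname, 1, ageLimit)"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("substring-by-column")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows that mirror the schema in the question.
    val people = Seq(
      ("1", "alice smith", 28, "teacher", 5),
      ("2", "bob jones", 30, "programmer", 0)
    ).toDF("id", "fnamelname", "age", "job", "ageLimit")

    viaSubstr(people).show()
    viaExpr(people).show()

    spark.stop()
  }
}
```

Both helpers should produce the same column. Note that rows where ageLimit is 0 end up with an empty string, which may or may not be what you want for the programmer rows in the question.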
