I have a DataFrame with an Icao column of hex strings that I want to convert to the Long data type. How can I do this in Spark SQL?
+------+-----+
|  Icao|count|
+------+-----+
|471F8D|81350|
|471F58|79634|
|471F56|79112|
|471F86|78177|
|471F8B|75300|
|47340D|75293|
|471F83|74864|
|471F57|73815|
|471F4A|72290|
|471F5F|72133|
|40612C|69676|
+------+-----+
conv(num: Column, fromBase: Int, toBase: Int): Column converts a number in a string column from one base to another.
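To make the conversion concrete, here is a plain-Scala sketch (no Spark needed) of what conv computes for a single non-negative value: parse the digits in the source base, then render them in the target base. The helper name convSketch is hypothetical, and Spark's real conv also handles negative numbers and invalid digits in its own way, which this sketch does not model.

```scala
// Hypothetical plain-Scala model of conv(num, fromBase, toBase) for one
// non-negative value. Not Spark's implementation: negatives and invalid
// digits are not handled the way Spark's conv handles them.
object ConvSketch {
  def convSketch(num: String, fromBase: Int, toBase: Int): String = {
    val value = java.lang.Long.parseLong(num.trim, fromBase) // read digits in the source base
    java.lang.Long.toString(value, toBase).toUpperCase        // write them in the target base
  }
}
```

For example, ConvSketch.convSketch("471F8D", 16, 10) gives "4661133", the same value conv produces for the first row below, and converting "4661133" from base 10 to base 16 gives "471F8D" back.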
A solution using conv could look like this:
scala> icao.show
+------+-----+
| Icao|count|
+------+-----+
|471F8D|81350|
|471F58|79634|
|471F56|79112|
|471F86|78177|
|471F8B|75300|
|47340D|75293|
|471F83|74864|
|471F57|73815|
|471F4A|72290|
|471F5F|72133|
|40612C|69676|
+------+-----+
// spark-shell imports these automatically; elsewhere import them explicitly
import org.apache.spark.sql.functions.conv
// import spark.implicits._  // enables the $"Icao" column syntax outside spark-shell
val s1 = icao.withColumn("conv", conv($"Icao", 16, 10))
scala> s1.show
+------+-----+-------+
| Icao|count| conv|
+------+-----+-------+
|471F8D|81350|4661133|
|471F58|79634|4661080|
|471F56|79112|4661078|
|471F86|78177|4661126|
|471F8B|75300|4661131|
|47340D|75293|4666381|
|471F83|74864|4661123|
|471F57|73815|4661079|
|471F4A|72290|4661066|
|471F5F|72133|4661087|
|40612C|69676|4219180|
+------+-----+-------+
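As a sanity check on the conv column above, the same hex-to-decimal conversion can be done with plain JVM parsing. The object name IcaoCheck is illustrative only; the pairs are copied from the s1.show output.

```scala
// Cross-check the conv($"Icao", 16, 10) column with java.lang.Long.parseLong.
object IcaoCheck {
  // (Icao hex code, decimal value printed by s1.show)
  val expected: Seq[(String, Long)] = Seq(
    "471F8D" -> 4661133L, "471F58" -> 4661080L, "471F56" -> 4661078L,
    "471F86" -> 4661126L, "471F8B" -> 4661131L, "47340D" -> 4666381L,
    "471F83" -> 4661123L, "471F57" -> 4661079L, "471F4A" -> 4661066L,
    "471F5F" -> 4661087L, "40612C" -> 4219180L
  )

  // True when every hex code parses (base 16) to the value Spark printed.
  def allMatch: Boolean =
    expected.forall { case (hex, dec) => java.lang.Long.parseLong(hex, 16) == dec }
}
```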
conv always returns a string column (Spark implicitly casts its first argument to a string), so the result here is a string:
scala> s1.printSchema
root
|-- Icao: string (nullable = true)
|-- count: string (nullable = true)
|-- conv: string (nullable = true)
The result would still be a string even if the input column had an integer type.
To get a long, apply another built-in method, cast, to the result of conv.
val s2 = icao.withColumn("conv", conv($"Icao", 16, 10) cast "long")
scala> s2.printSchema
root
|-- Icao: string (nullable = true)
|-- count: string (nullable = true)
|-- conv: long (nullable = true)
scala> s2.show
+------+-----+-------+
| Icao|count| conv|
+------+-----+-------+
|471F8D|81350|4661133|
|471F58|79634|4661080|
|471F56|79112|4661078|
|471F86|78177|4661126|
|471F8B|75300|4661131|
|47340D|75293|4666381|
|471F83|74864|4661123|
|471F57|73815|4661079|
|471F4A|72290|4661066|
|471F5F|72133|4661087|
|40612C|69676|4219180|
+------+-----+-------+
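The conv step matters because cast alone cannot read hex. Below is a rough plain-Scala model, assuming Spark's non-ANSI cast behaviour (spark.sql.ansi.enabled=false): a decimal string becomes a Long, while a hex string such as "471F8D" or a null becomes null, represented here as None. The object and method names are hypothetical, and this is a model of the semantics, not Spark's code.

```scala
import scala.util.Try

// Hypothetical model of cast("long") on a string value with ANSI mode off:
// parsable decimal text -> Some(long); anything else, including null -> None (SQL null).
object CastSketch {
  def castToLong(s: String): Option[Long] =
    Option(s).flatMap(str => Try(str.toLong).toOption)
}
```

So CastSketch.castToLong("4661133") yields Some(4661133L), whereas castToLong("471F8D") yields None, which is why the hex string is first passed through conv(..., 16, 10).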
Alternatively, you can use Java's hex-to-long conversion:
java.lang.Long.parseLong(hex.trim(), 16)
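All you need to do is define a udf as
import org.apache.spark.sql.functions.udf
// Parses a hex string such as "471F8D" into a Long; throws on null or non-hex input
val hexToLong = udf((hex: String) => java.lang.Long.parseLong(hex.trim(), 16))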
and apply it to the column with the .withColumn API:
df.withColumn("Icao", hexToLong($"Icao")).show(false)
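The udf above throws on null, blank or non-hex values, which fails the whole Spark job. A null-safe variant of the function body is sketched below; the name hexToLongOpt is hypothetical and the sketch only covers the plain Scala logic.

```scala
import scala.util.Try

// Hypothetical null-safe body for the hexToLong udf: returns None for null,
// blank or non-hex input instead of throwing.
object HexToLongSketch {
  def hexToLongOpt(hex: String): Option[Long] =
    Option(hex)                          // null -> None
      .map(_.trim)                       // tolerate surrounding whitespace
      .filter(_.nonEmpty)                // blank -> None
      .flatMap(h => Try(java.lang.Long.parseLong(h, 16)).toOption) // non-hex -> None
}
```

It could then be wrapped as udf((hex: String) => HexToLongSketch.hexToLongOpt(hex)); Spark should map an Option[Long] return type to a nullable long column, with None becoming null.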