Alphanumeric sorting in Spark Java

My input is:

Id  tx_typ
1   1
2   A1
3   A2
4   A3

The output should be:

Id  tx_typ
2   A1
3   A2
4   A3
1   1

I tried orderBy() and sort in Spark Java, but neither produced this order. Is there another way, or a custom sort, in Spark Java?

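You can create a new boolean column is_numeric that is true when tx_typ is numeric and false otherwise. Then sort by the two columns is_numeric and tx_typ: in ascending order false sorts before true, so the non-numeric values come first. Finally, drop the is_numeric column.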

Translated into code, where input is your input Dataset<Row>, this gives:

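import static org.apache.spark.sql.functions.col;
...
// Casting a non-numeric string to int yields null, so isNotNull() marks numeric values.
input.withColumn("is_numeric", col("tx_typ").cast("int").isNotNull())
     // Ascending sort: false (non-numeric) rows first, then by tx_typ within each group.
     .orderBy("is_numeric", "tx_typ")
     // The helper column is no longer needed once the rows are sorted.
     .drop("is_numeric");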

With an input dataset containing the following rows:

+---+------+
|Id |tx_typ|
+---+------+
|1  |1     |
|2  |A1    |
|3  |A2    |
|4  |A3    |
+---+------+

you get an output dataset with the following rows:

+---+------+
|Id |tx_typ|
+---+------+
|2  |A1    |
|3  |A2    |
|4  |A3    |
|1  |1     |
+---+------+
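
To check the ordering logic without a Spark session, the same two-key sort can be sketched in plain Java. This is only an illustration: the class name AlphanumericSortSketch and its methods are hypothetical, and Integer.parseInt is a rough stand-in for the cast("int").isNotNull() test, not an exact match for Spark's cast semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of Spark.
public class AlphanumericSortSketch {

    // Rough stand-in for cast("int").isNotNull(): true if the string parses as an int.
    static boolean isNumeric(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Sorts by (isNumeric, value): false < true, so non-numeric values come first.
    static List<String> sortNonNumericFirst(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(
            Comparator.comparing((String s) -> isNumeric(s))
                      .thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "A1", "A2", "A3");
        System.out.println(sortNonNumericFirst(input)); // prints [A1, A2, A3, 1]
    }
}
```

As with a string column in Spark, the second key compares values as strings, so "10" would sort before "2" within the numeric group. If numeric order is needed there, the cast value itself would have to be used as a sort key.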
