Printing values with German thousands and decimal separators using PySpark

I need to convert a string-typed DataFrame column to double and apply a format mask with a thousands separator and decimal places.

Input DataFrame:

column(StringType)
2655.00
15722.50
235354.66

Required format:

(-1) * to_number(df.column, format mask)

The values should be rendered with . as the thousands separator, , as the decimal separator, and 2 decimal digits.

Output column:

2.655,00
15.722,50
235.354,66

Spark's format_number returns the number as a string formatted like #,###,###.##, so you need to replace . with , and , with . to get the desired European format.

First replace the dot with a placeholder #, then replace the comma with a dot, and finally replace the # with a comma.

from pyspark.sql.functions import col, format_number, regexp_replace

df.withColumn("european_format", regexp_replace(regexp_replace(regexp_replace(
    format_number(col("column").cast("double"), 2), r'\.', '#'), ',', '.'), '#', ',')
).show()

This gives:

+---------+---------------+
|   column|european_format|
+---------+---------------+
|  2655.00|       2.655,00|
| 15722.50|      15.722,50|
|235354.66|     235.354,66|
+---------+---------------+
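
The temporary placeholder matters because the two separators have to trade places. As a minimal plain-Python sketch (no Spark involved; the function names `swap_separators` and `naive_swap` are made up for illustration), the same three-step replace can be compared with a two-step version that skips the placeholder:

```python
def swap_separators(us_formatted: str) -> str:
    """Swap '.' and ',' through a temporary '#' placeholder (three steps)."""
    return us_formatted.replace(".", "#").replace(",", ".").replace("#", ",")


def naive_swap(us_formatted: str) -> str:
    """Two-step replace without a placeholder: the decimal separator is lost."""
    return us_formatted.replace(".", ",").replace(",", ".")


# Python's ',' format option produces the same US-style layout as format_number.
us = f"{235354.66:,.2f}"      # '235,354.66'
print(swap_separators(us))    # 235.354,66
print(naive_swap(us))         # 235.354.66 (both separators end up as dots)
```

Without the placeholder, the first replace turns every dot into a comma, so the second replace can no longer tell the original commas from the converted ones.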

You can simply do the following:

import pyspark.sql.functions as F
# create a new column with the formatted number
df = df.withColumn('num_format', F.format_number('col', 2))
# swap the dot and the comma via a temporary '@' placeholder
df = df.withColumn('num_format', F.regexp_replace(F.regexp_replace(F.regexp_replace('num_format', r'\.', '@'), ',', '.'), '@', ','))
df.show()
+---------+----------+
|      col|num_format|
+---------+----------+
|   2655.0|  2.655,00|
|  15722.5| 15.722,50|
|235354.66|235.354,66|
+---------+----------+
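
A possible simplification is to swap both characters in a single pass with a character mapping instead of three chained replaces. The sketch below shows the idea in plain Python with `str.translate`; the helper name `to_german` is hypothetical. PySpark has a similar function, `pyspark.sql.functions.translate`, so something like `F.translate(F.format_number('col', 2), ',.', '.,')` may do the same job on a column, but that variant has not been run against the data in this question.

```python
# One-pass swap: every ',' becomes '.' and every '.' becomes ','.
SWAP = str.maketrans({",": ".", ".": ","})


def to_german(value: float, decimals: int = 2) -> str:
    """Format a number US-style, then swap the separators in one pass."""
    return f"{value:,.{decimals}f}".translate(SWAP)


# The question also asks for the value multiplied by -1.
for raw in ["2655.00", "15722.50", "235354.66"]:
    print(to_german(-1 * float(raw)))
```

Because each character is mapped independently, no placeholder is needed and the order of the replacements no longer matters.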
