Remove leading zeros only from negative values



I have a DataFrame in which I need to remove the leading zeros from negative values only; all other values should stay the same.

For example:

+------------+-----------+
| Input      | Output    |
+------------+-----------+
| 0000-12.45 | -12.45    |
| 000012.45  | 000012.45 |
| 000$.00    | 000$.00   |
| 0$         | 0$        |
| 0.         | 0.        |
| 51.46      | 51.46     |
| -123.67    | -123.67   |
| 00012.45   | 00012.45  |
| 012.45     | 012.45    |
+------------+-----------+

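I tried the following, but the pattern `^0+-` also consumes the minus sign, so `0000-12.45` becomes `12.45` instead of `-12.45` (see columns d and d9):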

spark.sql("""select regexp_replace("0000-12.45","^0+-(?!$)",'') as d,regexp_replace("000012.45","^0+-(?!$)",'') as d1,regexp_replace("0000.45","^0+-(?!$)",'') as d2,regexp_replace("0000$.00","^0+-(?!$)",'') as d3,regexp_replace("0.","^0+-(?!$)",'') as d4,regexp_replace("0$","^0+-(?!$)",'') as d5,regexp_replace("00","^0+-(?!$)",'') as d6,regexp_replace("51.46","^0+-(?!$)",'') as d7,regexp_replace("-12234.45","^0+-(?!$)",'') as d8, regexp_replace("0000-12234.45","^0+-(?!$)",'') as d9""").show()
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
|    d|       d1|     d2|      d3| d4| d5| d6|   d7|       d8|      d9|
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
|12.45|000012.45|0000.45|0000$.00| 0.| 0$| 00|51.46|-12234.45|12234.45|
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
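To see why the attempt drops the sign, here is a quick check with Python's `re` module (Spark's `regexp_replace` uses Java regular expressions, which treat these simple patterns the same way). The lookahead pattern `^0+(?=-)` is a suggested variant, not taken from the question: it matches the zeros only when a `-` follows them, without removing the `-` itself.

```python
import re

# Pattern from the question: the "-" is part of the match, so it is removed too.
ATTEMPTED = r"^0+-(?!$)"
# Suggested variant: match the leading zeros only when a "-" follows them,
# but leave the "-" itself in the string (zero-width lookahead).
LOOKAHEAD = r"^0+(?=-)"

# Sample inputs from the question's table.
samples = ["0000-12.45", "000012.45", "000$.00", "0$", "0.",
           "51.46", "-123.67", "00012.45", "012.45"]

for s in samples:
    before = re.sub(ATTEMPTED, "", s)
    after = re.sub(LOOKAHEAD, "", s)
    print(f"{s:>10} | attempted: {before:>9} | lookahead: {after:>9}")
```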

You can add a condition so that the zeros are removed only when the value contains a minus sign:

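from pyspark.sql import functions as F

# df: an existing DataFrame with a string column 'input'.
# Strip leading zeros only when the value contains a minus sign;
# all other values are passed through unchanged.
(df
 .withColumn(
     'output',
     F.when(F.col('input').contains('-'), F.regexp_replace('input', '^0+', ''))
      .otherwise(F.col('input')),
 )
 .show())
# +----------+---------+
# |     input|   output|
# +----------+---------+
# |0000-12.45|   -12.45|
# | 000012.45|000012.45|
# |   000$.00|  000$.00|
# |        0$|       0$|
# |        0.|       0.|
# |     51.46|    51.46|
# |   -123.67|  -123.67|
# |  00012.45| 00012.45|
# |    012.45|   012.45|
# +----------+---------+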
