我有一个Dataframe,需要删除前导零仅为负值类型的值,其余值是相同的。
例如
+-----------+-----------------+
| Input |output |
+-----------+-----------------+
| 0000-12.45| -12.45 |
| 000012.45 | 000012.45 |
| 000$.00| 000$.00|
| 0$ | 0$ |
| 0. | 0. |
| 51.46 | 51.46 |
| -123.67 | -123.67 |
| 00012.45 | 00012.45|
| 012.45 | 012.45 |
我试过下面的方法
spark.sql("""select regexp_replace("0000-12.45","^0+-(?!$)",'') as d,regexp_replace("000012.45","^0+-(?!$)",'') as d1,regexp_replace("0000.45","^0+-(?!$)",'') as d2,regexp_replace("0000$.00","^0+-(?!$)",'') as d3,regexp_replace("0.","^0+-(?!$)",'') as d4,regexp_replace("0$","^0+-(?!$)",'') as d5,regexp_replace("00","^0+-(?!$)",'') as d6,regexp_replace("51.46","^0+-(?!$)",'') as d7,regexp_replace("-12234.45","^0+-(?!$)",'') as d8, regexp_replace("0000-12234.45","^0+-(?!$)",'') as d9""").show()
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
| d| d1| d2| d3| d4| d5| d6| d7| d8| d9|
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
|12.45|000012.45|0000.45|0000$.00| 0.| 0$| 00|51.46|-12234.45|12234.45|
+-----+---------+-------+--------+---+---+---+-----+---------+--------+
仅当包含负号
时,可以添加删除零的条件(df
.withColumn('ouput', F
.when(F.col('input').contains('-'), F.regexp_replace('input', '^0+', ''))
.otherwise(F.col('input'))
)
.show()
)
# +----------+---------+
# | input| ouput|
# +----------+---------+
# |0000-12.45| -12.45|
# | 000012.45|000012.45|
# | 000$.00| 000$.00|
# | 0$| 0$|
# | 0.| 0.|
# | 51.46| 51.46|
# | -123.67| -123.67|
# | 00012.45| 00012.45|
# | 012.45| 012.45|
# +----------+---------+