regexp_replace in PySpark is not working properly

I am reading a CSV file that looks like this:

"ZEN","123"
"TEN","567"

Now, if I use regexp_replace to remove the character E, it does not give the correct result:

from pyspark.sql.functions import row_number, col, desc, date_format, to_date, to_timestamp, regexp_replace
from pyspark.sql.types import StructType, StringType

inputDirPath = "/FileStore/tables/test.csv"
schema = StructType()
# `fields` (the list of column names) is defined elsewhere and not shown here
for field in fields:
    colType = StringType()
    schema.add(field.strip(), colType, True)
incr_df = (spark.read.format("csv")
           .option("header", "false")
           .schema(schema)
           .option("delimiter", "\u002c")
           .option("nullValue", "")
           .option("emptyValue", "")
           .option("multiline", True)
           .csv(inputDirPath))
for column in incr_df.columns:
    inc_new = incr_df.withColumn(column, regexp_replace(column, "E", ""))
inc_new.show()

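This does not give the correct result; it seems to do nothing.

Note: I have 100+ columns, so I need to use a for loop.

Can someone help me find the error?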

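In your loop, every iteration starts again from incr_df, so inc_new ends up with only the last column replaced, and in your sample that column contains no E. A list comprehension is more concise and simpler. Let's try: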

inc_new = incr_df.select(*[regexp_replace(x, 'E', '').alias(x) for x in incr_df.columns])
inc_new.show()
