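I have a CSV file with many columns, and I am trying to remove every double quote (") from the DataFrame I created from it.
My code currently looks like this: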
import pyspark.pandas as ps
def removeDoubleQuotes(x):
    return x.replace('"', '')
newDf = df.apply(removeDoubleQuotes, axis=1)
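But when I run this code, the output is unchanged (the double quotes are still there).
To test the apply function itself, I appended a string to the end of every value, and that worked. So I am not sure why replace has no effect. (I checked the type of each element, and they are all strings.)
I also tested the removeDoubleQuotes function on a single cell of the DataFrame, and it works there too.
Maybe I am misusing apply?
Thanks for your help!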
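One likely explanation for the behavior described above, sketched with plain pandas: with axis=1 the function receives a whole row as a Series (assumed here to be a pandas Series), not a single string. Series.replace compares entire cell values, while Series.str.replace works on substrings. The row contents below are made up for illustration.

```python
import pandas as pd

# Hypothetical row, standing in for what apply(..., axis=1) passes to the function.
row = pd.Series({"City": 'New " York City', "State": "NY"})

# Series.replace matches whole cell values, so a cell that merely
# contains a quote is left untouched.
whole_value = row.replace('"', '')

# Series.str.replace operates on substrings inside each string.
substring = row.str.replace('"', '', regex=False)

# Appending a string is element-wise, which would explain why that test worked.
suffixed = row + "_x"

print(repr(whole_value["City"]))
print(repr(substring["City"]))
print(repr(suffixed["State"]))
```

If this is indeed the cause, returning x.str.replace('"', '', regex=False) from removeDoubleQuotes should behave as intended, provided every column in the row holds strings.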
You can use PySpark's regexp_replace function. For example:
>>> df=spark.read.format("csv").option('header','True').option('inferSchema', 'True').option("delimiter", '|').load("/Path to/sample1.csv")
>>> df.show()
+--------+--------------------+--------+------+-------------------+-----------------+---------------+-----+-----+----+
| OrderID| Product|Quantity| Price| OrderDate| StoreAddres| City|State|Month|Hour|
+--------+--------------------+--------+------+-------------------+-----------------+---------------+-----+-----+----+
|295665.0| Macbook Pro Laptop| 1.0|1700.0|2019-12-30 00:01:00|136 Church St, Ne|New " York City| 123| 12.0| 0.0|
|295666.0| LG Washing Machine| 1.0| 600.0|2019-12-29 07:03:00| 562 2nd St, Ne|New York " City| NY| 12.0| 7.0|
|295667.0|USB-C Charging Cable| 1.0| 11.95|2019-12-12 18:21:00| 277 Main St, New| New York City| NY| 12.0|18.0|
+--------+--------------------+--------+------+-------------------+-----------------+---------------+-----+-----+----+
>>> from pyspark.sql.functions import regexp_replace
>>> df.withColumn('City', regexp_replace('City', '"', '')).show()
+--------+--------------------+--------+------+-------------------+-----------------+--------------+-----+-----+----+
| OrderID| Product|Quantity| Price| OrderDate| StoreAddres| City|State|Month|Hour|
+--------+--------------------+--------+------+-------------------+-----------------+--------------+-----+-----+----+
|295665.0| Macbook Pro Laptop| 1.0|1700.0|2019-12-30 00:01:00|136 Church St, Ne|New York City| 123| 12.0| 0.0|
|295666.0| LG Washing Machine| 1.0| 600.0|2019-12-29 07:03:00| 562 2nd St, Ne|New York City| NY| 12.0| 7.0|
|295667.0|USB-C Charging Cable| 1.0| 11.95|2019-12-12 18:21:00| 277 Main St, New| New York City| NY| 12.0|18.0|
+--------+--------------------+--------+------+-------------------+-----------------+--------------+-----+-----+----+