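I'm writing code that replaces characters matching the pattern [^\w |] with ''. The problem is that when I use the DataFrame 'sentenceDF' in my function 'removePunctuation', I get the following error: 'Column' object is not callable.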
from pyspark.sql.functions import regexp_replace, trim, col, lower

def removePunctuation(column):
    cleanString = column
    cleanString = cleanString.select(regexp_replace(sentenceDF['sentence'],'[^\w | ]','').alias('sentence'))
    cleanString = cleanString.select(regexp_replace(cleanString['sentence'],'_','').alias('sentence'))
    cleanString = cleanString.select(lower(cleanString['sentence']))
    return cleanString

sentenceDF = sqlContext.createDataFrame([('Hi, you!',),
                                         (' No under_score!',),
                                         (' * Remove punctuation then spaces * ',)], ['sentence'])
result = sentenceDF.select(removePunctuation(col('sentence')))
result.show()
Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-aa978fac8bae> in <module>()
     15                                          (' * Remove punctuation then spaces * ',)], ['sentence'])
     16
---> 17 result = sentenceDF.select(removePunctuation(col('sentence')))
     18 result.show()

<ipython-input-50-aa978fac8bae> in removePunctuation(column)
      4 def removePunctuation(column):
      5     cleanString = column
----> 6     cleanString = cleanString.select(regexp_replace(sentenceDF['sentence'],'[^\w | ]','').alias('sentence'))
      7     cleanString = cleanString.select(regexp_replace(cleanString['sentence'],'_','').alias('sentence'))
      8     cleanString = cleanString.select(lower(cleanString['sentence']))

TypeError: 'Column' object is not callable
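Your function receives a Column, not a DataFrame, and a Column has no select method. Attribute access on a Column returns another Column, and calling that is what raises the error. Run just this and you will get the same error: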
col('sentence').select()
Tip: always get the code working inline before refactoring it into a function.
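To see where the "not callable" message comes from without a Spark session, here is a minimal sketch in plain Python. `FakeColumn` is a hypothetical stand-in, not the real `pyspark.sql.Column`. It only mimics the one behavior that matters here, on the assumption that a lookup of an unknown attribute on a Column is treated as field access and returns another Column.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Only invoked for attributes that do not exist, such as `select`.
        # Assumption: like Column, treat the lookup as field access and
        # return a new column object instead of raising AttributeError.
        if item.startswith('__'):
            raise AttributeError(item)
        return FakeColumn(f'{self.name}.{item}')


column = FakeColumn('sentence')
field = column.select  # no error yet: this is just another FakeColumn

try:
    column.select()  # the call fails, because a FakeColumn is not callable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The attribute lookup succeeds silently, and only the call fails, which is why the message talks about calling rather than about a missing `select` method.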
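Anyway, I think this is what you want:

from pyspark.sql.functions import col, lower, regexp_replace, trim

def removePunctuation(df, column):
    # Lowercase and trim first, keeping the original column name.
    cleanString = df.select(trim(lower(col(column))).alias(column))
    # Remove non-word characters, whitespace, and underscores.
    cleanString = cleanString.select(regexp_replace(column, r'[^\w]|\s+|_', '').alias(column))
    return cleanString

result = removePunctuation(sentenceDF, 'sentence')
result.show()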
+--------------------+
|            sentence|
+--------------------+
|               hiyou|
|        nounderscore|
|removepunctuation...|
+--------------------+
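If you want to check the pattern without a cluster, a rough local sketch with Python's `re` module can help. Spark evaluates `regexp_replace` with Java's regex engine, so this is only an approximation, though for plain ASCII input like the sample rows the two should agree. `remove_punctuation` is a hypothetical helper name used only for this illustration.

```python
import re

# Same pattern as in the answer. Spark would run it through Java's regex
# engine; Python's `re` is used here as an approximation for ASCII input.
PATTERN = re.compile(r'[^\w]|\s+|_')


def remove_punctuation(text):
    """Hypothetical local helper: trim, lowercase, then strip all matches."""
    return PATTERN.sub('', text.strip().lower())


samples = ['Hi, you!', ' No under_score!', ' * Remove punctuation then spaces * ']
cleaned = [remove_punctuation(s) for s in samples]
print(cleaned)
# ['hiyou', 'nounderscore', 'removepunctuationthenspaces']
```

The last value is the full string. `show()` cuts long cells to 20 characters by default, which is why the table above ends in `...`.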