Replace the pipe (|) symbol with a comma (,) in Python

myRDD data (2 rows):

[u'#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead', u'06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham']

How can I replace | with , so that I can build a DataFrame? Is there a better way to build a DataFrame from this kind of data?

>>> data = [u'#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead', u'06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham']
>>> data = [item.replace("|", ",") for item in data]
>>> data
['#fields:excDate,schedDate,TZ,custID,muID,tvID,acdID,logonID,agentName,modify,exception,start,stop,LS Oracle Emp ID,Team Lead', '06152016,06152016,CET,3,3000,1688,87,,Ali, AbdElaziz,1465812004,Open,08:00,09:00,101021021,ElDeleify,Hisham']
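Note that this replacement changes the number of fields, because two values in the data row already contain commas ("Ali, AbdElaziz" and "ElDeleify,Hisham"). The sketch below uses only the two sample rows from the question and counts the fields before and after the replacement:

```python
# Sample rows copied from the question (header row + one data row)
data = [
    "#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead",
    "06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham",
]

# Splitting on the original pipe delimiter keeps every value intact
pipe_counts = [len(line.split("|")) for line in data]

# Replacing "|" with "," and then splitting on "," also splits the values
# that already contain commas
comma_counts = [len(line.replace("|", ",").split(",")) for line in data]

print(pipe_counts)   # [15, 15]
print(comma_counts)  # [15, 17]
```

With commas as the delimiter, the data row yields 17 values for 15 column names, so splitting directly on "|" is the safer choice for this data.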

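According to the Spark documentation for createDataFrame, one way to create a DataFrame is to pass the data as a list of lists and the header (the column names) as a list.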

data = [u'#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead', u'06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham']
data = [d.split("|") for d in data]  # split each line on "|" to get a list of lists
schema = data[0]  # the first row is really the header (column names), not data
data = data[1:]  # remove the header row from the data
schema[0] = schema[0].split(":", 1)[1]  # strip the "#fields:" prefix from the first column name
# sqlContext is an existing SQLContext (e.g. the one provided by the pyspark shell)
dataframe = sqlContext.createDataFrame(data, schema)
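If you prefer not to split by hand, Python's standard csv module can build the same list of lists. This is a swapped-in alternative to str.split, not something createDataFrame requires, and it assumes the values never contain double-quote characters (csv.reader treats `"` as a quote character by default). A minimal sketch on the two sample rows, run as plain Python rather than on myRDD:

```python
import csv

# Sample rows copied from the question
data = [
    "#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead",
    "06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham",
]

# csv.reader accepts any iterable of strings; delimiter="|" makes it split
# only on pipes, so commas inside values are left alone
rows = list(csv.reader(data, delimiter="|"))

header, body = rows[0], rows[1:]
# Drop the "#fields:" prefix from the first column name
header[0] = header[0].split(":", 1)[1]

print(header[:3])  # ['excDate', 'schedDate', 'TZ']
print(body[0][8])  # Ali, AbdElaziz
```

The resulting body and header have the same shape as data and schema in the code above, so they could be passed to createDataFrame in the same way.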

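If you only need to convert a single string, you don't even need a loop. For example, for the first string in the list 'data':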

data[0] = data[0].replace('|',',')

That does it in one line.
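Keep in mind that this converts only the string you index; the other rows are left unchanged. A small sketch with shortened versions of the question's rows (shortened for illustration only):

```python
# Shortened versions of the question's rows, for illustration only
data = ["#fields:excDate|schedDate|TZ", "06152016|06152016|CET"]

# Replacing inside one element changes only that element
data[0] = data[0].replace("|", ",")
print(data)  # ['#fields:excDate,schedDate,TZ', '06152016|06152016|CET']

# Converting every row still needs one replace call per row
data = [line.replace("|", ",") for line in data]
print(data)  # ['#fields:excDate,schedDate,TZ', '06152016,06152016,CET']
```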
