How to create a Spark or pandas DataFrame from a str output in Databricks on Apache Spark



I have assigned the variable "myoutput" to a string as follows:

myoutput = result.content

My output looks like this:

Out[10]: 'Company A Invoice\nInvoice For:\nAddress:\n567 Main St.\nRedmond, WA\n555-555-5555\nBilbo Baggins\n123 Hobbit Lane\nRedmond, WA\n555-555-5555\nSubtotal: 300.00\nTax: 30.00\nTip: 100.00\nTotal: 430.00\nSignature: ____Bilbo Baggins__________\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66\nD\n1\n12.00\nE\n4\n10.00\nF\n6\n12.00\nG\n8\n22.00'

I want to create a Spark DataFrame or pandas DataFrame from "myoutput".

Any ideas?

import pandas as pd

str_output = 'Company A Invoice\nInvoice For:\nAddress:\n567 Main St.\nRedmond, WA\n555-555-5555\nBilbo Baggins\n123 Hobbit Lane\nRedmond, WA\n555-555-5555\nSubtotal: 300.00\nTax: 30.00\nTip: 100.00\nTotal: 430.00\nSignature: ____Bilbo Baggins__________\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66\nD\n1\n12.00\nE\n4\n10.00\nF\n6\n12.00\nG\n8\n22.00'

# Split the string on newline characters so each line becomes one row
df_data = pd.DataFrame({'ColumnA': str_output.splitlines()})
df_data
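
If you also want a Spark DataFrame, one option is to convert the pandas DataFrame with spark.createDataFrame. Below is a minimal sketch, assuming it runs in a Databricks notebook where the SparkSession is already available as spark (it is created for you by default):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in Databricks notebooks

# Convert the pandas DataFrame built above into a Spark DataFrame
df_spark = spark.createDataFrame(df_data)

# Alternatively, build it straight from the split lines:
# df_spark = spark.createDataFrame([(line,) for line in str_output.splitlines()], ['ColumnA'])

df_spark.show(truncate=False)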

Reference: How to split a Python string on newline characters
