How to rebuild an array from its string representation



I have a huge CSV generated by a Python script. Some cells contain arrays of data, while others contain single-item arrays. Some examples:

cell01 == ['"July, 2002"', 'CUREE Publication No. CEA-01.', 'Project No. 3126', 'Prepared for Consortium of Universities for Research in Earthquake Engineering.']
cell02 == ['[Memorandum from Ralph J. Johnson on Andy Place].']
cell03 == ["Financial statements for the years ended March 31, 1991 and 1990 and independent auditors' report"]

Ideally, I'd like to parse all of this data into a structure like the following:

cell01_parsed[0] == '"July, 2002"'
cell01_parsed[1] == 'CUREE Publication No. CEA-01.'
cell01_parsed[2] == 'Project No. 3126'
cell01_parsed[3] == 'Prepared for Consortium of Universities for Research in Earthquake Engineering.'
cell02_parsed == '[Memorandum from Ralph J. Johnson on Andy Place].'
cell03_parsed == "Financial statements for the years ended March 31, 1991 and 1990 and independent auditors' report"

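However, when I use csv.reader() or csv.DictReader(), these cells are parsed as strings, not arrays. Is there a simple way to do this? I can't use split(',') because some of the strings have commas in the middle of an item.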

You could try splitting the string with a regular expression (work out one that fits your data), like this:

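import re

test_str = '"July, 2002", CUREE Publication No. CEA-01.'
# Split only on commas with no double quote anywhere after them in the string
parts = re.compile(r',(?!.+")').split(test_str)
print(parts)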
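Since the cells in the question look like Python list literals (the output of str() on a list), another option, not part of the regex suggestion above, is to let ast.literal_eval from the standard library rebuild them. The sketch below assumes every list cell really is a valid Python literal. The helper name parse_cell is made up for illustration, and unwrapping single-item lists is a choice made to match the cell02_parsed and cell03_parsed examples in the question.

```python
import ast


def parse_cell(cell):
    """Rebuild a list from a cell holding a Python list literal.

    Hypothetical helper: cells that do not look like a list, or that fail
    to parse, are returned unchanged. Single-item lists are unwrapped to
    the bare item, as in the question's desired output.
    """
    if not cell.lstrip().startswith("["):
        return cell  # plain string cell, nothing to rebuild
    try:
        value = ast.literal_eval(cell)  # evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError):
        return cell  # looked like a list but is not a valid literal
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


# Sample cells copied from the question, as csv.reader() would return them.
cell01 = """['"July, 2002"', 'CUREE Publication No. CEA-01.', 'Project No. 3126', 'Prepared for Consortium of Universities for Research in Earthquake Engineering.']"""
cell02 = "['[Memorandum from Ralph J. Johnson on Andy Place].']"
cell03 = '''["Financial statements for the years ended March 31, 1991 and 1990 and independent auditors' report"]'''

cell01_parsed = parse_cell(cell01)
cell02_parsed = parse_cell(cell02)
cell03_parsed = parse_cell(cell03)

print(cell01_parsed[0])
print(cell02_parsed)
print(cell03_parsed)
```

In practice you would call parse_cell on each field of each row yielded by csv.reader() or csv.DictReader(). Unlike the regex split, this keeps commas and quotes inside items intact, because the Python parser handles the quoting rules.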
