How to convert a JSON SList into a pandas DataFrame


a = ['{"type": "book",', 
'"title": "sometitle",', 
'"author": [{"name": "somename"}],', 
'"year": "2000",', 
'"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
'"publisher": "somepublisher"}', '',
'{"type": "book",', 
'"title": "sometitle2",', 
'"author": [{"name": "somename2"}],', 
'"year": "2001",', 
'"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
'"publisher": "somepublisher"}', '']

I have a complex SList that I would eventually like to turn into a tidy pandas DataFrame.

I have tried a number of things, for example:

i = iter(a)
b = dict(zip(i, i))

Unfortunately, this creates a dictionary that looks even worse:

{'{"type": "book",':
...

Where I previously had an SList, I now have a dictionary.

I also tried

pd.json_normalize(a)

But this throws the error AttributeError: 'str' object has no attribute 'values'

I also tried

r = json.dumps(a.l)
loaded_r = json.loads(r)
print(loaded_r)

But this just yields a list again:

['{"type": "book",',
...

Again, in the end I would like a pandas DataFrame like this:

type   title        author      year   ...
book   sometitle    somename    2000   ...
book   sometitle2   somename2   2001

Obviously, I have not really reached the point where I can feed the data to a pandas function. Every time I tried, the functions screamed at me…

import json

import pandas as pd

a = ['{"type": "book",', 
'"title": "sometitle",', 
'"author": [{"name": "somename"}],', 
'"year": "2000",', 
'"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
'"publisher": "somepublisher"}', '',
'{"type": "book",', 
'"title": "sometitle2",', 
'"author": [{"name": "somename2"}],', 
'"year": "2001",', 
'"identifier": [{"type": "ISBN", "id": "1234567890"}],', 
'"publisher": "somepublisher"}', '']
# Each '' marks the end of a record: turn it into a comma, join all the
# fragments, drop the trailing comma, and wrap the result in [...] so the
# whole string parses as one JSON array of objects.
b = "[%s]" % ''.join([',' if i == '' else i for i in a]).strip(',')
data = json.loads(b)
df = pd.DataFrame(data)
print(df)
   type       title                   author  year  \
0  book   sometitle   [{'name': 'somename'}]  2000   
1  book  sometitle2  [{'name': 'somename2'}]  2001   

                               identifier      publisher  
0  [{'type': 'ISBN', 'id': '1234567890'}]  somepublisher  
1  [{'type': 'ISBN', 'id': '1234567890'}]  somepublisher
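
The output above still holds author and identifier as lists of dicts, whereas the table in the question shows plain values such as somename. A possible follow-up, offered only as a sketch: split the list into records at the '' separators, parse each record separately, and flatten the nested fields before building the frame. Joining multiple names or ids with ", " is an assumption made for illustration, as are the variable names records and rows; neither comes from the original post.

```python
import json
from itertools import groupby

import pandas as pd

a = ['{"type": "book",',
     '"title": "sometitle",',
     '"author": [{"name": "somename"}],',
     '"year": "2000",',
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],',
     '"publisher": "somepublisher"}', '',
     '{"type": "book",',
     '"title": "sometitle2",',
     '"author": [{"name": "somename2"}],',
     '"year": "2001",',
     '"identifier": [{"type": "ISBN", "id": "1234567890"}],',
     '"publisher": "somepublisher"}', '']

# Group consecutive lines into records; the empty strings act as separators.
records = [
    json.loads("".join(group))
    for is_blank, group in groupby(a, key=lambda line: line == "")
    if not is_blank
]

# Flatten the nested lists of dicts into plain strings.
# Assumption: multiple authors or identifiers are joined with ", ".
rows = [
    {
        **rec,
        "author": ", ".join(person["name"] for person in rec["author"]),
        "identifier": ", ".join(item["id"] for item in rec["identifier"]),
    }
    for rec in records
]

df = pd.DataFrame(rows)
print(df)
```

Parsing each record on its own means a malformed record raises an error for that record alone, which may be easier to debug than one large concatenated string.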
