a = ['{"type": "book",',
'"title": "sometitle",',
'"author": [{"name": "somename"}],',
'"year": "2000",',
'"identifier": [{"type": "ISBN", "id": "1234567890"}],',
'"publisher": "somepublisher"}', '',
'{"type": "book",',
'"title": "sometitle2",',
'"author": [{"name": "somename2"}],',
'"year": "2001",',
'"identifier": [{"type": "ISBN", "id": "1234567890"}],',
'"publisher": "somepublisher"}', '']
I have a complicated SList, and I would like to eventually turn it into a tidy pandas DataFrame.
I have tried many things, for example:
i = iter(a)
b = dict(zip(i, i))
Unfortunately, this creates a dictionary that looks even worse:
{'{"type": "book",':
...
Where I previously had an SList of dictionaries, I now have a dictionary of dictionaries.
I also tried
pd.json_normalize(a)
but this throws the error message AttributeError: 'str' object has no attribute 'values'
I also tried
r = json.dumps(a.l)
loaded_r = json.loads(r)
print(loaded_r)
but this just produces a list of the same strings:
['{"type": "book",',
...
Again, in the end I would like a pandas DataFrame like this:
type  title       author     year  ...
book  sometitle   somename   2000  ...
book  sometitle2  somename2  2001
Obviously, I haven't really reached the point where I can feed the data to a pandas function. Every time I try, the functions scream at me…
import json
import pandas as pd

a = ['{"type": "book",',
'"title": "sometitle",',
'"author": [{"name": "somename"}],',
'"year": "2000",',
'"identifier": [{"type": "ISBN", "id": "1234567890"}],',
'"publisher": "somepublisher"}', '',
'{"type": "book",',
'"title": "sometitle2",',
'"author": [{"name": "somename2"}],',
'"year": "2001",',
'"identifier": [{"type": "ISBN", "id": "1234567890"}],',
'"publisher": "somepublisher"}', '']
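# Join the fragments, turning each '' separator into a comma between records,
# then strip the trailing comma and wrap everything in [] to form a JSON array.
b = "[%s]" % ''.join([',' if i == '' else i for i in a]).strip(',')
data = json.loads(b)
df = pd.DataFrame(data)
print(df)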
type title author year
0 book sometitle [{'name': 'somename'}] 2000
1 book sometitle2 [{'name': 'somename2'}] 2001
identifier publisher
0 [{'type': 'ISBN', 'id': '1234567890'}] somepublisher
1 [{'type': 'ISBN', 'id': '1234567890'}] somepublisher
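The output above still has lists of dicts in the author and identifier columns, whereas the question asks for plain values such as somename. Below is a minimal sketch of one way to get there. It assumes every record is closed by an empty string, that each author entry has a "name" key, and that joining several authors with ", " is acceptable. The names `lines`, `parse_records`, `flat`, and the `isbn` column are hypothetical and introduced only for illustration.

```python
import json
import pandas as pd

# Same shape as the list `a` above: JSON fragments, with '' closing each record.
lines = [
    '{"type": "book",',
    '"title": "sometitle",',
    '"author": [{"name": "somename"}],',
    '"year": "2000",',
    '"identifier": [{"type": "ISBN", "id": "1234567890"}],',
    '"publisher": "somepublisher"}', '',
    '{"type": "book",',
    '"title": "sometitle2",',
    '"author": [{"name": "somename2"}],',
    '"year": "2001",',
    '"identifier": [{"type": "ISBN", "id": "1234567890"}],',
    '"publisher": "somepublisher"}', '',
]


def parse_records(fragments):
    """Group fragments into records at each empty string and parse each as JSON."""
    records, chunk = [], []
    for fragment in fragments:
        if fragment == '':
            if chunk:
                records.append(json.loads(''.join(chunk)))
                chunk = []
        else:
            chunk.append(fragment)
    if chunk:  # handle a final record that has no trailing ''
        records.append(json.loads(''.join(chunk)))
    return records


records = parse_records(lines)
flat = pd.DataFrame(records)

# Assumption: collapse the list of author dicts into a comma-separated string.
flat['author'] = flat['author'].apply(
    lambda authors: ', '.join(entry['name'] for entry in authors)
)

# Assumption: keep only the first ISBN identifier, or None if there is none.
flat['isbn'] = flat['identifier'].apply(
    lambda ids: next((entry['id'] for entry in ids if entry['type'] == 'ISBN'), None)
)
flat = flat.drop(columns='identifier')

print(flat)
```

Parsing record by record, instead of building one large JSON array string, also means a malformed record raises at the record where the problem is, which can make debugging easier.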