Reading JSON with pandas: trouble with the columns



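I am trying to convert a JSON file to a CSV table using pandas.

My JSON file comes from Game of Thrones and contains information about the characters. The problem is that each row can contain a different set of dictionary keys.

I want to build a CSV file whose columns are all the unique keys found across every row of the JSON file.

Here is the link to the JSON file

This is what I have done so far: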

import pandas as pd
file = link  # 'link' stands for the URL of the JSON file
df = pd.read_json(file)
df2 = df['characters']
#from here on I'm stuck

See pandas.json_normalize.

>>> import pandas as pd
>>> path = 'characters.json'
>>> df = pd.read_json(path)
>>> df2 = pd.json_normalize(df['characters'])
>>> df2.columns
Index(['characterName', 'characterLink', 'actorName', 'actorLink', 'houseName',
       'royal', 'parents', 'siblings', 'killedBy', 'characterImageThumb',
       'characterImageFull', 'nickname', 'killed', 'servedBy', 'parentOf',
       'marriedEngaged', 'serves', 'kingsguard', 'guardedBy', 'actors',
       'guardianOf', 'allies', 'abductedBy', 'abducted', 'sibling'],
      dtype='object')
>>> df2.loc[df2['characterName']=='Sansa Stark']
    characterName          characterLink      actorName         actorLink  ... allies abductedBy abducted sibling
300   Sansa Stark  /character/ch0158137/  Sophie Turner  /name/nm3849842/  ...    NaN        NaN      NaN     NaN
[1 rows x 25 columns]
>>>
>>> df2.loc[df2['characterName']=='Sansa Stark']['siblings']
300    [Robb Stark, Arya Stark, Bran Stark, Rickon St...
Name: siblings, dtype: object
>>> df2.to_csv('got.csv', index=False)
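
To see why this gives one column per unique key, here is a minimal sketch on toy data. The two records below are made up for illustration and are not taken from the real file; only the key names mirror columns shown in the session above.

```python
import pandas as pd

# Toy records (made up for illustration); each dict has a different key set.
records = [
    {"characterName": "Sansa Stark", "houseName": "Stark",
     "siblings": ["Robb Stark", "Arya Stark"]},
    {"characterName": "Example Character", "nickname": "Example"},
]

# json_normalize builds one column for every key seen in any record.
df = pd.json_normalize(records)

print(sorted(df.columns))
# Keys that a record does not have show up as NaN in that row.
print(df.isna().sum().to_dict())
```

Each row keeps its own values, and any key missing from a record is filled with NaN, which is what the Sansa Stark row shows for columns such as allies and abductedBy.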

If you have already downloaded the JSON file and saved it as file.json, you can use the following code:

import pandas as pd
import json

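# Load the JSON and keep only the list stored under the 'characters' key
with open('file.json') as f:
    d = json.load(f)['characters']

# A list of dicts becomes a DataFrame with one column per unique key
df = pd.DataFrame(d)
df.to_csv("characters.csv")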
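
One thing to watch when writing this data to CSV: columns such as siblings hold Python lists, and to_csv writes each list as its string representation. If plain text cells are preferred, something along these lines might help. This is only a sketch on toy data; join_list and the "; " separator are hypothetical choices, not part of pandas or of the original answers.

```python
import io
import pandas as pd

# Toy data (made up); the second record has no 'siblings' key.
df = pd.DataFrame([
    {"characterName": "Sansa Stark", "siblings": ["Robb Stark", "Arya Stark"]},
    {"characterName": "Example Character"},
])

def join_list(value, sep="; "):
    """Hypothetical helper: join list cells into one string, leave others as-is."""
    return sep.join(map(str, value)) if isinstance(value, list) else value

flat = df.copy()
flat["siblings"] = flat["siblings"].map(join_list)

# Write to an in-memory buffer here; pass a file path to write to disk instead.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)  # index=False leaves out the row-number column
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False is optional; without it, to_csv also writes the DataFrame index as the first column of the file.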

Or, if you want to download the JSON file from a URL, you can use this:

import pandas as pd
import json
from urllib.request import urlopen
url = "https://raw.githubusercontent.com/jeffreylancaster/game-of-thrones/master/data/characters.json"
response = urlopen(url)
data_json = json.loads(response.read())
characters = data_json['characters']
df = pd.DataFrame(characters)
df.to_csv("characters.csv")
