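I'm trying to convert a JSON file to a CSV table using pandas.
My JSON file comes from Game of Thrones and contains information about the characters. The problem is that each row can contain a different set of dictionary keys.
I want to build a CSV file whose columns are all the unique keys found across every row of the JSON file.
Here is the link to the JSON file.
This is what I have done so far: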
import pandas as pd
file = link  # link stands for the URL of the JSON file
df = pd.read_json(file)
df2 = df['characters']
#from here on I'm stuck
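See pandas.json_normalize. It builds one column for every key that appears in any record and fills the values a record lacks with NaN.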
>>> import pandas as pd
>>> path = 'characters.json'
>>> df = pd.read_json(path)
>>> df2 = pd.json_normalize(df['characters'])
>>> df2.columns
Index(['characterName', 'characterLink', 'actorName', 'actorLink', 'houseName',
       'royal', 'parents', 'siblings', 'killedBy', 'characterImageThumb',
       'characterImageFull', 'nickname', 'killed', 'servedBy', 'parentOf',
       'marriedEngaged', 'serves', 'kingsguard', 'guardedBy', 'actors',
       'guardianOf', 'allies', 'abductedBy', 'abducted', 'sibling'],
      dtype='object')
>>> df2.loc[df2['characterName']=='Sansa Stark']
    characterName          characterLink      actorName         actorLink  ... allies abductedBy abducted sibling
300   Sansa Stark  /character/ch0158137/  Sophie Turner  /name/nm3849842/  ...    NaN        NaN      NaN     NaN

[1 rows x 25 columns]
>>>
>>> df2.loc[df2['characterName']=='Sansa Stark']['siblings']
300 [Robb Stark, Arya Stark, Bran Stark, Rickon St...
Name: siblings, dtype: object
>>> df2.to_csv('got.csv', index=False)
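The session above can be reproduced on a small scale without the real file. The following is a minimal sketch, assuming three made-up records (the names "A", "B", "C" and house "X" are hypothetical, not taken from characters.json). It shows that json_normalize yields the union of all keys as columns, and it also joins list values such as siblings into plain text before writing, since to_csv would otherwise store the Python list representation in the cell.

```python
import io

import pandas as pd

# Hypothetical records (not from the real file): each one has different keys.
records = [
    {"characterName": "A", "houseName": "X"},
    {"characterName": "B", "siblings": ["A", "C"]},
    {"characterName": "C", "royal": True},
]

# One column per key seen in any record; missing values become NaN.
df = pd.json_normalize(records)

# Join list values into a single string so the CSV cell holds plain text.
# Non-list values (including NaN) are passed through unchanged.
df["siblings"] = df["siblings"].apply(
    lambda v: "; ".join(v) if isinstance(v, list) else v
)

# Write to an in-memory buffer here; pass a file name to write to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The "; " separator is an arbitrary choice; any delimiter that does not occur in the names would work, and the same treatment could be applied to the other list-valued columns.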
If you have already downloaded the JSON file and saved it as file.json, you can use the following code:
import pandas as pd
import json
with open('file.json') as f:
    # The character records are stored under the top-level 'characters' key
    d = json.load(f)['characters']
df = pd.DataFrame(d)
df.to_csv("characters.csv")
Alternatively, if you want to download the JSON file from the URL, you can use this:
import pandas as pd
import json
from urllib.request import urlopen
url = "https://raw.githubusercontent.com/jeffreylancaster/game-of-thrones/master/data/characters.json"
response = urlopen(url)
data_json = json.loads(response.read())
characters = data_json['characters']
df = pd.DataFrame(characters)
df.to_csv("characters.csv")
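Both snippets above pass the list of character dictionaries straight to pd.DataFrame, which also produces one column per unique key. The difference from pd.json_normalize only shows up when a value is itself a dictionary. The sketch below illustrates this under stated assumptions: the records and the nested "actor" key are hypothetical and may not match the structure of the real file.

```python
import pandas as pd

# Hypothetical records: the nested "actor" dictionary is an assumption
# made for illustration, not a field known to exist in characters.json.
records = [
    {"characterName": "A", "actor": {"name": "P", "link": "/name/1/"}},
    {"characterName": "B"},
]

# pd.DataFrame keeps the nested dictionary as a single object column.
plain = pd.DataFrame(records)

# pd.json_normalize expands it into dotted columns.
flat = pd.json_normalize(records)

print(list(plain.columns))
print(list(flat.columns))
```

If the data contains no nested dictionaries, the two approaches should give the same columns, so the choice mostly matters for deeper structures.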