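My question is similar to this one, "Parsing JSON with number as key", and this one, "Parsing nested JSON", except that I want to parse my JSON file with pandas json_normalize. Eventually I want to write the resulting DataFrame to an Excel file (I already have code for that). The DataFrame I want in memory should look like this: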
Timestamp BatteryVoltage GridCurrent GridVoltage InverterCurrent InverterVoltage
....
....
My existing code doesn't get me anywhere:
import json
import datetime
import pandas as pd
from pandas.io.json import json_normalize
with open('test.json') as data_file:
    data = json.load(data_file)
df = pd.json_normalize(data['timestamp'])
I know I have to pass something as the record_path argument of json_normalize, but I'm not sure what it should be, because the timestamp keys keep changing.
{"timestamp": {
"1636987025": {
"batteryVoltage": 28.74732,
"gridCurrent": 3.68084,
"gridVoltage": 230.64401,
"inverterCurrent": 2.00471,
"inverterVoltage": 224.18573,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987008
},
"1636987085": {
"batteryVoltage": 28.52959,
"gridCurrent": 3.40046,
"gridVoltage": 230.41367,
"inverterCurrent": 1.76206,
"inverterVoltage": 225.24319,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987136
},
"1636987146": {
"batteryVoltage": 28.5338,
"gridCurrent": 3.37573,
"gridVoltage": 229.27209,
"inverterCurrent": 2.11128,
"inverterVoltage": 225.51733,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987136
},
"1636987206": {
"batteryVoltage": 28.55535,
"gridCurrent": 3.43365,
"gridVoltage": 229.47604,
"inverterCurrent": 1.98594,
"inverterVoltage": 225.83649,
"solarCurrent": 0,
"solarVoltage": 0,
"tValue": 1636987264
}
}
}
pandas' json_normalize is designed for JSON/dicts that have lists embedded in them; your data has none.
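To make that concrete, here is a minimal sketch of what json_normalize does with data of this shape. The data dict below is a trimmed-down copy of the sample from the question (two timestamps, two fields each), used only for illustration. Since there is no list of records to iterate over, every nested key is flattened into its own column and you get a single wide row:

```python
import pandas as pd

# Trimmed-down copy of the question's sample
# (assumption: same structure as test.json, fewer keys)
data = {"timestamp": {
    "1636987025": {"batteryVoltage": 28.74732, "gridCurrent": 3.68084},
    "1636987085": {"batteryVoltage": 28.52959, "gridCurrent": 3.40046},
}}

flat = pd.json_normalize(data["timestamp"])

# A single row; each column is named "<timestamp>.<field>"
print(flat.shape)
print(list(flat.columns))
```

That is why record_path does not help here: there is no fixed key pointing at a list of records.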
One option is jmespath:
# pip install jmespath
import jmespath

# "timestamp.*" selects every value under the "timestamp" key
expression = jmespath.compile("timestamp.*")
pd.DataFrame(expression.search(data))
batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue
0 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008
1 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136
2 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136
3 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264
Have a look at the jmespath documentation; it is a great tool for working with JSON.
Another option, without jmespath, in pure Python:
result = list(data['timestamp'].values())
pd.DataFrame(result)
batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue
0 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008
1 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136
2 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136
3 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264
Capturing the timestamp as well is easy:
# on Python 3.9+, you can use | to merge the dictionaries instead
result = [{**value, 'timestamp': key} for key, value in data['timestamp'].items()]
pd.DataFrame(result)
batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue timestamp
0 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008 1636987025
1 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136 1636987085
2 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136 1636987146
3 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264 1636987206
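For reference, a sketch of the | variant mentioned in the comment above. It assumes Python 3.9 or newer and uses a trimmed-down copy of the question's data (two timestamps, two fields each) so that it runs on its own:

```python
import pandas as pd

# Trimmed-down copy of the question's sample (assumption: same structure as test.json)
data = {"timestamp": {
    "1636987025": {"batteryVoltage": 28.74732, "gridCurrent": 3.68084},
    "1636987085": {"batteryVoltage": 28.52959, "gridCurrent": 3.40046},
}}

# dict | dict builds a new merged dict (Python 3.9+); the originals are untouched
result = [value | {"timestamp": key} for key, value in data["timestamp"].items()]
df = pd.DataFrame(result)
print(df)
```

Note that the timestamp column holds strings, because JSON object keys are always strings; convert it explicitly if you need numbers or datetimes.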
Yet another option:
pd.DataFrame.from_dict(data['timestamp'], orient='index').rename_axis('timestamp').reset_index()
timestamp batteryVoltage gridCurrent gridVoltage inverterCurrent inverterVoltage solarCurrent solarVoltage tValue
0 1636987025 28.74732 3.68084 230.64401 2.00471 224.18573 0 0 1636987008
1 1636987085 28.52959 3.40046 230.41367 1.76206 225.24319 0 0 1636987136
2 1636987146 28.53380 3.37573 229.27209 2.11128 225.51733 0 0 1636987136
3 1636987206 28.55535 3.43365 229.47604 1.98594 225.83649 0 0 1636987264
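To get from there to the layout asked for in the question, something along these lines might work. This is only a sketch under a few assumptions: the keys are Unix epoch seconds, the data is a trimmed-down copy of the sample, and the capitalised column names are simply derived from the JSON field names. The output file name in the last comment is hypothetical.

```python
import pandas as pd

# Trimmed-down copy of the question's sample (assumption: same structure as test.json)
data = {"timestamp": {
    "1636987025": {"batteryVoltage": 28.74732, "gridCurrent": 3.68084,
                   "gridVoltage": 230.64401, "inverterCurrent": 2.00471,
                   "inverterVoltage": 224.18573, "solarCurrent": 0,
                   "solarVoltage": 0, "tValue": 1636987008},
    "1636987085": {"batteryVoltage": 28.52959, "gridCurrent": 3.40046,
                   "gridVoltage": 230.41367, "inverterCurrent": 1.76206,
                   "inverterVoltage": 225.24319, "solarCurrent": 0,
                   "solarVoltage": 0, "tValue": 1636987136},
}}

df = pd.DataFrame.from_dict(data["timestamp"], orient="index")

# The keys are strings; treat them as epoch seconds (assumption) and convert
df.index = pd.to_datetime([int(key) for key in df.index], unit="s")
df = df.rename_axis("Timestamp").reset_index()

# Keep only the columns listed in the question, in that order
wanted = ["batteryVoltage", "gridCurrent", "gridVoltage",
          "inverterCurrent", "inverterVoltage"]
df = df[["Timestamp"] + wanted]

# Capitalise the first letter to match the requested header
df.columns = ["Timestamp"] + [name[0].upper() + name[1:] for name in wanted]
print(df)

# Writing to Excel needs an engine such as openpyxl installed:
# df.to_excel("output.xlsx", index=False)
```

The converted timestamps are timezone-naive and correspond to UTC; localise them first if you need local time in the spreadsheet.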