无法在透视熊猫数据帧上调用to_json:返回"ValueError: Label array sizes do not match corresponding data shape"



我有一个数据库,其中包含来自传感器的时间序列数据。我想在前端使用的图形库要求将数据重塑为每个传感器一列,而不是我的数据集中的垂直格式:

>>> for d in dataset: print d                                                   
...
[datetime.datetime(2014, 9, 26, 0, 56, 0, 598000), u'motion', 0.0]
[datetime.datetime(2014, 9, 26, 0, 56, 7, 698000), u'motion', 1.0]
[datetime.datetime(2014, 9, 26, 0, 58, 20, 298000), u'motion', 0.0]
[datetime.datetime(2014, 9, 26, 2, 21, 27, 893000), u'door', 0.0]
[datetime.datetime(2014, 9, 26, 2, 21, 37, 793000), u'door', 1.0]
[datetime.datetime(2014, 9, 26, 2, 21, 53, 893000), u'door', 0.0]

在stackoverflow和pandas文档的帮助下(谢谢!)我找到了如何调整数据的方法:

>>> import pandas as pd
>>> pd.__version__
'0.14.1'
>>>
>>> df = pd.DataFrame(dataset, columns=['tstamp', 'tag', 'value'])              
>>> dfp = df.pivot('tstamp', 'tag')
>>> dfp
                                 value
tag                         door  motion
tstamp
2014-09-26 00:56:00.598000         NaN            0
2014-09-26 00:56:07.698000         NaN            1
2014-09-26 00:58:20.298000         NaN            0
2014-09-26 02:21:27.893000           0          NaN
2014-09-26 02:21:37.793000           1          NaN
2014-09-26 02:21:53.893000           0          NaN
>>>

现在我一直在尝试用JSON:输出数据

>>> dfp.to_json()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/var/www/environment/default/local/lib/python2.7/site-packages/pandas/core/generic.py", line 853, in to_json
    default_handler=default_handler)
  File "/var/www/environment/default/local/lib/python2.7/site-packages/pandas/io/json.py", line 34, in to_json
    date_unit=date_unit, default_handler=default_handler).write()
  File "/var/www/environment/default/local/lib/python2.7/site-packages/pandas/io/json.py", line 77, in write
    default_handler=self.default_handler)
ValueError: Label array sizes do not match corresponding data shape

我是熊猫的新手,所以我想我需要修复我的"标签阵列"。我该怎么办?我可以看到

>>> dfp.keys()
MultiIndex(levels=[[u'value'], [u'door', u'motion']],
           labels=[[0, 0], [0, 1]],
           names=[None, u'tag'])

但我不知道下一步该怎么办。

pivot正在生成一个DataFrame,其列具有MultiIndex。由于顶层value对于所有列都是相同的,您可以简单地删除它:

dfp.columns = dfp.columns.droplevel(0)

然后调用to_json作品:

In [20]: dfp.to_json()
Out[20]: '{"door":{"1411692960598":null,"1411692967698":null,"1411693100298":null,"1411698087893":0.0,"1411698097793":1.0,"1411698113893":0.0},"motion":{"1411692960598":0.0,"1411692967698":1.0,"1411693100298":0.0,"1411698087893":null,"1411698097793":null,"1411698113893":null}}'

或者,更好的是,在调用pivot:时指定values

In [26]: dfp = df.pivot(index='tstamp', columns='tag', values='value'); dfp
Out[26]: 
tag                         door  motion
tstamp                                  
2014-09-26 00:56:00.598000   NaN       0
2014-09-26 00:56:07.698000   NaN       1
2014-09-26 00:58:20.298000   NaN       0
2014-09-26 02:21:27.893000     0     NaN
2014-09-26 02:21:37.793000     1     NaN
2014-09-26 02:21:53.893000     0     NaN

现在调用to_json是开箱即用的,因为列索引是平面的。

最新更新