我有一个数据库,其中包含来自传感器的时间序列数据。我想在前端使用的图形库要求将数据重塑为每个传感器一列,而不是我的数据集中的垂直格式:
>>> for d in dataset: print d
...
[datetime.datetime(2014, 9, 26, 0, 56, 0, 598000), u'motion', 0.0]
[datetime.datetime(2014, 9, 26, 0, 56, 7, 698000), u'motion', 1.0]
[datetime.datetime(2014, 9, 26, 0, 58, 20, 298000), u'motion', 0.0]
[datetime.datetime(2014, 9, 26, 2, 21, 27, 893000), u'door', 0.0]
[datetime.datetime(2014, 9, 26, 2, 21, 37, 793000), u'door', 1.0]
[datetime.datetime(2014, 9, 26, 2, 21, 53, 893000), u'door', 0.0]
在stackoverflow和pandas文档的帮助下(谢谢!)我找到了如何调整数据的方法:
>>> import pandas as pd
>>> pd.__version__
'0.14.1'
>>>
>>> df = pd.DataFrame(dataset, columns=['tstamp', 'tag', 'value'])
>>> dfp = df.pivot('tstamp', 'tag')
>>> dfp
value
tag door motion
tstamp
2014-09-26 00:56:00.598000 NaN 0
2014-09-26 00:56:07.698000 NaN 1
2014-09-26 00:58:20.298000 NaN 0
2014-09-26 02:21:27.893000 0 NaN
2014-09-26 02:21:37.793000 1 NaN
2014-09-26 02:21:53.893000 0 NaN
>>>
现在我一直在尝试用JSON:输出数据
>>> dfp.to_json()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/var/www/environment/default/local/lib/python2.7/site-packages/pandas/core/generic.py", line 853, in to_json
default_handler=default_handler)
File "/var/www/environment/default/local/lib/python2.7/site-packages/pandas/io/json.py", line 34, in to_json
date_unit=date_unit, default_handler=default_handler).write()
File "/var/www/environment/default/local/lib/python2.7/site-packages/pandas/io/json.py", line 77, in write
default_handler=self.default_handler)
ValueError: Label array sizes do not match corresponding data shape
我是熊猫的新手,所以我想我需要修复我的"标签阵列"。我该怎么办?我可以看到
>>> dfp.keys()
MultiIndex(levels=[[u'value'], [u'door', u'motion']],
labels=[[0, 0], [0, 1]],
names=[None, u'tag'])
但我不知道下一步该怎么办。
pivot
正在生成一个DataFrame,其列具有MultiIndex。由于顶层value
对于所有列都是相同的,您可以简单地删除它:
dfp.columns = dfp.columns.droplevel(0)
然后调用to_json
作品:
In [20]: dfp.to_json()
Out[20]: '{"door":{"1411692960598":null,"1411692967698":null,"1411693100298":null,"1411698087893":0.0,"1411698097793":1.0,"1411698113893":0.0},"motion":{"1411692960598":0.0,"1411692967698":1.0,"1411693100298":0.0,"1411698087893":null,"1411698097793":null,"1411698113893":null}}'
或者,更好的是,在调用pivot
:时指定values
列
In [26]: dfp = df.pivot(index='tstamp', columns='tag', values='value'); dfp
Out[26]:
tag door motion
tstamp
2014-09-26 00:56:00.598000 NaN 0
2014-09-26 00:56:07.698000 NaN 1
2014-09-26 00:58:20.298000 NaN 0
2014-09-26 02:21:27.893000 0 NaN
2014-09-26 02:21:37.793000 1 NaN
2014-09-26 02:21:53.893000 0 NaN
现在调用to_json
是开箱即用的,因为列索引是平面的。