I'm trying to pivot a pandas DataFrame, but with a "twist", I guess. This is the input table (TSV) that gets loaded into pandas:
timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
1607724080 sensor_b int bar 7
The output should look like this:
timestamp sensor_a sensor_a_type sensor_a_unit sensor_b sensor_b_type sensor_b_unit
1607724078 coords1 string gps 1 int bar
1607724079 coords5 string gps 4 int bar
1607724080 coords9 string gps 7 int bar
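So, as you can see, the data should be grouped by timestamp, and each distinct value of the sensor column must become its own column, filled with the values from the original value column. In addition, the original type and unit columns must be "name merged" with the new sensor_x columns (sensor_x_type, sensor_x_unit) and keep their values.
The goal is to save the result as Parquet. Important: there can be more than two sensors.
So far I have managed to create a MultiIndexed DataFrame that contains all the information I need, but I can't write it to Parquet. I guess the MultiIndex has to be flattened somehow so that the sensor unit and type get their own separate columns.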
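import pandas as pd

if __name__ == '__main__':
    # The input file is tab-separated; use the timestamp as the index
    df = pd.read_csv("data/test.tsv", delimiter='\t', index_col='timestamp')
    # Pivot so that sensor, unit and type become column levels
    df2 = df.pivot(columns=['sensor', 'unit', 'type'], values='value')
    print(df2)
Output: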
sensor sensor_a sensor_b sensor_c
unit gps bar percent
type string int double
timestamp
1607724078 coords1 1 11.11
1607724079 coords5 4 44.44
1607724080 coords9 7 77.77
Thanks in advance!
Here is a partial answer. Only some columns still need to be renamed:
from io import StringIO

import pandas as pd

sList = '''timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
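1607724080 sensor_b int bar 7'''.split('\n')
# Turn the whitespace-separated rows into CSV text
s = '\n'.join([','.join(l.split()) for l in sList])
with StringIO(s) as sio:
    df = pd.read_csv(sio)
# Put the rows of both sensors side by side, matched on timestamp
df[df.sensor == 'sensor_a'].merge(df[df.sensor == 'sensor_b'], on='timestamp', suffixes=['_a','_b'])
Output: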
timestamp sensor_a type_a unit_a value_a sensor_b type_b unit_b value_b
0 1607724078 sensor_a string gps coords1 sensor_b int bar 1
1 1607724079 sensor_a string gps coords5 sensor_b int bar 4
2 1607724080 sensor_a string gps coords9 sensor_b int bar 7
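Since the question says there can be more than two sensors, the pairwise merge above could be generalized by renaming each sensor's columns first and then merging all of the pieces on timestamp. This is only a sketch under a few assumptions: the helper name `one_sensor` is made up for illustration, each sensor is assumed to report at most once per timestamp, and an outer merge is used so that timestamps missing for one sensor are kept as NaN.

```python
from functools import reduce
from io import StringIO

import pandas as pd

raw = """timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
1607724080 sensor_b int bar 7"""

df = pd.read_csv(StringIO(raw), sep=r"\s+")


def one_sensor(frame, name):
    """Hypothetical helper: rows of one sensor, renamed to the target column scheme."""
    part = frame[frame["sensor"] == name].drop(columns="sensor")
    part = part.rename(columns={
        "value": name,
        "type": f"{name}_type",
        "unit": f"{name}_unit",
    })
    return part[["timestamp", name, f"{name}_type", f"{name}_unit"]]


# One renamed frame per sensor, however many sensors there are
parts = [one_sensor(df, name) for name in sorted(df["sensor"].unique())]

# Merge them one after another on timestamp
wide = reduce(lambda left, right: left.merge(right, on="timestamp", how="outer"), parts)
print(wide)
```

The columns come out flat and already carry the names requested in the question, so no renaming step is left over.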
Following this solution, you can replace the columns with just the top level:
df2.columns = df2.columns.get_level_values(0)
This should produce something like:
sensor_a sensor_b sensor_c
1607724078 coords1 1 11.11
1607724079 coords5 4 44.44
1607724080 coords9 7 77.77
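Keeping only the top level drops the unit and type information. If those should survive as their own columns, one possible approach is to walk over the column tuples of the pivoted frame and write the lower levels out as constant columns. This is a rough sketch, assuming every sensor has exactly one unit and one type (otherwise a sensor would appear under several column tuples and later ones would overwrite earlier ones); the name `flat` is just an illustration. Note that the type and unit are filled in for every row, including rows where the sensor has no value.

```python
from io import StringIO

import pandas as pd

raw = """timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
1607724080 sensor_b int bar 7"""

df = pd.read_csv(StringIO(raw), sep=r"\s+", index_col="timestamp")
df2 = df.pivot(columns=["sensor", "unit", "type"], values="value")

# Build a flat frame: one value column plus constant type/unit columns per sensor
flat = pd.DataFrame(index=df2.index)
for sensor, unit, type_ in df2.columns:
    flat[sensor] = df2[(sensor, unit, type_)]
    flat[f"{sensor}_type"] = type_
    flat[f"{sensor}_unit"] = unit

flat = flat.reset_index()
print(flat)
```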
You can use set_index and unstack to reshape the DataFrame, then flatten the MultiIndex with the following list comprehension:
import pandas as pd
from io import StringIO
#Input file
sList = '''timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
1607724080 sensor_b int bar 7'''
#Read file in as dataframe
df = pd.read_csv(StringIO(sList), sep=r'\s+', engine='python')
#Reshape the dataframe
df_out = df.set_index(['timestamp', 'sensor']).unstack()
#Flatten multiindex
df_out.columns = [f'{j}_{i}' if i != 'value' else f'{j}' for i, j in df_out.columns]
df_out = df_out.reset_index()
print(df_out)
Output:
timestamp sensor_a_type sensor_b_type sensor_a_unit sensor_b_unit sensor_a sensor_b
0 1607724078 string int gps bar coords1 1
1 1607724079 string int gps bar coords5 4
2 1607724080 string int gps bar coords9 7
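The result above has all the requested columns, just in a different order than the question shows. If the order matters, or if the numeric sensors should be stored as numbers rather than strings, something along these lines might help. It is a sketch with assumptions: a sensor is treated as numeric only when its type column says "int" on every row, which is my reading of the sample data and not something the question specifies.

```python
from io import StringIO

import pandas as pd

raw = """timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
1607724080 sensor_b int bar 7"""

df = pd.read_csv(StringIO(raw), sep=r"\s+")

# Same reshape and flatten steps as in the answer above
df_out = df.set_index(["timestamp", "sensor"]).unstack()
df_out.columns = [f"{j}_{i}" if i != "value" else j for i, j in df_out.columns]
df_out = df_out.reset_index()

# Reorder to: timestamp, then value/type/unit for each sensor
sensors = sorted(df["sensor"].unique())
ordered = ["timestamp"]
for name in sensors:
    ordered += [name, f"{name}_type", f"{name}_unit"]
df_out = df_out[ordered]

# Assumption: sensors declared as "int" on every row can be converted to numbers
for name in sensors:
    if (df_out[f"{name}_type"] == "int").all():
        df_out[name] = pd.to_numeric(df_out[name])

print(df_out.dtypes)
```

With flat string column names, `df_out.to_parquet("sensors.parquet")` should then work, provided a Parquet engine such as pyarrow or fastparquet is installed (the file name is just a placeholder).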