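I'd like to know how to use pandas to elegantly split a single-column file in the following format into a more conventional table layout.
(The file is received as the output of an eye tracker.)
Current format: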
TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444
00000000.11111111111111.22222222222222.33333333333333.4444444444444
Desired format:
TimeStamp GazePointXLeft GazePointYLeft GazePointXRight GazePointYRight
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444
000000000 11111111111111 22222222222222 333333333333333 444444444444444
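Where I'm stuck: I think the solution will involve pandas' split method, but I'm having trouble figuring out how to get there. I suppose I would have to do it "by hand", adding the corresponding columns while somehow splitting the period-delimited data rows…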
df = pd.DataFrame('data.csv')
headers = ["TimeStamp", ..., "GazePointYRight"]
for header in headers:
    df[header] = df[1:].split(".")[headers.index(header)]  # <--- splitting rows by period and taking data based on header index in list
Please point me in the right direction. Thanks in advance.
pandas.read_...
has several useful parameters you can use.
I believe you want something like this?
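import pandas as pd

# One name per period-separated field, in file order
columns_names = [
    'TimeStamp',
    'GazePointXLeft',
    'GazePointYLeft',
    'GazePointXRight',
    'GazePointYRight',
]
# sep='.' splits each row on periods; skiprows=1 drops the fused header line
df = pd.read_csv("lixo.csv", sep='.', skiprows=1, names=columns_names)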
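To try the same idea without a file on disk, here is a minimal sketch that reads from an in-memory string. The sample values are made up for illustration, and `dtype=str` is my own addition (an assumption that you want to keep leading zeros such as `00000001`; drop it if you want pandas to infer numeric types).

```python
import io

import pandas as pd

# Made-up sample in the question's layout: one fused header line,
# then period-delimited rows.
raw = (
    "TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight\n"
    "00000001.111.222.333.444\n"
    "00000002.555.666.777.888\n"
)

column_names = [
    "TimeStamp",
    "GazePointXLeft",
    "GazePointYLeft",
    "GazePointXRight",
    "GazePointYRight",
]

# sep="." splits on periods, skiprows=1 skips the fused header line,
# names= supplies the real column names, dtype=str keeps leading zeros.
df = pd.read_csv(
    io.StringIO(raw),
    sep=".",
    skiprows=1,
    names=column_names,
    dtype=str,
)

print(df)
```

If you need numbers afterwards, something like `df[col] = pd.to_numeric(df[col])` on the relevant columns should do it.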
It's best to fix this while reading the CSV:
headers = ["TimeStamp", ..., "GazePointYRight"]
df = pd.read_csv('data.csv', sep='.', skiprows=1, names=headers)
You can also do it afterwards:
df = pd.read_csv('data.csv')
headers = ["TimeStamp", ..., "GazePointYRight"]
df = df.TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight.str.split('.', expand=True)
df.rename(columns={n:name for n, name in enumerate(headers)}, inplace=True)
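For reference, a self-contained sketch of the "split afterwards" route, again on made-up sample values read from an in-memory string. Selecting the single column with `iloc` and assigning `parts.columns` directly are my own variations on the snippet above (assumptions, not the only way to do it); the full `headers` list is taken from the column names in the question.

```python
import io

import pandas as pd

# Made-up sample in the question's layout.
raw = (
    "TimeStampGazePointXLeftGazePointYLeftGazePointXRightGazePointYRight\n"
    "00000001.111.222.333.444\n"
    "00000002.555.666.777.888\n"
)

headers = [
    "TimeStamp",
    "GazePointXLeft",
    "GazePointYLeft",
    "GazePointXRight",
    "GazePointYRight",
]

# Read with the default comma separator: the fused header line becomes
# the name of the one and only column.
df = pd.read_csv(io.StringIO(raw), dtype=str)

# Take that single column by position instead of typing its long name.
single = df.iloc[:, 0]

# A one-character pattern is treated as a literal, so "." is a plain period;
# expand=True returns one column per piece instead of a column of lists.
parts = single.str.split(".", expand=True)

# The new columns are numbered 0..4; replace them with the real names.
parts.columns = headers

print(parts)
```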