I have a dataset with a single column that has the following structure:
Datetime stamp 1
Obs1
Obs2
Obs3
Datetime stamp 2
Obs1
Obs2
Obs3
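I want to transform it as shown below, so that each datetime stamp becomes a column header and all observations belonging to that datetime stamp become the rows under it:
Datetime stamp 1    Datetime stamp 2
Obs1                Obs1
Obs2                Obs2
Obs3                Obs3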
Assuming your single column is stored in a list or array, you can build the sub-lists like this:
lst = ['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3', 'Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3']
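result = []
temp = [lst[0]]  # start the first group with the first timestamp
for item in lst[1:]:
    if item.startswith('Datetime'):
        # a new timestamp closes the current group and starts the next one
        result.append(temp)
        temp = [item]
    else:
        temp.append(item)
result.append(temp)  # append the final group
print(result)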
Output:
[['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3'], ['Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3']]
It is now a list of lists, where each element can represent one column.
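To get from that list of lists to the layout asked for in the question, one option is to split off the headers and transpose the rest. The following is only a minimal sketch in plain Python; the names `headers`, `columns`, and `rows` are illustrative and not part of the answer above, and it assumes every group has the same number of observations.

```python
result = [
    ['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3'],
    ['Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3'],
]

# The first element of each sub-list is the header; the rest are that column's values.
headers = [group[0] for group in result]
columns = [group[1:] for group in result]

# zip(*columns) walks the columns in parallel and yields one row at a time.
# Assumption: all groups are equally long; zip stops at the shortest one.
rows = [list(row) for row in zip(*columns)]

print(headers)
for row in rows:
    print(row)
```

Each entry of `rows` then holds the observations that sit side by side under the two datetime headers.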
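Assuming the format is always the same (i.e. every group begins with the string "Datetime "), you can find the indices where the "Datetime" strings occur and select all the data between those split points: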
import pandas as pd
data = pd.Series(["Datetime stamp 1",
"Obs1",
"Obs2",
"Obs3",
"Datetime stamp 2",
"Obs1",
"Obs2",
"Obs3"])
# Find the positions where each group starts
idx_split = data.str.startswith("Datetime ")
idx_split = idx_split.index[idx_split]  # [0, 4]
N_COLS = len(idx_split)  # number of columns
vals = [0] * N_COLS  # initialize values
# Loop over each split index and slice the data
for i in range(N_COLS - 1):
    vals[i] = list(data[idx_split[i]:idx_split[i + 1]])
vals[-1] = list(data[idx_split[-1]:])  # get the last group
print(vals)
#[['Datetime stamp 1', 'Obs1', 'Obs2', 'Obs3'],
#['Datetime stamp 2', 'Obs1', 'Obs2', 'Obs3']]
# Take the first element of each list as the column name and remove it from the list
cols = [p.pop(0) for p in vals]
# The list of lists has the wrong shape for pandas; transpose it first
# (see https://stackoverflow.com/questions/6473679/transpose-list-of-lists)
df = pd.DataFrame(list(map(list, zip(*vals))), columns=cols)
print(df)
#   Datetime stamp 1 Datetime stamp 2
# 0             Obs1             Obs1
# 1             Obs2             Obs2
# 2             Obs3             Obs3
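One caveat with the `zip`-based transpose: if the groups do not all contain the same number of observations, `zip` stops at the shortest group and silently drops the remaining values. The sketch below uses `itertools.zip_longest` from the standard library to pad instead. The unequal input is made up for illustration and does not appear in the question.

```python
from itertools import zip_longest

# Hypothetical input: the second group has one observation fewer than the first.
vals = [
    ['Obs1', 'Obs2', 'Obs3'],
    ['Obs1', 'Obs2'],
]

# Plain zip stops at the shortest group, so the last 'Obs3' is lost.
truncated = [list(row) for row in zip(*vals)]

# zip_longest keeps every value and fills the gaps with None.
padded = [list(row) for row in zip_longest(*vals, fillvalue=None)]

print(truncated)
print(padded)
```

The padded rows could then be passed to `pd.DataFrame(padded, columns=cols)` in the same way as before, with the gaps showing up as missing values.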