Deep learning classification on a time series of x-y spatial coordinates - Python



I'm having some trouble with a DL classification problem. I'll attach a short example of the training data to help describe the problem.

The data is a time series of x-y points made up of smaller sub-sequences called event s, where each unique event is independent. Below I have two unique sequences (event 10 and event 20) of even time length. Within a given sequence, each individual point has its own unique identifier, user_id. Over the course of a sequence the x-y trajectory of each point varies slightly, and the specific time step is recorded in interval. I also have a separate x-y point used as a reference (center_x, center_y), which gives the approximate middle/center of all the points.

Finally, target_label classifies the relative position of these points. Using (center_x, center_y) as the reference, there are 5 classes: Middle, Top, Bottom, Right, Left. Each user_id within a given event has exactly one label.

Questions:

  1. Obviously the dataset is small, but my concern is accuracy. I think I need to incorporate the reference point (center_x, center_y); one possible approach is sketched after this list.

  2. I get all of these warnings on every test iteration. I think it is related to converting to tensors, but nothing I have tried helps (a possible workaround is sketched after the model code below):

    WARNING:tensorflow:7 out of the last 7 calls to <function> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
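
For question 1, one possible way to incorporate the reference point (an assumption on my part, not a verified fix): express each trajectory relative to its event's center, so that the class geometry (Top, Bottom, etc.) no longer depends on where the event sits in absolute coordinates. A minimal sketch, reusing df, n_samples, and n_ints from the code below:

# sketch: feed offsets from the per-event reference point instead of raw x-y
df['x_rel'] = df['x'] - df['center_x']
df['y_rel'] = df['y'] - df['center_y']
# build the model input from relative coordinates (replaces the raw-x-y X below)
X = df[['x_rel', 'y_rel']].values.reshape(n_samples, n_ints, 2).astype('float32')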

Example df:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# number of intervals
n = 10
# center locations for points in sequence 1
locs_1 = {'A': (5, 5),
          'B': (5, 8),
          'C': (5, 2),
          'D': (8, 5)}
# initialize data
data_1 = pd.DataFrame(index=range(n*len(locs_1)), columns=['x', 'y', 'user_id'])
for i, group in enumerate(locs_1.keys()):
    data_1.loc[i*n:((i+1)*n)-1, ['x', 'y']] = np.random.normal(locs_1[group],
                                                               [0.2, 0.2],
                                                               [n, 2])
    data_1.loc[i*n:((i+1)*n)-1, ['user_id']] = group
# generate time intervals
data_1['interval'] = data_1.groupby('user_id').cumcount() + 1
# assign unique id to differentiate sequences
data_1['event'] = 10
# center of all points for unique sequence 1
data_1['center_x'] = 5
data_1['center_y'] = 5
# classify labels
data_1['target_label'] = ['Middle' if ele == 'A' else 'Top' if ele == 'B' else 'Bottom' if ele == 'C' else 'Right' for ele in data_1['user_id']]
# center locations for points in sequence 2
locs_2 = {'A': (14, 15),
          'B': (16, 15),
          'C': (15, 12),
          'D': (19, 15)}
# initialize data
data_2 = pd.DataFrame(index=range(n*len(locs_2)), columns=['x', 'y', 'user_id'])
for i, group in enumerate(locs_2.keys()):
    data_2.loc[i*n:((i+1)*n)-1, ['x', 'y']] = np.random.normal(locs_2[group],
                                                               [0.2, 0.2],
                                                               [n, 2])
    data_2.loc[i*n:((i+1)*n)-1, ['user_id']] = group
# generate time intervals
data_2['interval'] = data_2.groupby('user_id').cumcount() + 1
# assign unique id to differentiate sequences
data_2['event'] = 20
# center of all points for unique sequence 2
data_2['center_x'] = 15
data_2['center_y'] = 15
# classify labels
data_2['target_label'] = ['Middle' if ele == 'A' else 'Middle' if ele == 'B' else 'Bottom' if ele == 'C' else 'Right' for ele in data_2['user_id']]
df = pd.concat([data_1, data_2])
df = df.sort_values(by=['event', 'interval', 'user_id']).reset_index(drop=True)
x          y user_id  interval  event  center_x  center_y target_label
0    5.288275   5.211246       A         1     10         5         5       Middle
1    4.765987   8.200895       B         1     10         5         5          Top
2    4.943518   1.645249       C         1     10         5         5       Bottom
3    7.930763   4.965233       D         1     10         5         5        Right
4    4.866746   4.980674       A         2     10         5         5       Middle
..        ...        ...     ...       ...    ...       ...       ...          ...
75  18.929254  15.297437       D         9     20        15        15        Right
76  13.701538  15.049276       A        10     20        15        15       Middle
77  16.028816  14.985672       B        10     20        15        15       Middle
78  15.044336  11.631358       C        10     20        15        15       Bottom
79   18.95508  15.217064       D        10     20        15        15        Right

Model:

import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense

labels = df['target_label'].dropna().sort_values().unique()
n_samples = df.groupby(['user_id', 'event']).ngroups
n_ints = 10
# re-sort so each (event, user_id) sequence occupies contiguous rows;
# otherwise the reshape below interleaves different users' points
df = df.sort_values(by=['event', 'user_id', 'interval']).reset_index(drop=True)
X = df[['x', 'y']].values.reshape(n_samples, n_ints, 2).astype('float32')
# one label per (event, user_id) sequence, in the same order as X
y = df.drop_duplicates(subset=['event', 'user_id'])['target_label'].values
y = label_binarize(y, classes=labels)

# load the dataset, returns train and test X and y elements
def load_dataset():
    # test, train split
    trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2)
    return trainX, trainy, testX, testy

# fit and evaluate a model
def evaluate_model(trainX, trainy, testX, testy):
    verbose, epochs, batch_size = 0, 10, 32
    n_timesteps, n_features, n_outputs = trainX.shape[1], trainX.shape[2], trainy.shape[1]
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
    model.add(Dropout(0.5))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit network
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
    # evaluate model
    _, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    return accuracy

# summarize scores
def summarize_results(scores):
    print(scores)
    m, s = np.mean(scores), np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (m, s))

# run an experiment
def run_experiment(repeats=10):
    # load data
    trainX, trainy, testX, testy = load_dataset()
    # repeat experiment
    scores = list()
    for r in range(repeats):
        # r = tf.convert_to_tensor(r, dtype=tf.int32)  # tried this; it did not help
        score = evaluate_model(trainX, trainy, testX, testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r + 1, score))
        scores.append(score)
    # summarize results
    summarize_results(scores)

# run the experiment
run_experiment()
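
Regarding the retracing warning in question 2, my reading of the warning text (an assumption, not a confirmed diagnosis) is that each call to evaluate_model builds and traces a brand-new model, so TensorFlow re-traces fresh tf.function graphs on every repeat. A commonly suggested mitigation is to clear the Keras session between repeats so the previous model's traces are released; a sketch of the adjusted loop:

# hypothetical variant of run_experiment: clear the TF graph between repeats
def run_experiment(repeats=10):
    trainX, trainy, testX, testy = load_dataset()
    scores = list()
    for r in range(repeats):
        tf.keras.backend.clear_session()  # drop traces left by the previous model
        score = evaluate_model(trainX, trainy, testX, testy) * 100.0
        print('>#%d: %.3f' % (r + 1, score))
        scores.append(score)
    summarize_results(scores)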

You are trying to do time series classification with 2-D time series of length 10. It seems you have only very few examples per class, which is too few to train a neural network on. Even if you had a hundred examples, I would suggest using a method that can cope with less data. One example is k-nearest neighbors with a time-series-specific distance measure such as dynamic time warping.
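
To make that suggestion concrete, here is a minimal, self-contained sketch of 1-nearest-neighbor classification under dynamic time warping, applied to the (n_samples, n_ints, 2) arrays built above. The dtw_distance helper is an illustrative quadratic-time implementation; libraries such as tslearn ship optimized equivalents:

def dtw_distance(a, b):
    """DTW distance between two sequences of x-y points, each of shape (T, 2)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean step cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_dtw_predict(trainX, trainy, testX):
    """1-NN: give each test sequence the label of its DTW-closest training sequence."""
    preds = []
    for seq in testX:
        dists = [dtw_distance(seq, ref) for ref in trainX]
        preds.append(trainy[int(np.argmin(dists))])
    return np.array(preds)

trainX, trainy, testX, testy = load_dataset()
preds = knn_dtw_predict(trainX, trainy, testX)
# exact match on the one-hot label rows
print('1-NN DTW accuracy: %.3f' % (preds == testy).all(axis=1).mean())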
