Learning classification in sktime
from sklearn.model_selection import train_test_split
X = AUDCHF_h1_model[['Open','High','Low','Close','Volume','VWMA',
'Minute','Hour','Day','Week','Month','Year']].values
y = AUDCHF_h1_model[['is_beg_leg']].values
X_train,X_test,y_train,y_test = train_test_split(
X, y, test_size=0.2)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(53250, 12) (53250, 1) (13313, 12) (13313, 1)
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.dictionary_based import BOSSEnsemble
from sktime.classification.interval_based import TimeSeriesForestClassifier
#from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator
steps = [
("concatenate", ColumnConcatenator()),
("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
I get
ValueError: Mismatch in number of cases. Number in X = 639000, y = 53250
but
X_train.shape is (53250, 12) and y_train.shape is (53250, 1)
Any ideas?
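I can't say for sure from the information you provided, but I suspect the problem is the ColumnConcatenator in your pipeline. It stacks all the columns of X to create a new univariate time series with 53250 * 12 = 639000 rows. That concatenated series is then passed to TimeSeriesForestClassifier with a shape that no longer matches the original input, while y still has 53250 entries. Depending on your use case, you can either remove the "concatenate" step, or you will have to provide target values for the newly created univariate time series.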
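To make the arithmetic behind that suspicion concrete, here is a minimal NumPy-only sketch. It does not call sktime. It uses dummy arrays with the shapes from the question and assumes the "concatenate" step effectively stacks the 12 columns end to end, which is one reading of the error rather than a confirmed description of the library's internals.

```python
import numpy as np

# Dummy stand-ins with the shapes reported in the question (values are zeros).
n_rows, n_cols = 53250, 12
X_train = np.zeros((n_rows, n_cols))
y_train = np.zeros((n_rows, 1))

# Assumption: the concatenation step stacks the columns one after another.
# order="F" flattens column by column, giving one long univariate vector.
stacked = X_train.reshape(-1, order="F")

n_cases_X = stacked.shape[0]  # 53250 * 12
n_cases_y = y_train.shape[0]  # unchanged number of labels

print(n_cases_X, n_cases_y)  # 639000 53250
mismatch = n_cases_X != n_cases_y
```

The two counts match the numbers in the ValueError, which is consistent with the stacking explanation: the labels are never stacked, so the classifier sees 12 times more cases in X than in y.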