Using SMOTE to handle imbalanced 3D array data



I have the following data; here is the class distribution:

X shape == (477324, 5, 11)
Y shape == (477324,)
{0: 11986, 1: 465338}

Since my dataset is imbalanced, I tried random oversampling with the following code:

from imblearn.over_sampling import RandomOverSampler
oversample = RandomOverSampler(sampling_strategy='minority')
# Fit on a 2D slice, then reuse the selected indices on the full 3D array
oversample.fit_resample(trainX[:,:,0], trainY)
Xo = trainX[oversample.sample_indices_]
yo = trainY[oversample.sample_indices_]
# Xo shape == (930676, 5, 11)
# yo shape == (930676,)
# class counts: {0: 465338, 1: 465338}

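But how can I use SMOTE instead of RandomOverSampler on the same data? I tried the code below to apply SMOTE and then reshape the result back into a 3D array, because I also need the resampled data as a 3D array.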

Xo_smote, yo_smote = oversample_Smote.fit_resample(trainX[:,:,0], trainY)
# Xo shape == (930676, 5)
# yo shape == (930676,)
org_shape = trainX.shape
Xo = np.reshape(Xo, org_shape)

I get the error "ValueError: cannot reshape array of size 51187180 into shape (477324,5,11)". Any suggestions?

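I think you are trying to reshape the oversampled array back into the original shape. Oversampling adds rows, so the resampled array is now larger than the original and no longer fits into (477324, 5, 11); the first dimension has to be the new sample count. Also note that SMOTE expects 2D input (samples × features), so flatten all 5 × 11 values of each sample instead of keeping only the slice [:,:,0]. Otherwise there is no way to recover the full 3D samples afterwards.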
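A tiny NumPy sketch of the failure and the fix, using made-up sizes (6 rows before and 10 rows after oversampling) as stand-ins for 477324 and 930676: reshaping to the original shape raises a ValueError, while passing -1 for the first dimension lets NumPy infer the new row count and keeps the per-sample (5, 11) layout.

```python
import numpy as np

n_orig, n_res = 6, 10                 # hypothetical sample counts before/after oversampling
X_flat = np.zeros((n_res, 5 * 11))    # stands in for the resampled 2D array

try:
    # Wrong: the original shape assumes 6 rows, but there are now 10
    X_flat.reshape((n_orig, 5, 11))
except ValueError as err:
    print("reshape failed:", err)

# Right: keep the per-sample dimensions, let NumPy infer the number of rows
X_3d = X_flat.reshape(-1, 5, 11)
print(X_3d.shape)  # (10, 5, 11)
```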

Here is a minimal working example using the shapes you gave:

from imblearn.over_sampling import SMOTE
import numpy as np
# Random data with the same shape and class counts as in the question
train_features = np.random.rand(477324, 5, 11)
train_labels = np.array([0] * 11986 + [1] * 465338)
np.random.shuffle(train_labels)
print(train_features.shape, train_labels.shape)  # (477324, 5, 11) (477324,)
# SMOTE expects 2D input: flatten each (5, 11) sample into 55 features
train_features_shape = train_features.shape
train_features = train_features.reshape(train_features.shape[0], train_features.shape[1]*train_features.shape[2])
print(train_features.shape, train_labels.shape)  # (477324, 55) (477324,)
# Oversample the minority class; SMOTE adds synthetic rows, so the row count grows
sm = SMOTE(random_state=69)
train_features, train_labels = sm.fit_resample(train_features, train_labels)
print(train_features.shape, train_labels.shape)  # (930676, 55) (930676,)
# Reshape back to 3D using the NEW row count, not the original one
train_features = train_features.reshape(train_features.shape[0], train_features_shape[1], train_features_shape[2])
print(train_features.shape, train_labels.shape)  # (930676, 5, 11) (930676,)
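For intuition about why flattening is harmless, here is a simplified, NumPy-only sketch of the SMOTE interpolation step applied to 3D samples. `smote_3d` is a hypothetical helper, not part of imblearn and not its implementation: it assumes binary labels, at least two minority samples, fully balances the classes, and builds a full pairwise-distance matrix over the minority rows, so it is only suitable for small arrays. Each synthetic row is a point on the line between a minority sample and one of its k nearest minority neighbours in the flattened 55-dimensional space, which is why it can be reshaped back into a valid (5, 11) sample.

```python
import numpy as np

def smote_3d(X, y, minority_label, k=5, seed=0):
    """Simplified SMOTE on 3D input (hypothetical helper, binary labels only)."""
    rng = np.random.default_rng(seed)
    n, t, f = X.shape
    X2 = X.reshape(n, t * f)                      # one flattened row per sample
    minority = X2[y == minority_label]
    # Number of synthetic rows needed to match the majority class
    n_new = int((y != minority_label).sum() - len(minority))
    if n_new <= 0:
        return X, y

    # Pairwise distances among minority rows; a row is not its own neighbour
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(minority) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]     # k nearest minority rows

    base = rng.integers(0, len(minority), size=n_new)        # random minority seeds
    nn = neighbours[base, rng.integers(0, k, size=n_new)]    # one random neighbour each
    gap = rng.random((n_new, 1))                             # interpolation factor in [0, 1)
    synthetic = minority[base] + gap * (minority[nn] - minority[base])

    X_res = np.vstack([X2, synthetic]).reshape(-1, t, f)     # back to 3D
    y_res = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_res, y_res

# Small demo: 4 minority vs 16 majority samples of shape (5, 11)
rng = np.random.default_rng(1)
X_demo = rng.random((20, 5, 11))
y_demo = np.array([0] * 4 + [1] * 16)
X_bal, y_bal = smote_3d(X_demo, y_demo, minority_label=0, k=3)
print(X_bal.shape, np.bincount(y_bal))  # (32, 5, 11) [16 16]
```

On the full dataset, the imblearn SMOTE example above is the practical route; the sketch only illustrates what happens to the flattened rows.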
