How to make some training samples more important in Keras TensorFlow?



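In TensorFlow Keras it is easy to make some training samples more important. The normal output of the class DataGenerator(tf.keras.utils.Sequence) is (X, y). Instead, have it output (X, y, w), where the weights w have the same shape as y. Then set w=2 for all positive targets and w=1 for all negative targets. Then train with the usual TensorFlow Keras calls: t_gen = DataGenerator(), then model.fit(t_gen).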
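A minimal sketch of the same (X, y, w) contract, written as a plain Python generator rather than a tf.keras.utils.Sequence subclass so that it runs without TensorFlow installed; model.fit also accepts a generator that yields (inputs, targets, sample_weights) tuples. It assumes positive targets are encoded as 1, and the name weighted_batches is hypothetical. Because w is computed elementwise from y, it has the same shape as y for both per-sample targets of shape (n,) and per-timestep LSTM targets of shape (n, timesteps):

```python
import numpy as np

def weighted_batches(X, y, batch_size, pos_weight=2.0, neg_weight=1.0):
    """Hypothetical helper: yield (x, y, w) batches indefinitely.

    w has the same shape as the y batch: pos_weight where the target is
    positive (assumed to be encoded as 1) and neg_weight everywhere else.
    """
    n = len(X)
    while True:  # loop forever; model.fit then needs steps_per_epoch
        for start in range(0, n, batch_size):
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            # Elementwise weights, same shape as the target batch.
            wb = np.where(yb == 1, pos_weight, neg_weight).astype(np.float32)
            yield xb, yb, wb
```

Since this generator never stops, pass steps_per_epoch when training, e.g. model.fit(weighted_batches(X, y, 32), steps_per_epoch=math.ceil(len(X) / 32)). In a Sequence subclass, the same np.where line would go inside __getitem__, next to the code that slices X and y.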

NB2: I am working with an LSTM.

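I assume that by "more important" you mean that samples with x=1 should have a larger effect on the cost function than samples where x is not 1. There are two parameters of model.fit that let you do this: class_weight and sample_weight. The documentation describes them as follows: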

class_weight: Optional dictionary mapping class indices (integers) to a weight
(float) value, used for weighting the loss function (during training only). This
can be useful to tell the model to "pay more attention" to samples from an under-represented class.
sample_weight: Optional Numpy array of weights for the training samples, used for
weighting the loss function (during training only). You can either pass a flat (1D)
Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape
(samples, sequence_length), to apply a different weight to every timestep of every
sample. This argument is not supported when x is a dataset, generator, or 
keras.utils.Sequence instance, instead provide the sample_weights as the third 
element of x.
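When the training data already sits in memory as NumPy arrays (not a generator), the flat (1D) form from the quote can be passed straight to model.fit. A minimal sketch, assuming a hypothetical 1D array x_values that holds, for each sample, the value the question calls x (with LSTM input you would have to derive it from your sequences yourself); flat_sample_weights is likewise a hypothetical helper name:

```python
import numpy as np

def flat_sample_weights(x_values, important_value=1, important_weight=2.0):
    """Hypothetical helper: one weight per sample (a flat 1D array).

    x_values is assumed to be a 1D array with the per-sample "x" value;
    samples equal to important_value get important_weight, others get 1.0.
    """
    x_values = np.asarray(x_values)
    return np.where(x_values == important_value, important_weight, 1.0)
```

Usage would then be model.fit(X, y, sample_weight=flat_sample_weights(x_values)), with len(x_values) == len(X) so that weights and samples map 1:1.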

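To get the result you want with sample_weight, you will have to create a generator that produces batches of data as three values, x, y, w, where x is the array of samples, y is the array of labels, and w is the array of sample weights. In your case you probably want w=1 for all samples whose x value is not 1 and w=2 for all samples whose x value is 1. This gives samples with x=1 twice the influence on the cost function. How to build a custom generator is described here. In the generator code you need to determine the value of w for each sample and return the array of w together with x and y.

If x is a class in your data, a simpler alternative may be to use class_weight. For example, suppose you have a dataset of the form: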

Number of samples   Class   Class index
100                 A       0
200                 B       1
1700                C       2
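
Then you could use class_weight={0:17, 1:8.5, 2:1}. Each class's weight is 1700 (the size of the largest class) divided by its own sample count, so all three classes carry the same total weight in the loss.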
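The weights above follow one simple rule: the size of the largest class divided by the size of each class (1700/100 = 17, 1700/200 = 8.5, 1700/1700 = 1). A minimal sketch that derives such a dictionary from a label array, assuming the labels are integer class indices as in the table; the helper name balanced_class_weights is hypothetical:

```python
import numpy as np

def balanced_class_weights(labels):
    """Hypothetical helper: map each class index to
    (size of largest class) / (size of this class),
    so every class carries the same total weight in the loss."""
    classes, counts = np.unique(labels, return_counts=True)
    largest = counts.max()
    return {int(c): float(largest / n) for c, n in zip(classes, counts)}

# Labels matching the table: 100 of class 0, 200 of class 1, 1700 of class 2.
labels = np.array([0] * 100 + [1] * 200 + [2] * 1700)
print(balanced_class_weights(labels))  # {0: 17.0, 1: 8.5, 2: 1.0}
```

The resulting dictionary is what you would pass as model.fit(X, y, class_weight=...). For comparison, scikit-learn's compute_class_weight with class_weight='balanced' uses n_samples / (n_classes * count) instead, which differs from this rule only by a constant factor.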
