Reconstructing raw data with a denoising autoencoder

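Sometimes the raw data does not contain enough information, as with data from biological experiments. I have a gene expression dataset of size 100*1000. I want to use a denoising autoencoder to get a reconstructed output of the same size (100*1000). How is this possible?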

Here you can find an interesting article about autoencoders. It also covers the denoising case; I hope it answers your question:

https://medium.com/a-year-of-artificial-intelligence/lenny-2-autoencoders-and-word-embeddings-oh-my-576403b0113a#.2jdcn3ctk

If anyone stumbles across this post and wonders how to write a denoising autoencoder, here is a simple example:

import numpy as np
import tensorflow as tf
# Generate a 100x1000 dataset
x_train = np.random.rand(100, 1000)
# Add noise to the data
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
# Clip the values to [0, 1]
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
# Define the input layer
inputs = tf.keras.layers.Input(shape=(1000,))
# Define the encoder
encoded = tf.keras.layers.Dense(100, activation='relu')(inputs)
# Define the decoder
decoded = tf.keras.layers.Dense(1000, activation='sigmoid')(encoded)
# Define the autoencoder model
autoencoder = tf.keras.models.Model(inputs, decoded)
# Compile the model
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# Train the model
autoencoder.fit(x_train_noisy, x_train, epochs=100, batch_size=32)
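
The example above clips its inputs to [0, 1] and ends in a sigmoid layer trained with binary cross-entropy, so the targets need to lie in [0, 1]. Raw expression values typically won't be in that range. Here is a minimal sketch of one way to handle that, assuming per-gene min-max scaling is acceptable for your data. The `expression` matrix is random stand-in data and `to_original_scale` is a hypothetical helper; neither comes from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real expression matrix (100 samples x 1000 genes).
# The values are made up; replace this with your own data.
expression = rng.lognormal(mean=2.0, sigma=1.0, size=(100, 1000))

# Per-gene (column-wise) min-max scaling to [0, 1].
col_min = expression.min(axis=0)
col_range = expression.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # guard against constant genes

x_train = (expression - col_min) / col_range


def to_original_scale(scaled):
    """Map values in [0, 1] (e.g. autoencoder output) back to expression units."""
    return scaled * col_range + col_min


# Round trip: scaling and un-scaling should give back the input.
restored = to_original_scale(x_train)
print(x_train.min(), x_train.max(), np.allclose(restored, expression))
```

After training, the same helper could be applied to the output of `autoencoder.predict(...)` to get a reconstruction in the original units.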

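Notes:

You have to replace x_train with your own data.
x_train must be noise-free (otherwise the denoising autoencoder will not work, because it has no reference to learn from).
You can add extra layers to the encoder and decoder parts.
You should experiment with the hyperparameters (number of neurons in each layer, loss function, optimizer, epochs, batch_size) to see what works best for you. Ideally, run a search procedure such as grid search to find good values for them.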
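
The last two notes (extra layers, hyperparameter search) can be sketched roughly as below. This is only an illustration under my own assumptions: the grid values, the 80/20 split, the helper name `build_autoencoder`, and the short training runs are all made up, and the `adam` optimizer is my substitution for `adadelta`, only to keep the runs short. The data is random, as in the example above.

```python
import itertools

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Placeholder data; replace with your own clean matrix scaled to [0, 1].
x_clean = rng.random((100, 1000)).astype("float32")
x_noisy = np.clip(
    x_clean + 0.5 * rng.normal(size=x_clean.shape), 0.0, 1.0
).astype("float32")

# Hold out 20 samples so configurations are compared on unseen data.
x_tr_noisy, x_val_noisy = x_noisy[:80], x_noisy[80:]
x_tr_clean, x_val_clean = x_clean[:80], x_clean[80:]


def build_autoencoder(hidden_units, bottleneck_units, n_features=1000):
    """Symmetric autoencoder: hidden -> bottleneck -> hidden -> output."""
    inputs = tf.keras.layers.Input(shape=(n_features,))
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(inputs)
    x = tf.keras.layers.Dense(bottleneck_units, activation="relu")(x)
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(x)
    outputs = tf.keras.layers.Dense(n_features, activation="sigmoid")(x)
    model = tf.keras.models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


# Hypothetical grid; pick ranges that make sense for your data.
grid = {"hidden_units": [256, 512], "bottleneck_units": [32, 100]}
results = []
for hidden_units, bottleneck_units in itertools.product(
    grid["hidden_units"], grid["bottleneck_units"]
):
    model = build_autoencoder(hidden_units, bottleneck_units)
    model.fit(x_tr_noisy, x_tr_clean, epochs=5, batch_size=16, verbose=0)
    val_loss = model.evaluate(x_val_noisy, x_val_clean, verbose=0)
    results.append((float(val_loss), hidden_units, bottleneck_units))

# Lowest validation loss wins.
best_loss, best_hidden, best_bottleneck = min(results)
print(f"best: hidden={best_hidden}, bottleneck={best_bottleneck}, "
      f"val_loss={best_loss:.4f}")

# Refit the chosen configuration on all samples, then reconstruct.
final_model = build_autoencoder(best_hidden, best_bottleneck)
final_model.fit(x_noisy, x_clean, epochs=5, batch_size=16, verbose=0)
reconstructed = final_model.predict(x_noisy, verbose=0)  # same shape as the input
```

With only 100 samples, a single 80/20 split gives a noisy estimate, so cross-validation or repeated splits would probably be more reliable in practice.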

Here are a few other resources on autoencoders:

Machine Learning Mastery

Keras Blog
