I'm new to deep learning, and I'm trying to build an encoder.
import re
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

trainFromTextFile = "train.FROM"
trainToTextFile = "train.TO"
trainFromText = open(trainFromTextFile, 'r', encoding='utf-8').read().lower()
trainToText = open(trainToTextFile, 'r', encoding='utf-8').read().lower()
trainFromSentence = re.split('\n', trainFromText)
trainToSentence = re.split('\n', trainToText)
trainFromWords = re.split(' |\n', trainFromText)
trainToWords = re.split(' |\n', trainToText)
print('Found %s sentences from TrainFrom Text' %len(trainFromSentence))
print('Found %s sentences from TrainTo Text' %len(trainToSentence))
print('Found %s words from TrainFrom Text' %len(trainFromWords))
print('Found %s words from TrainTo Text' %len(trainToWords))
trainInput = trainFromSentence[0:1000]
trainTarget = trainToSentence[0:1000]
max_len = 100 # Cut comments after 100 words
max_words = 10000 # Consider the top 10,000 words in the dataset
tokenizerInput = Tokenizer(num_words=max_words)
tokenizerInput.fit_on_texts(trainInput)
from keras.preprocessing.text import text_to_word_sequence
wordInput = [text_to_word_sequence(s) for s in trainInput]  # text_to_word_sequence is a module-level function, not a Tokenizer method, and takes one string at a time
sequencesInput = tokenizerInput.texts_to_sequences(trainInput)
sequencesInput = pad_sequences(sequencesInput, maxlen=max_len) #Pad so all the arrays are the same size
Inputindex = tokenizerInput.word_index
Inputcount = tokenizerInput.word_counts
nInput = len(tokenizerInput.word_counts) + 1
print("Train From File:\n")
print('Found %s sentences.' %len(trainInput))
print('Found %s sequences.' %len(sequencesInput))
print('Found %s unique tokens.' % len(Inputindex))
print('Found %s unique words.' % len(Inputcount))
This is what I have so far. I'd like to know how to take the data I have and build an encoder that consumes it.
This (link) is generally how the different types of autoencoders are built. But judging from your question, you seem to be interested in sequence-to-sequence prediction with an encoder-decoder model, which is mainly based on recurrent neural networks. A tutorial can be found here (link).
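To make the encoder-decoder idea concrete, here is a minimal sketch using the Keras functional API. The vocabulary sizes, `latent_dim`, and the assumption that your padded sequences (like `sequencesInput`) feed the encoder while shifted target sequences feed the decoder are all placeholders for illustration, not values taken from your data:

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

num_encoder_tokens = 10000  # e.g. your max_words / nInput
num_decoder_tokens = 10000  # vocabulary size of the target side
latent_dim = 256            # size of the hidden state (arbitrary choice)

# Encoder: embed the source token ids, run an LSTM, and keep only
# its final hidden and cell states as a summary of the input sentence.
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]

# Decoder: generate the target sequence, initialized with the
# encoder's final states (this is the encoder-decoder link).
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
decoder_seq, _, _ = LSTM(latent_dim, return_sequences=True,
                         return_state=True)(dec_emb, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_seq)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
```

Training would then look like `model.fit([sequencesInput, decoder_input_data], decoder_target_data, ...)`, where `decoder_input_data` is the target sequence shifted right by one step (teacher forcing) and `decoder_target_data` is the unshifted target; at inference time the decoder is run one token at a time, which the linked tutorial walks through.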