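Goal:

Generate text in an author's style.

Input: the author's works to train on, plus a seed for prediction

Output: text generated from that seed

Question about the embedding layer in Keras:

I have raw text, a flat text file with a few thousand lines of text. I want to feed it into an embedding layer to vectorize the data. Here is my text: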

--SNIP
The Wild West\n Ha ha, ride\n All you see is the sun reflectin' off of the
--SNIP
and I call it input_text:
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer

num_words = 2000  # cap on the vocabulary size (the most frequent words are kept)
tok = Tokenizer(num_words)  # word-level tokenizer
tok.fit_on_texts(input_text)  # expects a list of texts to build the vocabulary from
# Collect the words in index order (index 1 = most frequent word);
# this is essentially enumerating the vocabulary
words = [word for word, index in sorted(tok.word_index.items(), key=lambda kv: kv[1])][:num_words]
# words[:10]
# texts_to_sequences turns each text into a list of word indexes,
# where the word of rank i in the dataset (starting at 1) has index i
X_train = tok.texts_to_sequences(input_text)
# Force every sequence to length 100; with the defaults, zeros are added
# at the front of shorter sequences and longer ones are truncated
X_train = sequence.pad_sequences(X_train, maxlen=100)
y_train = words

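Question:

It seems that my code takes in the sequence, and then when I apply padding it only gives me the first 100 items of the sequence. How should I break it apart?

Should I take the whole sequence, go through the first 100 words (X), give the next word as the target (Y), and do some skipping along the way?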
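To make the windowing idea concrete, here is a minimal sketch in plain Python, not taken from the original post. It assumes the whole corpus has already been converted into one flat list of word indexes; the names `make_windows`, `window`, and `step` are made up for illustration.

```python
def make_windows(token_ids, window=100, step=3):
    """Slice one long token sequence into (input window, next token) pairs."""
    inputs, targets = [], []
    # Stop early enough that every window still has a token following it.
    for start in range(0, len(token_ids) - window, step):
        inputs.append(token_ids[start:start + window])
        targets.append(token_ids[start + window])
    return inputs, targets


# Toy example: 10 token ids, windows of 4, sliding 2 positions at a time.
ids = list(range(1, 11))
X, y = make_windows(ids, window=4, step=2)
for x_row, target in zip(X, y):
    print(x_row, "->", target)
```

Every window already has the same length, so no padding is needed for the training pairs. The targets are integer word indexes; they would still have to be one-hot encoded, or paired with a sparse categorical loss, before a softmax layer can be trained on them.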

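I want the output to be the probability of the next word appearing, so I have a softmax layer at the end. Essentially, I want to generate text from a seed. Is this the right way to go about it? Or is it just better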
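For the generation step, one possible loop looks like the sketch below, using only the standard library. It is an illustration under assumptions, not the asker's model: `toy_scores` is a hypothetical stand-in for a trained network that returns one raw score per vocabulary entry, and `generate`, `predict_scores`, and `rng` are invented names.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate(seed_ids, predict_scores, n_words, window=100, rng=random):
    """Extend seed_ids by n_words, sampling each next id from the softmax output."""
    ids = list(seed_ids)
    for _ in range(n_words):
        context = ids[-window:]  # only the most recent tokens are fed back in
        probs = softmax(predict_scores(context))
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        ids.append(next_id)
    return ids


def toy_scores(context):
    """Hypothetical stand-in for a trained model: favours (last id + 1) mod 5."""
    scores = [0.0] * 5
    scores[(context[-1] + 1) % 5] = 4.0
    return scores


generated = generate([0, 1], toy_scores, n_words=8, window=2, rng=random.Random(0))
print(generated)
```

Sampling from the distribution, rather than always taking the most probable word, tends to keep the generated text from looping on the same phrase. With a real model, the generated ids would be mapped back to words through the tokenizer's vocabulary.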

I don't think you will find a better answer anywhere than this page. By the way, the code is also available on GitHub; dive in, or ask more questions.
