Input.. and every element must be continuous: error using sklearn MultinomialHMM



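I'm trying to build a left-to-right discrete HMM in sklearn to recognize words from recognized characters. The symbol set is " " plus the 26 letters, 27 symbols in total.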
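To make the numbering explicit, here is a small sketch of how a word maps to indices under this 27-symbol alphabet. It assumes, as in the comments of the question's code, that `_` stands for a space; `encode` is a hypothetical helper, not part of sklearn.

```python
import string

# Assumption for illustration: index 0 is the space and 'a'..'z' are 1..26,
# the same order as the `symbols` list in the question.
SYMBOLS = [" "] + list(string.ascii_lowercase)
INDEX = {ch: i for i, ch in enumerate(SYMBOLS)}

def encode(word):
    """Map a word (underscores stand for spaces) to its list of symbol indices."""
    return [INDEX[ch] for ch in word.replace("_", " ")]

print(len(SYMBOLS))      # 27
print(encode("__one_"))  # [0, 0, 15, 14, 5, 0]
print(encode("et_one"))  # [5, 20, 0, 15, 14, 5]
```

This reproduces the hand-typed rows of `obsONE`, which is a quick way to check the row comments.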

import numpy as np
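from sklearn import hmm   # sklearn.hmm was removed in scikit-learn 0.17; its successor is the hmmlearn package
# the alphabet: a space plus the 26 lowercase letters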
symbols = [' ','a','b','c','d','e','f','g','h','i','j',  #0-10
           'k','l','m','n','o','p','q','r','s','t',  #11-20
           'u','v','w','x','y','z']                  #21-26
num_symbols = len(symbols)
# words up to 6 letters
n_states = 6   
obsONE = np.array([ [0,0,15,14,5,0],     # __one_
                [15,14,5,0,0,0],     # one___
                [0,0,0,15,14,5],     # ___one
                [0,15,14,5,0,0],     # _one__
                [0,0,16,14,5,0],     # __pne_
                [15,14,3,0,0,0],     # onc___
                [0,0,0,15,13,5],     # ___ome
                [0,15,14,5,0,0],     # _one__
                [0,0,15,14,5,0],     # __one_
                [15,14,5,0,10,15],   # one_jo
                [1,14,0,15,14,5],     # an_one
                [0,15,14,5,0,16],     # _one_p
                [20,0,15,14,5,0],     # t_one_
                [15,14,5,0,10,15],     # one_jo
                [21,20,0,15,14,5],     # ut_one
                [0,15,14,5,0,20],     # _one_t
                [21,0,15,14,5,0],     # u_one_
                [15,14,5,0,10,15],     # one_jo
                [0,0,0,15,14,5],     # ___one
                [0,15,14,5,0,26],     # _one_z
                [5,20,0,15,14,5] ])    # et_one

pi = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # initial state is the left one always
A = np.array([[0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # node 1 goes to node 2
          [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],  # node 2 can self loop or goto 3
          [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # node 3 can self loop or goto 4
          [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],  # node 4 can self loop or goto 5
          [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],  # node 5 can self loop or goto 6
          [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])  # node 6 goes to node 1     
model = hmm.MultinomialHMM(n_components=n_states,
                       startprob=pi,          # this is the start matrix, pi
                       transmat=A,            # this is the transition matrix, A
                       params='e',            # update only e (the emission matrix, aka B) during training
                       init_params='e')       # initialize only e; 's' or 't' here would overwrite pi and A
model.n_symbols = num_symbols
model.fit(obsONE)  

But I get ValueError: Input must be both positive integer array and every element must be continuous.

The code seems to expect the observations to be literally [0,1,2,3,4,5].

How should I set this up to get the HMM model I want?
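For reference, the check behind this error is about the set of symbol values that actually occur, not about [0,1,2,3,4,5] specifically. As far as I remember the old `sklearn.hmm.MultinomialHMM.fit`, it required every value to be a non-negative integer ("positive" in the message includes 0) and the distinct values to form an unbroken run. `symbols_are_contiguous` below is a hypothetical stand-alone re-creation of that rule, not the library's actual code.

```python
import numpy as np

def symbols_are_contiguous(obs):
    """Approximation of the rule in the error message: non-negative integers,
    and no gaps between the distinct values that occur (they need not start at 0)."""
    symbols = np.asarray(obs).ravel()
    if symbols.dtype.kind not in "iu" or symbols.size < 2:
        return False
    if np.any(symbols < 0):
        return False
    return not np.any(np.diff(np.unique(symbols)) > 1)

# The distinct indices that occur anywhere in the question's obsONE rows:
used = [0, 1, 3, 5, 10, 13, 14, 15, 16, 20, 21, 26]
print(symbols_are_contiguous(used))                # False: 2, 4, 6-9, ... never occur
print(symbols_are_contiguous([0, 1, 2, 3, 4, 5]))  # True
```

So the training data uses only 12 of the 27 symbols, and the gaps between them are what trigger the ValueError, regardless of `n_symbols`.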

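I ran into the same problem. As far as I can tell, your observation sequences don't contain some of the characters in the vocabulary, so the symbol numbers that do occur have gaps. Instead of assigning numbers to the characters statically, assign them after finding out which characters actually occur in the observation sequences.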

(For example, suppose word = 'apzaqb'.

Numbering with symbols = ['a','b','p','q','z']
gives ObsOne = np.array([0,2,4,0,3,1]), whose values 0-4 have no gaps,

instead of numbering with symbols = [' ','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'],
which gives ObsOne = np.array([1,16,26,1,17,2]).)
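A minimal sketch of this idea: build the vocabulary from the characters that actually appear in the training words, then number them 0..len(vocabulary)-1, which is gap-free by construction. `build_vocabulary` and `encode` are hypothetical helper names, not sklearn functions.

```python
import numpy as np

def build_vocabulary(words):
    """Sorted list of the distinct characters that occur in `words`."""
    return sorted(set("".join(words)))

def encode(word, vocabulary):
    """Index of each character of `word` in `vocabulary`."""
    index = {ch: i for i, ch in enumerate(vocabulary)}
    return np.array([index[ch] for ch in word])

vocab = build_vocabulary(["apzaqb"])
print(vocab)                    # ['a', 'b', 'p', 'q', 'z']
print(encode("apzaqb", vocab))  # [0 2 4 0 3 1]
```

With several training words, build the vocabulary from all of them at once; `n_symbols` then becomes `len(vocab)` instead of 27. A character that never appeared in training has no index, so `encode` raises `KeyError` for it at recognition time.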
