I am trying to efficiently build a matrix out of 1xN matrices, to be used later as features for Scikit-Learn training. One of the many things I have tried so far is:
np.matrix(list(func(text) for text in data_test.data))
which produces a matrix of matrices, like this:
matrix([[ <1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 10921 stored elements in Compressed Sparse Row format>,
<1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 17651 stored elements in Compressed Sparse Row format>,
<1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 28180 stored elements in Compressed Sparse Row format>,...
Obviously, this is not what I actually want. How can I turn it into a proper matrix, something like:
<76002x108800 sparse matrix of type '<type 'numpy.float64'>'
with 807960 stored elements in Compressed Sparse Row format>
How about scipy.sparse.vstack?
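A minimal sketch of stacking the 1xN rows with scipy.sparse.vstack follows. The func below is only a hypothetical stand-in for the question's func (a toy bag-of-words row builder), and texts and n_features are made-up illustration values, not the real data_test.data.

```python
import numpy as np
from scipy import sparse


def func(text, n_features=16):
    """Hypothetical stand-in for the question's func: one 1 x n_features CSR row."""
    row = np.zeros((1, n_features))
    for token in text.split():
        # Toy deterministic bucketing; the real feature extractor is unknown.
        row[0, sum(map(ord, token)) % n_features] += 1.0
    return sparse.csr_matrix(row)


# Assumed sample input, standing in for data_test.data.
texts = ["a b c", "b c d e", "a a"]

# Build a plain Python list of 1 x N sparse rows, then stack them vertically.
rows = [func(text) for text in texts]
X = sparse.vstack(rows, format="csr")

print(X.shape)
print(X.nnz)
```

The result is a single sparse matrix with one row per input text, rather than a dense np.matrix whose elements are sparse matrix objects.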
If that is too slow, take the fast path from here: https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L396 (in future SciPy versions, vstack itself will be fast in this case).
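For illustration, here is a sketch of what a fast path for stacking CSR rows can look like: concatenate the data, indices, and indptr arrays directly instead of going through a generic block-matrix routine. This is an assumption about the general idea, not a copy of the linked SciPy code, and stack_csr_rows is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def stack_csr_rows(rows):
    """Hypothetical helper: vertically stack CSR blocks by concatenating raw arrays."""
    rows = [sparse.csr_matrix(r) for r in rows]
    n_cols = rows[0].shape[1]
    if any(r.shape[1] != n_cols for r in rows):
        raise ValueError("all blocks must have the same number of columns")

    data = np.concatenate([r.data for r in rows])
    indices = np.concatenate([r.indices for r in rows])

    # Each block's row pointers are shifted by the number of stored
    # elements that precede it in the concatenated data array.
    indptr_parts = [np.zeros(1, dtype=np.int64)]
    offset = 0
    for r in rows:
        indptr_parts.append(r.indptr[1:].astype(np.int64) + offset)
        offset += r.nnz
    indptr = np.concatenate(indptr_parts)

    n_rows = sum(r.shape[0] for r in rows)
    return sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))


# Small made-up example rows, including an all-zero row.
rows = [
    sparse.csr_matrix(np.array([[1.0, 0.0, 2.0]])),
    sparse.csr_matrix(np.array([[0.0, 0.0, 0.0]])),
    sparse.csr_matrix(np.array([[0.0, 3.0, 0.0]])),
]
stacked = stack_csr_rows(rows)
print(stacked.toarray())
```

On a recent SciPy, plain scipy.sparse.vstack is likely the simpler choice; a hand-rolled version like this is only worth considering if profiling shows the stacking step is a bottleneck.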