Error when merging two 2D arrays: zero-dimensional arrays cannot be concatenated

I am working on a binary text classification task, and I have applied a vectorizer to my data as follows:

count_vect = CountVectorizer(tokenizer=tokens)
X_train_counts = count_vect.fit_transform(docs_train.data)
print X_train_counts.shape
(150, 370)

Because I only want to take a random sample from class "0" (a in my example) and classify it against class "1", I did the following:

x =  X_train_counts
y =  docs_train.target
a_x,a_y=x[y==0,:],y[y==0]   
b_x,b_y=x[y==1,:],y[y==1]
inds=np.random.choice(range(a_x.shape[0]),50)
random_x=a_x[inds,:]
random_y=a_y[inds]
x_merged=np.concatenate((random_x,b_x))
y_merged=np.concatenate((random_y,b_y))
X_train,y_train=shuffle(x_merged, y_merged, random_state=0)

But I always get the following error:

x_merged=np.concatenate((random_x,b_x))
ValueError: zero-dimensional arrays cannot be concatenated

Even though printing the shapes gives me:

print random_x.shape
print b_x.shape
(50, 370)
(50, 370)

Any idea how to fix it? The row order has to be preserved, of course, since it is linked to the labels.

Update: here is a printout of each array's contents and type when running the following commands:

print random_x[:5,:].toarray()
print b_x[:5,:].toarray()
print (type(random_x))
print (type(b_x))
[[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [4 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]
[[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]
<class 'scipy.sparse.csr.csr_matrix'>
<class 'scipy.sparse.csr.csr_matrix'>

Edit: apparently SciPy has its own stacking functions, including hstack and vstack, which handle sparse matrices.
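As a minimal sketch of that route, assuming random_x and b_x are csr_matrix objects with the same number of columns (the stand-in data below is made up for illustration), scipy.sparse.vstack can stack them row-wise without ever building dense arrays:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical stand-ins for the question's random_x and b_x.
random_x = sp.csr_matrix(np.eye(50, 370))
b_x = sp.csr_matrix(np.ones((50, 370)))

# vstack stacks row-wise and keeps row order:
# all rows of random_x first, then all rows of b_x.
x_merged = sp.vstack((random_x, b_x), format="csr")

# The labels are ordinary NumPy arrays, so np.concatenate works for them.
random_y = np.zeros(50, dtype=int)
b_y = np.ones(50, dtype=int)
y_merged = np.concatenate((random_y, b_y))

print(x_merged.shape, y_merged.shape)
```

Because the rows and the labels are stacked in the same order, row i of x_merged still corresponds to y_merged[i].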

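The problem is indeed the type: np.concatenate works on NumPy arrays and does not understand SciPy sparse matrices. One way to solve it is to convert each csr_matrix to a dense array, concatenate, and then convert the result back to a csr_matrix (note that this builds the dense arrays in memory):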

     import numpy as np
     import scipy.sparse as m
     a = np.zeros((50, 370))
     b = np.zeros((50, 370))
     am = m.csr_matrix(a).toarray()
     bm = m.csr_matrix(b).toarray()
     cm = m.csr_matrix(np.concatenate((am,bm)))
     print(am.shape,bm.shape,cm.shape)

The result is:

     (50, 370) (50, 370) (100, 370)
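To see why the type matters, here is a small sketch of what I believe happens inside np.concatenate, at least in the NumPy and SciPy versions I am aware of: the sparse matrix reports a 2D shape itself, but when NumPy tries to treat it as an array it ends up wrapped as a single object, which is zero-dimensional. The variable names are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix(np.zeros((50, 370)))

# The sparse matrix knows its own shape...
print(x.shape)

# ...but NumPy does not know how to unpack it, so it wraps the whole
# matrix in a 0-d object array. This is what np.concatenate sees.
as_array = np.asarray(x)
print(as_array.ndim, as_array.dtype)

# An explicit conversion gives a real 2D array.
dense = x.toarray()
print(dense.ndim, dense.shape)
```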
