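I'm trying to extract features from .wav files using the MFCCs of the sound files. When I try to convert my list of MFCCs to a numpy array, I get an error. I'm fairly sure the error happens because the list contains MFCCs with different shapes, but I'm not sure how to fix it.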
I've looked at these two other Stack Overflow posts, but they don't solve my problem because each one is too specific to its own task:
ValueError: could not broadcast input array from shape (128,128,3) into shape (128,128)
ValueError: could not broadcast input array from shape (857,3) into shape (857)
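Full error message:
Traceback (most recent call last):
  File "/…./…/…./Batch_MFCC_Data.py", line 68, in <module>
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)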
Code sample:
import glob

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split  # sklearn.cross_validation no longer exists

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []  # list to hold all MFCCs
labels = []  # list to hold all labels

for i, wav_path in enumerate(all_wav_paths):
    individual_MFCC = MFCC_from_wav(wav_path)
    # MFCC_from_wav() -> returns the MFCC coefficients
    label = get_class(wav_path)
    # get_class() -> returns the label of the wav file, either 0 or 1

    # add features and label to the lists
    MFCCs.append(individual_MFCC)
    labels.append(label)

# Must convert the training data to a NumPy array for
# train_test_split and saving to local drive
X = np.array(MFCCs)  # THIS LINE CRASHES WITH THE ABOVE ERROR

# binary encode labels (OneHotEncoder expects a 2-D array of shape (n_samples, 1))
onehot_encoder = OneHotEncoder(sparse_output=False)  # called `sparse` before scikit-learn 1.2
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

# create train/test data from the stacked array X
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)
Here is a snapshot of the shapes of the MFCCs in the MFCCs list (one per .wav file):
...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
...More below...
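As you can see, the MFCCs in the list don't all have the same shape, because the recordings are not all the same length. Is this why I can't convert the list to a numpy array? If so, how can I make all the MFCCs in the list the same shape?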
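A minimal sketch that reproduces the problem, using random arrays as stand-ins for real MFCC_from_wav() output (an assumption; only the shapes matter here). A list of arrays whose second dimension differs cannot be stacked into one regular 3-D array, so np.array raises a ValueError; the exact message depends on the NumPy version. Once every array has the same number of frames, stacking works.

```python
import numpy as np

# Fake MFCC matrices: 20 coefficients each, but different numbers of frames
# (random data standing in for real MFCC_from_wav() output)
rng = np.random.default_rng(0)
ragged = [rng.standard_normal((20, 423)), rng.standard_normal((20, 457))]

try:
    np.array(ragged)  # ragged list: NumPy cannot build a regular 3-D array
    stacked_ok = True
except ValueError as err:
    stacked_ok = False
    print("ValueError:", err)

# Same number of frames -> stacking works, giving (num_files, 20, n_frames)
uniform = [rng.standard_normal((20, 345)), rng.standard_normal((20, 345))]
batch = np.array(uniform)
print(batch.shape)  # (2, 20, 345)
```

So yes, the differing frame counts are the cause; the fix is to make every array the same width before calling np.array.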
Any code snippets and advice would be greatly appreciated!
Thanks!
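Use the following logic to bring all arrays down to min_shape, i.e. crop the larger arrays to min_shape (this drops the trailing frames of the longer recordings rather than resampling them):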
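import numpy as np

# MFCCs is the list of (20, n_frames) arrays from the question;
# compute the shortest length instead of hard-coding it (345 in the snapshot above)
min_shape = (20, min(arr.shape[1] for arr in MFCCs))
for idx, arr in enumerate(MFCCs):
    MFCCs[idx] = arr[:, :min_shape[1]]  # keep only the first min_shape[1] frames
batch_arr = np.array(MFCCs)  # shape: (num_wav_files, 20, min_shape[1])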
You can then stack these arrays into a single batch array, as in the minimal example below:
In [33]: a1 = np.random.randn(2, 3)
In [34]: a2 = np.random.randn(2, 5)
In [35]: a3 = np.random.randn(2, 10)
In [36]: MFCCs = [a1, a2, a3]
In [37]: min_shape = (2, 2)
In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:
In [42]: batch_arr = np.array(MFCCs)
In [43]: batch_arr.shape
Out[43]: (3, 2, 2)
For the second strategy, bringing the smaller arrays up to max_shape, follow similar logic, but pad the missing columns with zeros or NaN values as needed.
You can then stack the arrays into a batch array of shape (num_arrays, dim1, dim2); in your case the shape would be (num_wav_files, 20, max_column).
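A minimal sketch of the padding strategy, reusing the shapes from the toy example above. The helper name pad_to_max is hypothetical; it right-pads every array along the frame axis with np.pad up to the longest length and then stacks the results. NaN padding only works for float arrays.

```python
import numpy as np

def pad_to_max(mfccs, fill_value=0.0):
    """Right-pad each (n_mfcc, n_frames) array along the frame axis to the longest length."""
    max_cols = max(arr.shape[1] for arr in mfccs)
    padded = [
        np.pad(arr, ((0, 0), (0, max_cols - arr.shape[1])),  # pad columns only, at the end
               mode="constant", constant_values=fill_value)
        for arr in mfccs
    ]
    return np.array(padded)  # shape: (num_arrays, n_mfcc, max_cols)

a1 = np.ones((2, 3))
a2 = np.ones((2, 5))
a3 = np.ones((2, 10))

batch = pad_to_max([a1, a2, a3])                          # zero padding
nan_batch = pad_to_max([a1, a2, a3], fill_value=np.nan)  # NaN padding
print(batch.shape)  # (3, 2, 10)
```

For the question's data, pad_to_max(MFCCs) would give an array of shape (num_wav_files, 20, max_column). Zero padding is the usual choice when the batch goes straight into a model; NaN marks the padded frames explicitly, but most models cannot consume NaN, so it mainly helps if you plan to mask or drop those frames later.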