Why do MFCC extraction libraries return different values?

I am extracting MFCC features using two different libraries:

The python_speech_features library
The Bob library (bob.ap)

However, the outputs of the two differ, and even their shapes are not the same. Is that normal, or am I missing a parameter?

The relevant part of my code is the following:

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank


def bob_extract_features(audio, rate):
    # get MFCC
    rate              = 8000  # rate (note: this overrides the rate argument)
    win_length_ms     = 30    # The window length of the cepstral analysis in milliseconds
    win_shift_ms      = 10    # The window shift of the cepstral analysis in milliseconds
    n_filters         = 26    # The number of filter bands
    n_ceps            = 13    # The number of cepstral coefficients
    f_min             = 0.    # The minimal frequency of the filter bank
    f_max             = 4000. # The maximal frequency of the filter bank
    delta_win         = 2     # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm          = True  # A factor by which the cepstral coefficients are multiplied
    mel_scale         = True  # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale
    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta       = False
    c.with_delta_delta = False
    c.with_energy      = False
    signal = np.asarray(audio, dtype=float)    # vector should be in **float**
    example_mfcc = c(signal)                   # MFCC only: deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float)    # vector should be in **float**
    mfcc_feature = mfcc(signal, rate, winlen=0.03, winstep=0.01, numcep=13,
                        nfilt=26, nfft=512, appendEnergy=False)
    # mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas       = delta(mfcc_feature, 2)
    fbank_feat   = logfbank(audio, rate)
    combined     = np.hstack((mfcc_feature, deltas))
    return mfcc_feature                        # only the plain MFCCs are returned


track = 'test-sample.wav'
rate, audio = read(track)
features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)
print("--------------------------------------------")
t = (features1 == features2)
print(t)

"However, the outputs of the two differ, and even their shapes are not the same. Is that normal?"

Yes. There are several variants of the algorithm, and each implementation picks its own flavor.

"Or am I missing a parameter?"

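It is not only about parameters. There are also algorithmic differences, such as the window shape (Hamming vs. Hanning), the shape of the mel filters, where the mel filters start, the normalization of the mel filters, liftering, the DCT flavor, and so on.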
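To make the window-shape point concrete, here is a minimal sketch using only NumPy. It does not reproduce either library's pipeline; the 440 Hz test tone, the 240-sample frame (30 ms at 8 kHz) and the 512-point FFT are assumptions chosen to mirror the question's settings. Which window each library actually applies by default should be checked in its documentation.

```python
import numpy as np

rate = 8000                      # sampling rate in Hz (assumed, as in the question)
n = 240                          # one 30 ms frame at 8 kHz
t = np.arange(n) / rate
frame = np.sin(2 * np.pi * 440.0 * t)   # synthetic test tone (illustrative only)

# Same frame, two common window shapes
power_hamming = np.abs(np.fft.rfft(frame * np.hamming(n), 512)) ** 2
power_hann = np.abs(np.fft.rfft(frame * np.hanning(n), 512)) ** 2

# Largest difference between the two power spectra, relative to the Hamming peak
max_rel_diff = np.max(np.abs(power_hamming - power_hann)) / np.max(power_hamming)
print("max relative difference:", max_rel_diff)
```

The power spectrum is the input to the mel filter bank, so a difference at this stage carries through to the final cepstral coefficients, even when every numeric parameter is identical.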

If you want identical results, just use a single library for extraction. Trying to bring the two into sync is pretty hopeless.

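Have you tried comparing the two with a tolerance? I believe both MFCCs are arrays of floating-point numbers, and testing them for exact equality is probably not wise. Try numpy.testing.assert_allclose with some tolerance, and decide whether that tolerance is good enough.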
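A tolerance-based comparison could be sketched as follows. The helper name compare_features and the default tolerances are hypothetical, chosen only for illustration, and the arrays are random stand-ins rather than real MFCC output.

```python
import numpy as np


def compare_features(a, b, rtol=1e-3, atol=1e-6):
    """Return (ok, message); compare_features is a hypothetical helper."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # assert_allclose cannot say anything useful if the shapes differ,
    # so report that case separately.
    if a.shape != b.shape:
        return False, "shape mismatch: %s vs %s" % (a.shape, b.shape)
    try:
        np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
    except AssertionError as err:
        return False, str(err)
    return True, "match within tolerance"


# Stand-in data: 98 frames of 13 coefficients (not real MFCCs)
rng = np.random.default_rng(0)
ref = rng.normal(size=(98, 13))

ok_close, msg_close = compare_features(ref, ref + 1e-9)
ok_shape, msg_shape = compare_features(ref, ref[:-1])
print(ok_close, msg_close)
print(ok_shape, msg_shape)
```

Checking the shapes first separates the two problems in the question: a frame-count mismatch has to be resolved before any element-wise comparison means anything.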

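That said, I had missed that you mentioned even the shapes do not match, and I have no experience with bob.ap, so I cannot comment on it confidently. In general, though, some libraries pad the input array with zeros at the beginning or the end for windowing purposes, and if one of the two libraries does this differently, that could be the cause.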
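The effect of padding on the number of frames can be illustrated with plain arithmetic. The two functions below are generic conventions, not the documented behavior of either library, and the signal length of 8100 samples is an assumed example.

```python
import math


def n_frames_truncate(n_samples, frame_len, frame_step):
    """Frame count when a trailing partial frame is dropped."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step


def n_frames_zero_pad(n_samples, frame_len, frame_step):
    """Frame count when the signal is zero-padded to fill the last frame."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


rate = 8000
frame_len = rate * 30 // 1000    # 30 ms window -> 240 samples
frame_step = rate * 10 // 1000   # 10 ms shift  -> 80 samples
n_samples = 8100                 # assumed signal length, for illustration

frames_truncated = n_frames_truncate(n_samples, frame_len, frame_step)
frames_padded = n_frames_zero_pad(n_samples, frame_len, frame_step)
print(frames_truncated, frames_padded)  # 99 100
```

A difference of one row between the two feature matrices would be consistent with this kind of mismatch. A larger difference points elsewhere, for example at a sampling rate that differs between the two calls.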
