For my ML project, I use a model that takes a video and an audio file as input to detect synthetic speech in the video.
However, it raises an error in the audio_processing() function.
Audio processing code:
def audio_processing(wav_file, verbose=True):
    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))
    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")
    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)
    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS
    if verbose:
        print("Number of audio clips:", number_of_audio_clips)
    # Don't consider the first MFCC feature, only consider the next 12 (Checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]
    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)
    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features
Error:
AssertionError: Signal dimention should be of the format of (N,) but it is (691200, 2) instead
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 673, in modelprediction
    audio_fea = audio_processing(audio, False)
  File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 49, in audio_processing
    mfcc_features = speechpy.feature.mfcc(
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 139, in mfcc
    feature, energy = mfe(signal, sampling_frequency=sampling_frequency,
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 185, in mfe
    frames = processing.stack_frames(
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\processing.py", line 90, in stack_frames
    assert sig.ndim == 1, s % str(sig.shape)
By the looks of it, your audio file contains two channels. You can verify this by inspecting the shape of the array returned by wav.read: sig.shape.

speechpy.feature.mfcc expects mono audio. What you can do is convert your audio to a single channel, for example by averaging the two channels:
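For instance, a quick check of the array's dimensionality (using synthetic NumPy arrays here as stand-ins for the wav.read result, since the actual file isn't available):

```python
import numpy as np

# Hypothetical stand-ins for what wav.read could return:
mono_sig = np.zeros(691200, dtype=np.int16)         # mono file
stereo_sig = np.zeros((691200, 2), dtype=np.int16)  # stereo file

print(mono_sig.ndim, mono_sig.shape)      # 1 (691200,)
print(stereo_sig.ndim, stereo_sig.shape)  # 2 (691200, 2)
```

A shape of (691200, 2), as in your traceback, means 691200 samples across 2 channels, which is exactly what the speechpy assertion rejects.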
sig = np.mean(sig, axis=1)
If you want your function to handle both mono and multi-channel data, compute the average only when the signal is multi-channel:
if sig.ndim == 2:
    sig = np.mean(sig, axis=1)
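Putting it together, a minimal sketch of a downmix helper (the to_mono name and the int16 test data are my own; casting back to the input dtype keeps integer PCM from wav.read in its original sample format, since np.mean promotes to float):

```python
import numpy as np

def to_mono(sig):
    # Average across channels for 2-D (N, channels) input;
    # leave 1-D mono input untouched. Cast back to the original
    # dtype so int16 PCM stays int16 after averaging.
    if sig.ndim == 2:
        sig = sig.mean(axis=1).astype(sig.dtype)
    return sig

stereo = np.array([[100, 300], [-200, 0]], dtype=np.int16)
mono = to_mono(stereo)
print(mono.shape, mono.tolist())  # (2,) [200, -100]
```

You would call to_mono(sig) right after wav.read and before speechpy.feature.mfcc, so the rest of audio_processing stays unchanged.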