For my ML project, I use a model that takes a video and an audio file as input to detect synthetic speech in the video.
However, it raises an error in the audio_processing() function.
Audio processing code:
def audio_processing(wav_file, verbose=True):
    rate, sig = wav.read(wav_file)
    if verbose:
        print("Sig length: {}, sample_rate: {}".format(len(sig), rate))
    try:
        mfcc_features = speechpy.feature.mfcc(sig, sampling_frequency=rate, frame_length=0.010, frame_stride=0.010)
    except IndexError:
        raise ValueError("ERROR: Index error occurred while extracting mfcc")
    if verbose:
        print("mfcc_features shape:", mfcc_features.shape)
    # Number of audio clips = len(mfcc_features) // length of each audio clip
    number_of_audio_clips = len(mfcc_features) // AUDIO_TIME_STEPS
    if verbose:
        print("Number of audio clips:", number_of_audio_clips)
    # Don't consider the first MFCC feature, only consider the next 12 (Checked in syncnet_demo.m)
    # Also, only consider AUDIO_TIME_STEPS*number_of_audio_clips features
    mfcc_features = mfcc_features[:AUDIO_TIME_STEPS*number_of_audio_clips, 1:]
    # Reshape mfcc_features from (x, 12) to (x//20, 12, 20, 1)
    mfcc_features = np.expand_dims(np.transpose(np.split(mfcc_features, number_of_audio_clips), (0, 2, 1)), axis=-1)
    if verbose:
        print("Final mfcc_features shape:", mfcc_features.shape)
    return mfcc_features
Error:
AssertionError: Signal dimention should be of the format of (N,) but it is (691200, 2) instead
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2548, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\flask\app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 673, in modelprediction
    audio_fea = audio_processing(audio, False)
  File "D:\VU Final Project\4 - Final Deliverable\Synthetic-Speech-Detection-in-Video\app.py", line 49, in audio_processing
    mfcc_features = speechpy.feature.mfcc(
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 139, in mfcc
    feature, energy = mfe(signal, sampling_frequency=sampling_frequency,
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\feature.py", line 185, in mfe
    frames = processing.stack_frames(
  File "C:\Users\DELL\AppData\Roaming\Python\Python39\site-packages\speechpy\processing.py", line 90, in stack_frames
    assert sig.ndim == 1, s % str(sig.shape)
By the looks of it, your audio file contains two channels. You can verify this by inspecting the shape of the array returned by wav.read: sig.shape.

speechpy.feature.mfcc expects mono audio. What you can do is convert your audio to a single channel, for example by averaging the two channels:
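For instance, a quick check of the array's dimensionality (using synthetic NumPy arrays here as stand-ins for the wav.read result, since the actual file isn't available):

```python
import numpy as np

# Hypothetical stand-ins for what wav.read could return:
mono_sig = np.zeros(691200, dtype=np.int16)         # mono file
stereo_sig = np.zeros((691200, 2), dtype=np.int16)  # stereo file

print(mono_sig.ndim, mono_sig.shape)      # 1 (691200,)
print(stereo_sig.ndim, stereo_sig.shape)  # 2 (691200, 2)
```

A shape of (691200, 2), as in your traceback, means 691200 samples across 2 channels, which is exactly what the speechpy assertion rejects.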
sig = np.mean(sig, axis=1)
If you want your function to handle both mono and multi-channel data, compute the average only when the signal is multi-channel:
if sig.ndim == 2:
    sig = np.mean(sig, axis=1)
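Putting it together, a minimal sketch of a downmix helper (the to_mono name and the int16 test data are my own; casting back to the input dtype keeps integer PCM from wav.read in its original sample format, since np.mean promotes to float):

```python
import numpy as np

def to_mono(sig):
    # Average across channels for 2-D (N, channels) input;
    # leave 1-D mono input untouched. Cast back to the original
    # dtype so int16 PCM stays int16 after averaging.
    if sig.ndim == 2:
        sig = sig.mean(axis=1).astype(sig.dtype)
    return sig

stereo = np.array([[100, 300], [-200, 0]], dtype=np.int16)
mono = to_mono(stereo)
print(mono.shape, mono.tolist())  # (2,) [200, -100]
```

You would call to_mono(sig) right after wav.read and before speechpy.feature.mfcc, so the rest of audio_processing stays unchanged.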