Playing a .wav file with pygame produces squeaky, gibberish sound



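I am trying to build a voice assistant that uses Google Cloud TTS. I followed every step of a YouTube video, and I seem to be the only one with this problem: every time I run it, the audio output is squeaky or extremely high-pitched.

I have tried other modules such as pyAudio, sounddevice, and pydub, with no luck. I have also tried adjusting the pitch, frequency, rate, and so on, but nothing gets rid of the squeaky sound. I would like it to sound like it does in the video, since all the comments are from other people who had no problems. Any help is appreciated.

** The wav file is 24000 Hz, and the generated file sounds correct when played on its own. But when it is played through pygame, it does not.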
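Before tuning anything in the mixer, it can help to confirm what format the audio bytes actually declare. The sketch below uses only the standard-library `wave` module to read the header from in-memory bytes. `describe_wav` and `make_test_wav` are hypothetical helper names, and the silent test clip is a stand-in for real TTS output. Google Cloud TTS documents that LINEAR16 audio content includes a WAV header, so the same call should work on `response.audio_content`, though that is not verified here.

```python
import io
import wave


def describe_wav(audio_bytes: bytes) -> dict:
    """Read the WAV header from in-memory bytes and return its basic format."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": wav_file.getframerate(),
            "frames": wav_file.getnframes(),
        }


def make_test_wav(sample_rate_hz: int = 24000, channels: int = 1,
                  seconds: float = 0.5) -> bytes:
    """Build a short silent 16-bit WAV in memory (stand-in for TTS output)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hz)
        n_frames = int(sample_rate_hz * seconds)
        wav_file.writeframes(b"\x00\x00" * channels * n_frames)
    return buffer.getvalue()


if __name__ == "__main__":
    # Prints the channel count, sample width, sample rate and frame count.
    print(describe_wav(make_test_wav()))
```

If the header reports one channel while the mixer was opened with its default of two, or the sample rates differ, that mismatch is a plausible cause of sped-up, high-pitched playback. This is an assumption to check, not something the post confirms.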

import time

import pygame
import google.cloud.texttospeech as tts


def unique_languages_from_voices(voices):
    # Collect every distinct language code across all voices.
    language_set = set()
    for voice in voices:
        for language_code in voice.language_codes:
            language_set.add(language_code)
    return language_set


def list_languages():
    client = tts.TextToSpeechClient()
    response = client.list_voices()
    languages = unique_languages_from_voices(response.voices)
    print(f" Languages: {len(languages)} ".center(60, "-"))
    for i, language in enumerate(sorted(languages)):
        # Start a new line after every fifth language code.
        print(f"{language:>10}", end="\n" if i % 5 == 4 else "")


def list_voices(language_code=None):
    client = tts.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    voices = sorted(response.voices, key=lambda voice: voice.name)
    print(f" Voices: {len(voices)} ".center(60, "-"))
    for voice in voices:
        languages = ", ".join(voice.language_codes)
        name = voice.name
        gender = tts.SsmlVoiceGender(voice.ssml_gender).name
        rate = voice.natural_sample_rate_hertz
        print(f"{languages:<8} | {name:<24} | {gender:<8} | {rate:,} Hz")


def text_to_wav(voice_name: str, text: str):
    # "en-US-News-K" -> "en-US"
    language_code = "-".join(voice_name.split("-")[:2])
    text_input = tts.SynthesisInput(text=text)
    voice_params = tts.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = tts.AudioConfig(audio_encoding=tts.AudioEncoding.LINEAR16)
    client = tts.TextToSpeechClient()
    response = client.synthesize_speech(
        input=text_input, voice=voice_params, audio_config=audio_config
    )

    filename = f"{voice_name}.wav"
    with open(filename, "wb") as out:
        out.write(response.audio_content)
        print(f'Generated speech saved to "{filename}"')
    return response.audio_content


list_languages()
list_voices("en")
generated_speech = text_to_wav('en-US-News-K', 'Make yourself comfortable, Hacker. Stay a while.')

# Play the returned bytes through pygame (this is where the audio sounds wrong).
pygame.mixer.init(frequency=24000, buffer=2048)
speech_sound = pygame.mixer.Sound(generated_speech)
speech_sound.play()
time.sleep(5)
pygame.mixer.quit()

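After some serious digging, I found a solution to the problem, thanks to a 2011 post by kasyc and IanHacker.

Here is an in-depth answer to make things easier for anyone else with the same problem.

First, you need to install libsndfile for samplerate on your local machine. I am on 64-bit Windows, so the link can be found here: win64

Once that is installed, install samplerate from the terminal.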

pip install samplerate

Then import resample like this:

from samplerate import resample

Once everything is installed, we can change the end of the code to:

generated_speech = text_to_wav('en-US-News-K', 'Make yourself comfortable, Hacker. Stay a while.')
pygame.mixer.init(frequency=24000, buffer = 512, allowedchanges=pygame.AUDIO_ALLOW_FREQUENCY_CHANGE, channels=1)
speech_sound = pygame.mixer.Sound(generated_speech)
snd_array = pygame.sndarray.array(speech_sound)
snd_resample = resample(snd_array, 1.8, "sinc_fastest").astype(snd_array.dtype)
snd_out = pygame.sndarray.make_sound(snd_resample)
snd_out.play()
time.sleep(5)
pygame.mixer.quit()

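This resamples the sound produced from generated_speech and returns a 'numpy.int16' array. Once the sound plays, you will have to manually adjust the ratio and the parameters in 'pygame.mixer.init' and 'snd_resample = resample' to your liking. The code above is what I used to make it sound perfect; it may be different for you. Each adjustment to the ratio and parameters changes the speed and pitch of the sound.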
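Rather than finding the ratio purely by ear, a starting value can be computed from the two sample rates. This is a minimal sketch: `resample_ratio` and `resampled_length` are hypothetical helpers, and the 44100 Hz output rate is an assumption. In practice the rate the mixer actually opened with can be read from the first element of `pygame.mixer.get_init()`. Under that assumption the computed ratio is 1.8375, close to the hand-tuned 1.8 used above, but the post does not confirm that this is why 1.8 works.

```python
def resample_ratio(source_rate_hz: int, output_rate_hz: int) -> float:
    """Return output_rate / source_rate, the ratio a resampler expects."""
    if source_rate_hz <= 0 or output_rate_hz <= 0:
        raise ValueError("sample rates must be positive")
    return output_rate_hz / source_rate_hz


def resampled_length(n_samples: int, ratio: float) -> int:
    """Approximate number of samples after resampling by `ratio`."""
    return int(round(n_samples * ratio))


if __name__ == "__main__":
    source_hz = 24000  # rate of the generated wav, per the question
    output_hz = 44100  # ASSUMED mixer rate; check pygame.mixer.get_init()[0]
    ratio = resample_ratio(source_hz, output_hz)
    print(f"ratio: {ratio}")
    print(f"one second of audio becomes {resampled_length(source_hz, ratio)} samples")
```

If the mixer really does run at the source rate, the ratio is 1.0 and no resampling should be needed, which makes this a quick sanity check before tuning by ear.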

Finally, once you have tuned the parameters to your liking, the sound output should be fixed!
