How to generate an mp3 file using the Python Azure text-to-speech API



I am able to generate a wav file of "Mary had a little lamb" using the code below. But when I try to generate an mp3, it fails.

#https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python
import azure.cognitiveservices.speech as speechsdk
languageCode = 'en-US'
ssmlGender = 'MALE'
voicName = 'en-US-JennyNeural'
speakingRate = '-5%'
pitch = '-10%'
voiceStyle = 'newscast'
azureKey = 'FAKE KEY'
azureRegion = 'FAKE REGION'
#############################################################
#audioOuputFile = './audioFiles/test.wav'
audioOuputFile = './audioFiles/test.mp3'
#############################################################
txt = "Mary had a little lamb it's fleece was white as snow."
txt+= 'And everywhere that Mary went, the lamb was sure to go,'
txt+= 'It followed her to school one day,'
txt+= 'That was against the rule,'
txt+= 'It made the children laugh and play,'
txt+= 'To see a lamb at school.'
head1 = f'<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="{languageCode}">'
head2 = f'<voice name="{voicName}">'
head3 =f'<mstts:express-as style="{voiceStyle}">'
head4 = f'<prosody rate="{speakingRate}" pitch="{pitch}">'
tail= '</prosody></mstts:express-as></voice></speak>'
ssml = head1 + head2 + head3 + head4 + txt + tail
print('this is the ssml======================================')
print(ssml)
print('end ssml======================================')
print()
speech_config = speechsdk.SpeechConfig(subscription=azureKey, region=azureRegion)
audio_config = speechsdk.AudioConfig(filename=audioOuputFile)
#HERE IS THE PROBLEM
#Without this statement everything works fine
#Can produce a wav file 
speech_config.set_speech_synthesis_output_format(SpeechSynthesisOutputFormat["Audio16Khz128KBitRateMonoMp3"])
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_ssml_async(ssml)

This is the console output:

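(envo) D:\py_new\tts>python ttsTest3.py
this is the ssml======================================
<mstts:express-as style="newscast">Mary had a little lamb it's fleece was white as snow.And everywhere that Mary went, the lamb was sure to go,It followed her to school one day,That was against the rule,It made the children laugh and play,To see a lamb at school.</mstts:express-as>
end ssml======================================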

Traceback (most recent call last):
  File "D:\py_new\tts\test3.py", line 45, in <module>
    speech_config.set_speech_synthesis_output_format(SpeechSynthesisOutputFormat["Audio16Khz128KBitRateMonoMp3"])
NameError: name 'SpeechSynthesisOutputFormat' is not defined

(envo) D:\py_new\tts>

Note the error: NameError: name 'SpeechSynthesisOutputFormat' is not defined

Compare with the "Customize audio format" section at:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python

It works fine in Node.js, but I also need to be able to do this in Python.

Try this:

speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)
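The NameError arises because the script imports the SDK as speechsdk, so the enum has to be referenced as speechsdk.SpeechSynthesisOutputFormat. A minimal end-to-end sketch of that fix follows, assuming the azure-cognitiveservices-speech package is installed. The helper names build_ssml and synthesize_ssml_to_mp3 are hypothetical and not part of the SDK; the voice, style, rate and pitch defaults are taken from the question. The SSML helper also escapes the input text, which the string concatenation in the question does not do.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", style="newscast",
               rate="-5%", pitch="-10%", lang="en-US"):
    """Hypothetical helper: wrap plain text in the SSML used in the question."""
    return (
        '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" '
        f'version="1.0" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<mstts:express-as style={quoteattr(style)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>'
        f'{escape(text)}'  # escapes &, < and > so the SSML stays well-formed
        '</prosody></mstts:express-as></voice></speak>'
    )


def synthesize_ssml_to_mp3(ssml, out_path, key, region):
    """Hypothetical helper: synthesize SSML to an mp3 file and wait for completion."""
    # Imported here so build_ssml can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # The enum lives in the speechsdk namespace; a bare
    # SpeechSynthesisOutputFormat is undefined with this import style.
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)

    # .get() blocks until synthesis finishes, so the file is complete on return.
    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        details = result.cancellation_details
        raise RuntimeError(f"Synthesis failed: {details.reason}: {details.error_details}")
    return out_path
```

With the variables from the question, a call would look like synthesize_ssml_to_mp3(build_ssml(txt), audioOuputFile, azureKey, azureRegion).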

You need to configure the audio output like this:

from azure.cognitiveservices.speech import SpeechConfig, SpeechSynthesizer
from azure.cognitiveservices.speech.audio import AudioOutputConfig

# SPEECH_KEY, LOCATION_AREA, LANGUAGE_LOCATION_HINDI and MALE_VOICE_NAME_HINDI
# are assumed to be defined elsewhere in the module.
def hindi_text_to_speech_azure(hindi_text):
    speech_config = SpeechConfig(subscription=SPEECH_KEY, region=LOCATION_AREA)
    # Note: if only language is set, the default voice of that language is chosen.
    speech_config.speech_synthesis_language = LANGUAGE_LOCATION_HINDI  # e.g. "de-DE"
    # The voice setting will overwrite language setting.
    # The voice setting will not overwrite the voice element in input SSML.
    speech_config.speech_synthesis_voice_name = MALE_VOICE_NAME_HINDI
    # Name the output file after the first 30 characters of the text.
    audio_config = AudioOutputConfig(
        filename="{name}.mp3".format(name=hindi_text[:30]))
    synthesizer = SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    synthesizer.speak_text_async(hindi_text)

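Try this.

One catch, though it is not really a problem: I am stuck with saving the file locally, whereas I would like to upload it straight to the server's storage (the default storage). Do you know how?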
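One possible way to avoid the local file, sketched under assumptions: passing audio_config=None to SpeechSynthesizer keeps the audio in memory, and the result's audio_data attribute then holds the encoded bytes. The helper names make_mp3_synthesizer and synthesize_to_buffer are hypothetical, and treating empty audio as a failure is a simplification of the SDK's result.reason check. How the buffer is uploaded depends on the storage backend, which the thread does not specify.

```python
import io


def make_mp3_synthesizer(key, region):
    """Hypothetical helper: build a synthesizer that keeps audio in memory."""
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)
    # audio_config=None: no speaker playback and no output file.
    return speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None)


def synthesize_to_buffer(synthesizer, text):
    """Hypothetical helper: return the synthesized audio as a file-like object."""
    result = synthesizer.speak_text_async(text).get()  # wait for completion
    audio = result.audio_data
    if not audio:
        # Simplified failure check (an assumption of this sketch).
        raise RuntimeError("Synthesis returned no audio")
    return io.BytesIO(audio)
```

A call such as synthesize_to_buffer(make_mp3_synthesizer(SPEECH_KEY, LOCATION_AREA), hindi_text) returns an io.BytesIO that can be handed to whatever upload or save call the storage backend provides.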
