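I'm new to the Google Cloud Text-to-Speech service. The documentation shows a <prosody> tag with rate and pitch attributes, but they have no effect on my requests. For example, whether I use rate="slow" or rate="fast", or pitch="+2st" or pitch="-2st", the result is always the same, and it differs from the example in the documentation, which has a slower rate and a lower pitch.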
I made sure I have the latest version:
python3 -m pip install --upgrade google-cloud-texttospeech
Minimal reproducible example:
import os
from google.cloud import texttospeech

AUDIO_CONFIG = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file"
tts_client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-A"
)
ssml_input = texttospeech.SynthesisInput(
    ssml='<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    # or this one:
    #ssml='<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>'
)
response = tts_client.synthesize_speech(
    input=ssml_input, voice=voice, audio_config=AUDIO_CONFIG
)
with open("/tmp/cloud.wav", 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
How can I use the rate and pitch prosody attributes with Google Cloud?
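According to this documentation, when you pass an SSML script to Text-to-Speech, the script should be wrapped in a <speak> element, like this: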
<speak>
<prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody>
</speak>
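As a minimal sketch of the format above, the helper below wraps an SSML fragment in <speak> when that root element is missing. The function name ensure_speak_root is hypothetical and not part of the google-cloud-texttospeech library; it uses only the Python standard library, and it assumes that the missing <speak> root is what makes the fragment in the question ineffective.

```python
import xml.etree.ElementTree as ET


def ensure_speak_root(ssml: str) -> str:
    """Return the SSML wrapped in <speak>...</speak> unless it already has that root.

    Hypothetical helper for illustration; not part of google-cloud-texttospeech.
    """
    fragment = ssml.strip()
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError:
        # Not a single well-formed element (for example plain text mixed
        # with tags), so it cannot already have a <speak> root: wrap it.
        return f"<speak>{fragment}</speak>"
    # Drop any XML namespace prefix such as "{http://...}speak".
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "speak":
        return fragment
    return f"<speak>{fragment}</speak>"


if __name__ == "__main__":
    print(ensure_speak_root(
        '<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    ))
```

The returned string can then be passed as the ssml argument of texttospeech.SynthesisInput, as in the code that follows.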
You can refer to the code below; I tried it on my end and it works for me.
Code 1: I used a low pitch and a slow rate.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
    ssml='<speak><prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
Code 2: I used a fast rate and a pitch of +5st.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
    ssml='<speak><prosody rate="fast" pitch="+5st">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
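The two snippets above differ only in the SSML string, so one way to avoid repeating it is a small builder that takes the rate and pitch as parameters. This is only a sketch: build_prosody_ssml is a hypothetical name, not a library function, and the "medium" defaults are my own choice. It escapes the text so that characters such as & or < do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Build a <speak><prosody ...> document for the given text.

    Hypothetical helper for illustration; not part of google-cloud-texttospeech.
    rate and pitch are passed through unchanged as attribute values
    (for example "slow", "fast", "low", "+5st").
    """
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, < and > inside the spoken text
        "</prosody></speak>"
    )


if __name__ == "__main__":
    print(build_prosody_ssml("Hi good morning have a nice day", rate="fast", pitch="+5st"))
```

The result can be passed as ssml=build_prosody_ssml(...) to texttospeech.SynthesisInput in either snippet.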