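I'm using SSML so that my app can speak. The app itself works fine on my phone, but when I connect the phone to a device over Bluetooth, there is usually a gap or delay, either at the beginning or in the middle of the speech.
For example, when the audio is "Hello John, I am your assistant. How can I help you?", the output can be "sistant. How can I help you?". Sometimes the sentence plays fluently, but sometimes there are gaps like this.
This is how I play the audio file: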
String myFile = context.getFilesDir() + "/output.mp3";
mMediaPlayer.reset();
mMediaPlayer.setDataSource(myFile);
mMediaPlayer.prepare();
mMediaPlayer.start();
This is the full class:
public class Tts {
    public Context context;
    private final MediaPlayer mMediaPlayer;

    public Tts(Context context, MediaPlayer mMediaPlayer) {
        this.context = context;
        this.mMediaPlayer = mMediaPlayer;
    }

    @SuppressLint({"NewApi", "ResourceType", "UseCompatLoadingForColorStateLists"})
    public void say(String text) throws Exception {
        InputStream stream = context.getResources().openRawResource(R.raw.credential); // R.raw.credential is credential.json
        GoogleCredentials credentials = GoogleCredentials.fromStream(stream);
        TextToSpeechSettings textToSpeechSettings =
                TextToSpeechSettings.newBuilder()
                        .setCredentialsProvider(
                                FixedCredentialsProvider.create(credentials)
                        ).build();
        // Instantiates a client
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(textToSpeechSettings)) {
            // Replace {name} with target
            SharedPreferences sharedPreferences = context.getSharedPreferences("target", Context.MODE_PRIVATE);
            String target = sharedPreferences.getString("target", null);
            text = (target != null) ? text.replace("{name}", target) : text.replace("null", "");
            // Set the text input to be synthesized
            String myString = "<speak><prosody pitch=\"low\">" + text + "</prosody></speak>";
            SynthesisInput input = SynthesisInput.newBuilder().setSsml(myString).build();
            // Build the voice request, select the language code ("en-US") and the ssml voice gender
            // ("neutral")
            VoiceSelectionParams voice =
                    VoiceSelectionParams.newBuilder()
                            .setName("de-DE-Wavenet-E")
                            .setLanguageCode("de-DE")
                            .setSsmlGender(SsmlVoiceGender.MALE)
                            .build();
            // Select the type of audio file you want returned
            AudioConfig audioConfig =
                    AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();
            // Perform the text-to-speech request on the text input with the selected voice parameters and
            // audio file type
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
            // Get the audio contents from the response
            ByteString audioContents = response.getAudioContent();
            // Write the response to the output file.
            try (FileOutputStream out = new FileOutputStream(context.getFilesDir() + "/output.mp3")) {
                out.write(audioContents.toByteArray());
            }
            String myFile = context.getFilesDir() + "/output.mp3";
            mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder().setContentType(AudioAttributes.CONTENT_TYPE_MUSIC).build());
            mMediaPlayer.reset();
            mMediaPlayer.setDataSource(myFile);
            mMediaPlayer.prepare();
            mMediaPlayer.setOnPreparedListener(mediaPlayer -> mMediaPlayer.start());
        }
    }
}
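Distance cannot be the cause, because my phone is right next to the device.
Synthesizing SSML with Google's service requires an internet connection, so I'm not sure whether the gap is caused by Bluetooth or by the internet connection.
Whatever the cause, I'm trying to get rid of the gap. The audio should start playing only once it is fully prepared and ready.
What I tried
This is what I tried, but I didn't hear any difference: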
mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder().setContentType(AudioAttributes.CONTENT_TYPE_SPEECH).build());
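I also tried mMediaPlayer.prepareAsync() instead of mMediaPlayer.prepare(), but then the audio doesn't play (or at least I can't hear it).
Calling start() in the listener:
mMediaPlayer.setOnPreparedListener(mediaPlayer -> {
    mMediaPlayer.start();
});
Unfortunately, the gap is sometimes still there.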
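Here is my proposed solution. See the // *** comments in the code for what I changed relative to the code in your question.
Take it with a grain of salt, because I have no way to test it right now.
Still, as far as I know, this is everything you can do with the MediaPlayer API. If it still doesn't work with your Bluetooth device, try a different Bluetooth device. If that doesn't help either, you could switch the whole thing to the AudioTrack API instead of MediaPlayer. AudioTrack offers a low-latency setting, and you can feed it the audio data directly from the response instead of writing it to a file and reading it back.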
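As a rough sketch of the first step on the AudioTrack route: AudioTrack plays raw PCM, not MP3, so this assumes the request is switched from AudioEncoding.MP3 to AudioEncoding.LINEAR16. Google's documentation states that LINEAR16 responses also carry a WAV header, so the PCM samples have to be located inside the byte array first. The helper below is plain Java and only does that lookup. The names WavPcm, Range and findData are made up for illustration, and whether this actually removes the Bluetooth gap is untested.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: finds the raw PCM samples inside a RIFF/WAVE byte array. */
final class WavPcm {

    /** Offset and length (in bytes) of the "data" chunk payload. */
    static final class Range {
        public final int offset;
        public final int length;

        Range(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    static Range findData(byte[] wav) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE buffer");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // the first chunk starts after "RIFF" <size> "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            long size = buf.getInt(pos + 4) & 0xFFFFFFFFL; // unsigned 32-bit chunk size
            int payload = pos + 8;
            if (id.equals("data")) {
                // Clamp to the bytes actually present in case the header overstates the size.
                int available = wav.length - payload;
                return new Range(payload, (int) Math.min(size, available));
            }
            long next = payload + size + (size & 1); // chunks are padded to an even length
            if (next > wav.length) {
                break;
            }
            pos = (int) next;
        }
        throw new IllegalArgumentException("no data chunk found");
    }

    private static String tag(byte[] bytes, int at) {
        return new String(bytes, at, 4, StandardCharsets.US_ASCII);
    }
}
```

On Android, the resulting range could then be passed to AudioTrack.write(byte[], int, int) with the bytes from response.getAudioContent().toByteArray(), on a track configured for 16-bit PCM at the sample rate of the response. The MediaPlayer version with the // *** changes is the class that follows.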
public class Tts {
    public Context context;
    private final MediaPlayer mMediaPlayer;

    public Tts(Context context, MediaPlayer mMediaPlayer) {
        this.context = context;
        this.mMediaPlayer = mMediaPlayer;
    }

    @SuppressLint({"NewApi", "ResourceType", "UseCompatLoadingForColorStateLists"})
    public void say(String text) throws Exception {
        InputStream stream = context.getResources().openRawResource(R.raw.credential); // R.raw.credential is credential.json
        GoogleCredentials credentials = GoogleCredentials.fromStream(stream);
        TextToSpeechSettings textToSpeechSettings =
                TextToSpeechSettings.newBuilder()
                        .setCredentialsProvider(
                                FixedCredentialsProvider.create(credentials)
                        ).build();
        // Instantiates a client
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(textToSpeechSettings)) {
            // Replace {name} with target
            SharedPreferences sharedPreferences = context.getSharedPreferences("target", Context.MODE_PRIVATE);
            String target = sharedPreferences.getString("target", null);
            text = text.replace("{name}", (target != null) ? target : ""); // *** bug fixed
            // Set the text input to be synthesized
            String myString = "<speak><prosody pitch=\"low\">" + text + "</prosody></speak>";
            SynthesisInput input = SynthesisInput.newBuilder().setSsml(myString).build();
            // Build the voice request, select the language code ("en-US") and the ssml voice gender
            // ("neutral")
            VoiceSelectionParams voice =
                    VoiceSelectionParams.newBuilder()
                            .setName("de-DE-Wavenet-E")
                            .setLanguageCode("de-DE")
                            .setSsmlGender(SsmlVoiceGender.MALE)
                            .build();
            // Select the type of audio file you want returned
            AudioConfig audioConfig =
                    AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();
            // Perform the text-to-speech request on the text input with the selected voice parameters and
            // audio file type
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
            // Get the audio contents from the response
            ByteString audioContents = response.getAudioContent();
            // Write the response to the output file.
            try (FileOutputStream out = new FileOutputStream(context.getFilesDir() + "/output.mp3")) {
                out.write(audioContents.toByteArray());
            }
            String myFile = context.getFilesDir() + "/output.mp3";
            mMediaPlayer.reset();
            mMediaPlayer.setDataSource(myFile);
            mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder() // *** moved here (should be done before prepare and very likely AFTER reset)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH) // *** changed to speech
                    .setUsage(AudioAttributes.USAGE_ASSISTANT) // *** added
                    .setFlags(AudioAttributes.FLAG_AUDIBILITY_ENFORCED) // *** added
                    .build());
            mMediaPlayer.prepare();
            // *** following line changed since handler was defined AFTER prepare and
            // *** the prepare call isn't asynchronous, thus the handler would never be called.
            mMediaPlayer.start();
        }
    }
}
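The line marked "bug fixed" above can be checked in isolation. The small sketch below is plain Java with no Android dependencies. It puts the replacement expression from the question next to the fixed one, and the class and method names are made up for illustration. With a null target, the original expression removes the literal text "null" and leaves the {name} placeholder in the sentence, so the placeholder would be sent to the speech service unchanged.

```java
/** Hypothetical stand-alone check of the {name} placeholder handling. */
final class NamePlaceholder {

    /** Fixed version: replaces "{name}" with target, or with "" when target is null. */
    static String fill(String text, String target) {
        return text.replace("{name}", (target != null) ? target : "");
    }

    /** Expression from the question, kept for comparison. */
    static String fillOriginal(String text, String target) {
        return (target != null) ? text.replace("{name}", target) : text.replace("null", "");
    }

    public static void main(String[] args) {
        String template = "Hello {name}, I am your assistant.";
        System.out.println(fill(template, "John"));       // Hello John, I am your assistant.
        System.out.println(fill(template, null));         // Hello , I am your assistant.
        System.out.println(fillOriginal(template, null)); // Hello {name}, I am your assistant.
    }
}
```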
Hope this gets you going!