Microsoft Cognitive SpeechRecognizer Stuck



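I am trying to use the MS Cognitive Speech SDK to run speech-to-text on some WAV files. It works fine for some files but gets stuck on others. By "stuck" I mean it does not stop until it is cancelled manually.

I first tried the RecognizeOnceAsync method: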

    private static void processRecording()
    {
        var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
        speechConfig.SpeechRecognitionLanguage = "es-MX";
        speechConfig.OutputFormat = OutputFormat.Detailed;
        using (var audioStream = new PushAudioInputStream())
        {
            audioStream.Write(File.ReadAllBytes("myfilepath"));
            using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
            {
                using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
                {
                    var result = speechRecognizer.RecognizeOnceAsync().Result;
                    switch (result.Reason)
                    {
                        case ResultReason.RecognizedSpeech:
                            Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                            break;
                        case ResultReason.NoMatch:
                            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                            break;
                        case ResultReason.Canceled:
                            var cancellation = CancellationDetails.FromResult(result);
                            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}, ErrorCode={cancellation.ErrorCode}, ErrorDetails={cancellation.ErrorDetails}");
                            break;
                    }
                }
            }
        }
    }

I get (after one minute):

CANCELED: Reason=Error, ErrorCode=ServiceTimeout, ErrorDetails=Timeout: no recognition result received SessionId: 322853a3085d41ec9b60ee940531038c

Then I tried StartContinuousRecognitionAsync:

    private async static Task processRecordingsAsync()
    {
        var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
        speechConfig.SpeechRecognitionLanguage = "es-MX";
        speechConfig.OutputFormat = OutputFormat.Detailed;
        var waiter = new System.Threading.ManualResetEvent(false);
        var audioStream = new PushAudioInputStream();
        audioStream.Write(File.ReadAllBytes("myfilepath"));
        var audioConfig = AudioConfig.FromStreamInput(audioStream);
        var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        Action cleanup = () =>
        {
            waiter.Set();
            try { speechRecognizer.Dispose(); } catch { }
            try { audioConfig.Dispose(); } catch { }
            try { audioStream.Dispose(); } catch { }
        };
        speechRecognizer.Recognizing += (sender, e) => Console.WriteLine($"Recognizing: {e.Result.Text}");
        speechRecognizer.SessionStarted += (sender, e) => Console.WriteLine($"Recognize session started");
        speechRecognizer.SessionStopped += (sender, e) => Console.WriteLine($"Recognize session stopped");
        speechRecognizer.SpeechEndDetected += (sender, e) => Console.WriteLine($"Speech end detected");
        speechRecognizer.SpeechStartDetected += (sender, e) => Console.WriteLine($"Speech start detected");
        speechRecognizer.Recognized += (sender, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"Recognized text: {e.Result.Text}");
            }
            else
            {
                Console.WriteLine($"Could not recognize text: {e.Result.Reason}");
            }
            cleanup();
        };
        speechRecognizer.Canceled += (sender, e) =>
        {
            Console.WriteLine($"Error trying to recognize text: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}, ErrorDetails = {e.ErrorDetails}");
            cleanup();
        };
        await speechRecognizer.StartContinuousRecognitionAsync();
        if (!waiter.WaitOne(60000))
        {
            await speechRecognizer.StopContinuousRecognitionAsync();
        }
    }

I get:

Recognize session started
Speech start detected
Recognizing: con el
Recognizing: con el servicio de tele
Recognizing: con el servicio de tele terapia
Recognizing: con el servicio de tele terapia de
Recognizing: con el servicio de tele terapia de tercer
Recognize session stopped
Error trying to recognize text: Reason = Error, ErrorCode = ServiceTimeout, ErrorDetails = Timeout while waiting for service to stop SessionId: e289298cf97447b89bd088a665e6c095

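So it processes about 90% of the file (which is about 4 seconds long), but then it gets stuck and only finishes once I force it with StopContinuousRecognitionAsync.

When I try this file in Speech Studio, it recognizes almost the same text, but it does not get stuck.

Note that I am using a free subscription. Could that be the cause? Is there anything else I can try?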

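This happens because the audio input stream in use is still patiently "waiting", since more data could still be pushed to it. The stream has no way of knowing that it holds the complete contents of a file rather than a relay of an ongoing real-time input that has merely stalled for a few seconds. If the end of the stream does not carry enough trailing silence, that hypothetical future data could even affect the final recognition result you receive, which is why the end of the file appears not to have been recognized yet (it has not been finalized).

Two possible fixes: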

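  1. Call .Close() on the PushAudioInputStream, or write an empty buffer (.Write(new byte[0])), to explicitly mark the end of the stream and allow the SDK to wrap things up without waiting for more data.
  2. If the input is just a file, consider using AudioConfig.FromWavFileInput so that you do not need any of these steps yourself.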
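
The two fixes above could look roughly like the following. This is a minimal sketch rather than a tested implementation: the class and method names (RecognizeFileSketch, RecognizeFromPushStreamAsync, RecognizeFromWavFileAsync) are made up for illustration, and the caller is assumed to pass in an already configured SpeechConfig and a path to a WAV file.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class, not part of the SDK.
public static class RecognizeFileSketch
{
    // Fix 1: keep the push stream, but close it once the whole file is written.
    public static async Task RecognizeFromPushStreamAsync(SpeechConfig speechConfig, string path)
    {
        using (var audioStream = new PushAudioInputStream())
        {
            // As in the question, the raw file bytes are pushed as-is; this assumes
            // the audio matches the stream's default format.
            audioStream.Write(File.ReadAllBytes(path));

            // Marks the end of the stream so the recognizer stops waiting for more audio.
            audioStream.Close();

            using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
            using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
            {
                var result = await recognizer.RecognizeOnceAsync();
                Console.WriteLine($"Reason={result.Reason}, Text={result.Text}");
            }
        }
    }

    // Fix 2: let the SDK read the WAV file itself, so no stream handling is needed.
    public static async Task RecognizeFromWavFileAsync(SpeechConfig speechConfig, string path)
    {
        using (var audioConfig = AudioConfig.FromWavFileInput(path))
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            var result = await recognizer.RecognizeOnceAsync();
            Console.WriteLine($"Reason={result.Reason}, Text={result.Text}");
        }
    }
}
```

Either method can be awaited from an async Main, for example `await RecognizeFileSketch.RecognizeFromWavFileAsync(speechConfig, "myfilepath");`.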

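As an additional note: I do not recommend calling .Dispose on these SDK objects from inside callbacks (events) raised by those same objects. If other callbacks are still pending dispatch after the callback that calls Dispose completes, this can lead to some interesting behavior. If you need more timely disposal than IDisposable gives you through a using statement, do the disposal on the main thread (for example, by awaiting a TaskCompletionSource that is signaled on completion) or on a new task thread (Task.Run(() => cleanup())); either approach avoids potential concurrency problems between teardown and events.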
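
One way to apply that advice to the continuous recognition code from the question is sketched below. It is an illustration under assumptions, not a verified drop-in replacement: the class and method names are hypothetical, the 60-second limit is carried over from the question, and the event handlers only signal a TaskCompletionSource while all disposal happens in the awaiting code through using statements.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class, not part of the SDK.
public static class ContinuousRecognitionSketch
{
    public static async Task ProcessRecordingAsync(SpeechConfig speechConfig, string path)
    {
        // Signaled from the event handlers and awaited below. RunContinuationsAsynchronously
        // keeps the code after the await from running inline on the SDK's callback thread.
        var done = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        using (var audioStream = new PushAudioInputStream())
        using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            recognizer.Recognized += (sender, e) =>
                Console.WriteLine($"Recognized ({e.Result.Reason}): {e.Result.Text}");

            recognizer.Canceled += (sender, e) =>
            {
                Console.WriteLine($"Canceled: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}, ErrorDetails = {e.ErrorDetails}");
                done.TrySetResult(false); // signal only; nothing is disposed here
            };

            recognizer.SessionStopped += (sender, e) => done.TrySetResult(true);

            // Push the whole file, then close the stream to mark its end.
            audioStream.Write(File.ReadAllBytes(path));
            audioStream.Close();

            await recognizer.StartContinuousRecognitionAsync();

            // Wait for the session to end, with a 60-second safety limit.
            await Task.WhenAny(done.Task, Task.Delay(TimeSpan.FromSeconds(60)));

            await recognizer.StopContinuousRecognitionAsync();
        } // recognizer, audioConfig and audioStream are disposed here, outside any callback
    }
}
```

Using TrySetResult rather than SetResult means it does not matter whether Canceled or SessionStopped fires first; the second call is simply ignored.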
