Full audio is not converted to text



I tried the code below:

import speech_recognition as sr

r = sr.Recognizer()
filename = "demo.wav"
with sr.AudioFile(filename) as source:
    # read the entire audio file into memory
    audio_data = r.record(source)
    # send the audio to the Google Speech Recognition API
    text = r.recognize_google(audio_data)
    print(text)

(taken from here)

The output is as follows:

result2:
{   'alternative': [   {   'confidence': 0.92995489,
                           'transcript': 'talking nonsense'},
                       {'transcript': 'you talking nonsense'},
                       {'transcript': 'are you talking nonsense'},
                       {'transcript': 'Divya talking nonsense'},
                       {'transcript': 'are talking nonsense'}],
    'final': True}
talking nonsense

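But the audio file contains: "I believe you are just talking nonsense"

Why doesn't it return the complete audio? Please help me figure this out.

Thank you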

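The function recognize_google "performs speech recognition using the Google Speech Recognition API".

Speech recognition obviously cannot match the input with 100% accuracy.

As stated in the documentation of the function recognize_google:

Returns the most likely transcription if show_all is false (the default). Otherwise, returns the raw API response as a JSON dictionary.

Raises a speech_recognition.UnknownValueError exception if the speech is unintelligible. Raises a speech_recognition.RequestError exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.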

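The first lines (starting with "result2:") that you see are output printed by the function recognize_google itself, not its result (see source code line #918).

The last line ("talking nonsense") is the actual result of the function recognize_google, which is chosen based on the confidence values of the different hypotheses (see source code line #921ff).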
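To make the selection step concrete, here is a rough sketch of how a single transcript could be picked from the raw response. It is not the library's actual code: the helper name pick_best_transcript is hypothetical, and treating a missing confidence value as 0.0 is an assumption made for illustration. The sample data is the response shown in the question.

```python
def pick_best_transcript(response):
    """Return the transcript with the highest confidence, or None if there is none.

    Hypothetical helper for illustration; alternatives without a
    'confidence' key are treated as 0.0 (an assumption).
    """
    if not isinstance(response, dict):
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]


# Raw response copied from the question
response = {
    "alternative": [
        {"confidence": 0.92995489, "transcript": "talking nonsense"},
        {"transcript": "you talking nonsense"},
        {"transcript": "are you talking nonsense"},
        {"transcript": "Divya talking nonsense"},
        {"transcript": "are talking nonsense"},
    ],
    "final": True,
}

print(pick_best_transcript(response))
```

Note that none of the five hypotheses contains the full sentence, so the missing words were already lost on the recognition side and not in the selection step.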

If you want to get the full result, add the parameter show_all=True to the recognize_google call.

The following example shows how to test this without recording a wave file. The wave file is generated by espeak (available in most Linux distributions).

import pprint
import subprocess

import speech_recognition as sr

wave_file = '/path/to/your/wavefile.wav'
text = "I believe you are just talking nonsense"

# Generate the test audio with espeak
# (-a amplitude, -s speed in words per minute, -w output wave file)
proc = subprocess.Popen(['espeak', '-a', '200', '-s', '130', '-w', wave_file, text])
proc.communicate()  # wait until espeak has finished writing the file

recognizer = sr.Recognizer()
with sr.AudioFile(wave_file) as source:
    audio_data = recognizer.record(source)  # read the entire file

if audio_data is not None:
    # show_all=True returns the raw API response instead of only the best transcript
    recognized_text = recognizer.recognize_google(audio_data, show_all=True)
    pprint.pprint(recognized_text)

Output:

{'alternative': [{'confidence': 0.88625956,
                  'transcript': "I'm talking nonsense"},
                 {'transcript': 'talking nonsense'},
                 {'transcript': "I'm talking London"},
                 {'transcript': 'talking London'},
                 {'transcript': "I'm talking now"}],
 'final': True}
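
To see how much of the spoken sentence each hypothesis dropped, the alternatives can be compared with the expected text. The following is only a simple sketch: the helper missing_words is hypothetical, and the plain lowercase word comparison is an assumption that ignores punctuation and contractions (so "I'm" does not count as "I").

```python
def missing_words(expected, transcript):
    """Return the words of `expected` that do not occur in `transcript`.

    Hypothetical helper: compares lowercase, whitespace-separated words only.
    """
    heard = set(transcript.lower().split())
    return [word for word in expected.lower().split() if word not in heard]


expected = "I believe you are just talking nonsense"

# Raw response copied from the espeak test
response = {
    "alternative": [
        {"confidence": 0.88625956, "transcript": "I'm talking nonsense"},
        {"transcript": "talking nonsense"},
        {"transcript": "I'm talking London"},
        {"transcript": "talking London"},
        {"transcript": "I'm talking now"},
    ],
    "final": True,
}

for alternative in response["alternative"]:
    transcript = alternative["transcript"]
    print(repr(transcript), "is missing:", missing_words(expected, transcript))
```

In this test every alternative lacks "believe you are just", which suggests that the beginning of the sentence is not recognized at all, rather than being discarded when the best hypothesis is chosen.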
