How can we improve speech-to-text accuracy in Python using the recognize_sphinx API?


import speech_recognition as sr
# Obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")
# Use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire audio file
# Recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

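So, if I understand correctly, you are having trouble getting the correct output for what the user (or, in your case, the audio file) says. For example, the audio says "Hello!" and the output comes back as something completely different.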

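Looking at your code, I noticed that you are using three different audio files, each in a different language. If you open the SpeechRecognition documentation, you will find a library reference, and that reference includes notes on using PocketSphinx. The first thing that stands out is: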



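Once installed, you can simply specify the language using the language parameter of recognizer_instance.recognize_sphinx. For example, French would be specified with "fr-FR" and Mandarin with "zh-CN".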



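Without a language argument, recognize_sphinx falls back to its default, "en-US", so French or Chinese audio gets decoded with the English model. To fix this, just add a language argument and set it to the language of the audio. For example: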

import speech_recognition as sr
# Obtain path to "chinese.flac" in the same folder as this script
from os import path
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")
# Use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire audio file
# Recognize speech using Sphinx
# Just pass a language parameter that matches the audio
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio, language="zh-CN"))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
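
If you want to keep all three files around and switch between them, one option is to keep the language code next to the file name so the two cannot drift apart. This is only a rough sketch: LANGUAGE_BY_FILE, language_for and transcribe are names I made up for illustration, not part of the library, and it assumes the French and Chinese language data for PocketSphinx are already installed.

```python
from os import path

# Hypothetical mapping from the three sample files to language codes.
# "en-US" is the library default; "fr-FR" and "zh-CN" only work once the
# matching PocketSphinx language data has been installed.
LANGUAGE_BY_FILE = {
    "english.wav": "en-US",
    "french.aiff": "fr-FR",
    "chinese.flac": "zh-CN",
}


def language_for(audio_path, default="en-US"):
    """Return the language code for a sample file, keyed by its file name."""
    return LANGUAGE_BY_FILE.get(path.basename(audio_path), default)


def transcribe(audio_path):
    """Transcribe one audio file with Sphinx, using the matching language."""
    # Imported here so language_for() can be used without the library installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = r.record(source)  # Read the entire audio file
    return r.recognize_sphinx(audio, language=language_for(audio_path))
```

You would then call transcribe(AUDIO_FILE) inside the same try/except block as above, since it can still raise sr.UnknownValueError or sr.RequestError.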
