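How to use the Google speech recognition API in Python

I have an mp3 file and I want to use Google's speech recognition to extract the text from it. Any pointers to documentation or examples would be appreciated.

Take a look at the Google Cloud Speech API, which enables developers to convert audio to text […] The API recognizes over 80 languages and variants […] You can create a free account to get a limited number of API requests.

How to:

You first need to install the gcloud Python module and the Google API Python client module with: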

pip install --upgrade gcloud
pip install --upgrade google-api-python-client

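Then, in the Cloud Platform Console, go to the Projects page and select an existing project or create a new one. Enable billing for the project, then enable the Cloud Speech API.

After enabling the Google Cloud Speech API, click the "Go to Credentials" button to set up your Cloud Speech API credentials.

See "Set Up a Service Account" for information on how to authorize to the Cloud Speech API service from your code.

You should end up with both a service account key file (in JSON format) and a GOOGLE_APPLICATION_CREDENTIALS environment variable, which together allow you to authenticate to the Speech API.

Once that is done, download the sample raw audio file from Google, and also download speech-discovery_google_rest_v1.json from Google.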
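The question starts from an mp3, while the script further down declares LINEAR16 audio at 16000 Hz, so your own recording would have to be decoded to raw PCM first. Here is a minimal sketch of one way to do that, assuming the ffmpeg command-line tool is installed and on your PATH. The helper names build_ffmpeg_command and convert_to_raw are made up for illustration and are not part of any Google library.

```python
import subprocess


def build_ffmpeg_command(src_path, dst_path, sample_rate=16000):
    """Return an ffmpeg argument list that decodes src_path to raw PCM."""
    return [
        'ffmpeg',
        '-i', src_path,           # input file, e.g. an mp3
        '-f', 's16le',            # headerless 16-bit signed little-endian output
        '-acodec', 'pcm_s16le',   # uncompressed PCM samples
        '-ar', str(sample_rate),  # resample to the rate the request declares
        '-ac', '1',               # downmix to a single channel
        dst_path,
    ]


def convert_to_raw(src_path, dst_path, sample_rate=16000):
    """Run ffmpeg (assumed to be installed); raises CalledProcessError on failure."""
    subprocess.check_call(build_ffmpeg_command(src_path, dst_path, sample_rate))
```

Calling convert_to_raw('speech.mp3', 'audio.raw') should then produce a file you can pass to the script in place of the sample audio. The sample rate is a parameter so that it can be kept in step with whatever the request config declares.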

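Modify the previously downloaded JSON file to set your credentials key, then make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of the .json file with: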

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json

Also make sure the GCLOUD_PROJECT environment variable is set to the ID of your Google Cloud project with:

export GCLOUD_PROJECT=your-project-id
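A missing or mistyped environment variable tends to surface later as a confusing authentication error, so it can help to check the setup before calling the API. The following is only a sketch: check_speech_environment is a hypothetical helper, not part of the Google client libraries, and it only verifies that the two variables are set and that the key file exists and parses as JSON.

```python
import json
import os


def check_speech_environment(environ=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    environ = os.environ if environ is None else environ
    problems = []

    key_path = environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not key_path:
        problems.append('GOOGLE_APPLICATION_CREDENTIALS is not set')
    elif not os.path.isfile(key_path):
        problems.append('key file not found: ' + key_path)
    else:
        try:
            with open(key_path) as key_file:
                json.load(key_file)  # only checks that the file is valid JSON
        except ValueError:
            problems.append('key file is not valid JSON: ' + key_path)

    if not environ.get('GCLOUD_PROJECT'):
        problems.append('GCLOUD_PROJECT is not set')
    return problems


if __name__ == '__main__':
    for problem in check_speech_environment():
        print(problem)
```

Passing a plain dict instead of os.environ makes the check easy to try without touching your shell configuration.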

Assuming all of that is done, you can create a tutorial.py file containing:

import argparse
import base64
import json
from googleapiclient import discovery
import httplib2
from oauth2client.client import GoogleCredentials

DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?'
                 'version={apiVersion}')

def get_speech_service():
    credentials = GoogleCredentials.get_application_default().create_scoped(
        ['https://www.googleapis.com/auth/cloud-platform'])
    http = httplib2.Http()
    credentials.authorize(http)
    return discovery.build(
        'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL)

def main(speech_file):
    """Transcribe the given audio file.
    Args:
        speech_file: the name of the audio file.
    """
    with open(speech_file, 'rb') as speech:
        speech_content = base64.b64encode(speech.read())
    service = get_speech_service()
    service_request = service.speech().syncrecognize(
        body={
            'config': {
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 khz
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
            }
            })
    response = service_request.execute()
    print(json.dumps(response))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'speech_file', help='Full path of audio file to be recognized')
    args = parser.parse_args()
    main(args.speech_file)

Then run:

python tutorial.py audio.raw
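The script prints the whole response as JSON, whereas the original goal was just the text. Below is a small sketch for pulling out the transcripts, assuming the response has the usual shape of a results list whose entries each carry an alternatives list with a transcript field. extract_transcripts is a hypothetical helper, and the sample response is invented for illustration rather than captured from the API.

```python
def extract_transcripts(response):
    """Return the first (top-ranked) transcript of each result in the response."""
    transcripts = []
    for result in response.get('results', []):
        alternatives = result.get('alternatives', [])
        if alternatives:
            transcripts.append(alternatives[0].get('transcript', ''))
    return transcripts


# Made-up response in the assumed shape, for illustration only.
sample_response = {
    'results': [
        {'alternatives': [
            {'transcript': 'how old is the Brooklyn Bridge', 'confidence': 0.98},
        ]},
    ],
}

if __name__ == '__main__':
    print(' '.join(extract_transcripts(sample_response)))
```

In tutorial.py you could call extract_transcripts(response) in place of printing json.dumps(response). A response with no results key simply yields an empty list.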
