How to extract and store the text generated by an automatic speech recognition deep learning application



The application can be viewed on Hugging Face: https://huggingface.co/spaces/rowel/asr

import gradio as gr
from transformers import pipeline

model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")
gr.Interface.from_pipeline(model,
                           title="Automatic Speech Recognition (ASR)",
                           description="Using pipeline with Facebook S2T for ASR.",
                           examples=['data/ljspeech.wav',]
                           ).launch()

I can't tell where those few lines of code store the recognized text. I want to store the sentence text in a string.

Honestly, I only know basic Python programming. I just want to store the results in string variables and do something with them.

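You can unwrap the Interface.from_pipeline abstraction and define your own Gradio interface. You need to define your own inputs, outputs, and prediction function, which gives you direct access to the model's text prediction. Here is an example.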

You can test it here: https://huggingface.co/spaces/radames/Speech-Recognition-Example


import gradio as gr
from transformers import pipeline

model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")


def predict_speech_to_text(audio):
    # audio is the path to the recorded file (because type="filepath")
    prediction = model(audio)
    # text variable contains your voice-to-text string
    text = prediction['text']
    return text


# Note: gr.inputs / gr.outputs belong to older Gradio releases;
# newer releases use gr.Audio and gr.Textbox instead.
gr.Interface(fn=predict_speech_to_text,
             title="Automatic Speech Recognition (ASR)",
             inputs=gr.inputs.Audio(
                 source="microphone", type="filepath", label="Input"),
             outputs=gr.outputs.Textbox(label="Output"),
             description="Using pipeline with Facebook S2T for ASR.",
             examples=['ljspeech.wav'],
             allow_flagging='never'
             ).launch()
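
Since the goal is to keep the text and do something with it, here is a minimal sketch of one way to go about it, not the only one. It wraps the prediction function so that each transcript is also appended to a Python list and to a text file. The names make_predict_fn, fake_asr, transcripts and log_path are made up for illustration, and fake_asr only stands in for the real pipeline so the sketch runs without downloading a model. The one thing it relies on from the code above is that the pipeline returns a dict with a 'text' key.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def make_predict_fn(asr: Callable[[str], dict],
                    transcripts: List[str],
                    log_path: Path) -> Callable[[str], str]:
    """Wrap an ASR callable so every transcript is stored as well as returned."""
    def predict_speech_to_text(audio: str) -> str:
        text = asr(audio)["text"]       # an ordinary Python string
        transcripts.append(text)        # keep it in memory for later use
        with log_path.open("a", encoding="utf-8") as f:
            f.write(text + "\n")        # and append it to a text file
        return text
    return predict_speech_to_text


def fake_asr(audio_path: str) -> dict:
    """Hypothetical stand-in for the transformers pipeline (assumption)."""
    return {"text": f"transcript of {audio_path}"}


transcripts: List[str] = []
log_path = Path(tempfile.gettempdir()) / "transcripts.txt"

predict = make_predict_fn(fake_asr, transcripts, log_path)
result = predict("ljspeech.wav")

# From here on, result is a normal string you can process however you like.
word_count = len(result.split())
print(result, word_count)
```

To use it with the real model, you would pass model instead of fake_asr and hand the returned function to gr.Interface as fn. The list then holds every sentence recognized while the app is running.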
