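I'd like to transcribe several long (Dutch) audio files. Each file is an interview of roughly 60-120 minutes. There are only 8 files and I can run them by hand, so this doesn't need to be part of any automated software. I have some Azure credits, so I thought of Azure Cognitive Services Speech to Text. Is there a sample for this?
I tried this example: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text-sample. It works well, but it stops right after the first short pause in the audio.
I found a similar question here: Speech to text large audio files [Microsoft Speech API]. However, the poster did not share how he solved it.
Can anyone help?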
For longer audio files, we recommend the batch transcription API. It is explained well here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription, and there are C# and Python samples here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch.
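To give a rough idea of what creating a batch transcription involves, here is a minimal sketch that only builds the HTTP request and does not send it. The endpoint path and API version (`v3.1`), the `westeurope` region, and the example audio URL are assumptions for illustration; check them against the documentation linked above. Note that batch transcription reads audio from a URL (for example a blob storage link), not from a local file path.

```python
import json
import urllib.request


def build_batch_request(key, region, content_urls, locale="nl-NL",
                        display_name="interview transcription"):
    """Build (but do not send) a request that creates a batch transcription."""
    # Assumed endpoint shape and version; verify against the linked docs.
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.1/transcriptions")
    body = {
        "contentUrls": list(content_urls),  # audio must be reachable by URL
        "locale": locale,                   # nl-NL for Dutch
        "displayName": display_name,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key and URL; nothing is sent over the network here.
    req = build_batch_request(
        "YOUR_KEY", "westeurope", ["https://example.com/interview1.wav"])
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the job: urllib.request.urlopen(req)
```

The service then processes the job asynchronously, so you would poll the transcription it returns until it finishes and download the result files; the linked samples show that part in full.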
Below is a simple Python example that transcribes a large audio file to a text file. (It doesn't use batch transcription, so it takes a while.) Hope this helps.
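import time
import azure.cognitiveservices.speech as speechsdk


def transcribe(key, region, lang, path_in, path_out="out.txt", newLine=False):
    """Transcribe an audio file to a text file using continuous recognition."""
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = lang
    audio_config = speechsdk.audio.AudioConfig(filename=path_in)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = False
    textOut = ""
    # Optional line break appended after each recognized phrase.
    str_newLine = "\n" if newLine else ""

    def stop_cb(evt):
        # Called when the session ends or is canceled; ends the wait loop below.
        nonlocal done
        print(evt)
        speech_recognizer.stop_continuous_recognition()
        done = True

    def outPrint(evt):
        # Called once per recognized phrase; collects the text.
        nonlocal textOut
        tmp_text = evt.result.text
        textOut += tmp_text + str_newLine
        print(tmp_text)

    speech_recognizer.recognized.connect(outPrint)
    speech_recognizer.session_stopped.connect(stop_cb)
    # Also stop on cancellation (e.g. a wrong key) so the loop cannot hang.
    speech_recognizer.canceled.connect(stop_cb)

    # Continuous recognition keeps going across pauses until the file ends.
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    # UTF-8 so that accented characters are written correctly.
    with open(path_out, 'w', encoding='utf-8') as f:
        f.write(textOut)


if __name__ == "__main__":
    key = "YOUR_KEY"
    region = "REGION_eg_westus"
    lang = "INPUT_LANGUAGE"  # e.g. "nl-NL" for Dutch; see e.g. https://learn.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/help/language-locale
    path_in = ""   # path to the input audio file
    path_out = ""  # path of the text file to write
    transcribe(key, region, lang, path_in, path_out)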
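
Since the question mentions eight files, a small driver can run a per-file transcription function over a whole folder. This is only a sketch: `plan_outputs` and `transcribe_folder` are hypothetical helper names, the `.wav` extension is an assumption, and the transcription function is passed in as `transcribe_one` so the block runs on its own.

```python
from pathlib import Path


def plan_outputs(folder, extension=".wav"):
    """Return sorted (audio_path, text_path) pairs for each audio file."""
    pairs = []
    for audio in sorted(Path(folder).glob(f"*{extension}")):
        # interview1.wav -> interview1.txt in the same folder
        pairs.append((audio, audio.with_suffix(".txt")))
    return pairs


def transcribe_folder(folder, transcribe_one, extension=".wav"):
    """Call transcribe_one(path_in, path_out) for each file not yet done."""
    written = []
    for audio, text in plan_outputs(folder, extension):
        if text.exists():
            # Skip files that already have a transcript, so a rerun
            # after a failure does not redo hours of audio.
            continue
        transcribe_one(str(audio), str(text))
        written.append(text)
    return written
```

With the `transcribe` function above it could be called as `transcribe_folder("interviews", lambda i, o: transcribe(key, region, "nl-NL", i, o))`, where `"interviews"` is a placeholder folder name.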