What is the most efficient way to convert a WAV audio byte string to an OGG file without writing it to disk?



My end goal is to use TTS to convert some Indic-language text to audio, and to pass that audio to a messaging system that accepts MP3 and OGG. OGG is preferred.

I am on Ubuntu, and my flow for obtaining the audio string is as follows:

  1. The Indic text is sent to an API.
  2. The API returns JSON containing a key named audioContent: audioString = response.json()['audio'][0]['audioContent']
  3. The decoded bytes are obtained with decode_string = base64.b64decode(audioString)
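Steps 2 and 3 above can be sketched end-to-end. The response shape (`audio[0].audioContent`) is taken from the question; the payload below is a simulated stand-in for a real `response.json()`, and the helper name is illustrative:

```python
import base64
import json

def extract_audio_bytes(response_json):
    """Pull the Base64 audioContent out of the TTS response and decode it to raw audio bytes."""
    audio_string = response_json['audio'][0]['audioContent']
    return base64.b64decode(audio_string)

# Simulated API response; a real one would come from response.json().
fake_wav = b'RIFF\x00\x00\x00\x00WAVEfmt '  # stand-in for real WAV bytes
payload = json.loads(json.dumps({
    'audio': [{'audioContent': base64.b64encode(fake_wav).decode('ascii')}]
}))

decode_string = extract_audio_bytes(payload)  # raw bytes, ready for encoding
```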

I am currently converting it to MP3. As you can see, I first write a WAV file and then convert it to MP3:

import base64
from pydub import AudioSegment

decode_string = base64.b64decode(audioString)
with open("output.wav", "wb") as wav_file:
    wav_file.write(decode_string)

# Convert the WAV file to MP3
song = AudioSegment.from_wav("output.wav")
song.export("temp.mp3", format="mp3")

Is there a way to convert audioString directly to an OGG file without performing any disk I/O?

I have tried torchaudio and pyffmpeg to load audioString and convert it, but it does not seem to work.

We can write the WAV data to FFmpeg's stdin pipe and read the encoded OGG data from FFmpeg's stdout pipe.
An answer of mine below describes how to do this for video; we can apply the same solution to audio.


The pipe architecture:

--------------------  Encoded      ---------  Encoded      ------------
| Input WAV encoded  | WAV data    | FFmpeg  | OGG data    | Store to   |
| stream             | ----------> | process | ----------> | BytesIO    |
--------------------  stdin PIPE   ---------  stdout PIPE  ------------

The implementation is equivalent to the following shell command:
cat input.wav | ffmpeg -y -f wav -i pipe: -acodec libopus -f ogg pipe: > test.ogg


According to Wikipedia, common audio codecs for the OGG container are Vorbis, Opus, FLAC, and OggPCM (I chose the Opus audio codec).

The example uses the ffmpeg-python module, which is just a binding to an FFmpeg sub-process (the FFmpeg CLI must be installed and present in the execution path).


Execute the FFmpeg sub-process with the stdin pipe as input and the stdout pipe as output:

ffmpeg_process = (
    ffmpeg
    .input('pipe:', format='wav')
    .output('pipe:', format='ogg', acodec='libopus')
    .run_async(pipe_stdin=True, pipe_stdout=True)
)

The input format is set to wav, the output format is set to ogg, and the selected encoder is libopus.


Assuming the audio file is relatively large, we cannot write the entire WAV data at once, because doing so (without "draining" the stdout pipe) causes the program execution to halt.

We therefore write the WAV data (in chunks) in a separate thread, and read the encoded data in the main thread.

下面是"编写器"线程的示例:

def writer(ffmpeg_proc, wav_bytes_arr):
    chunk_size = 1024  # Chunk size of 1024 bytes (the exact size is not important).
    n_chunks = len(wav_bytes_arr) // chunk_size  # Number of full chunks (excluding the smaller remainder chunk at the end).
    remainder_size = len(wav_bytes_arr) % chunk_size  # Remainder bytes (the total size may not be a multiple of chunk_size).
    for i in range(n_chunks):
        ffmpeg_proc.stdin.write(wav_bytes_arr[i*chunk_size:(i+1)*chunk_size])  # Write a chunk of bytes to FFmpeg's stdin pipe.
    if remainder_size > 0:
        ffmpeg_proc.stdin.write(wav_bytes_arr[chunk_size*n_chunks:])  # Write the remaining bytes to FFmpeg's stdin pipe.
    ffmpeg_proc.stdin.close()  # Closing stdin finishes encoding the data and lets the FFmpeg sub-process exit.

"编写器线程"将 WAV 数据写入小卡盘.
最后一个块较小(假设长度不是卡盘大小的倍数)。

最后,stdin管道关闭.
关闭stdin完成数据编码,并关闭 FFmpeg 子进程。
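The chunk arithmetic can be verified in isolation: n_chunks full slices plus the remainder slice reconstruct the original buffer exactly (a standalone check, separate from the answer's code):

```python
# 2564 bytes: deliberately not a multiple of 1024, so a remainder chunk exists.
data = bytes(range(256)) * 10 + b'tail'
chunk_size = 1024
n_chunks = len(data) // chunk_size        # 2 full chunks
remainder_size = len(data) % chunk_size   # 516 leftover bytes

# Same slicing as the writer thread, collected instead of written to a pipe.
chunks = [data[i*chunk_size:(i+1)*chunk_size] for i in range(n_chunks)]
if remainder_size > 0:
    chunks.append(data[chunk_size*n_chunks:])

reassembled = b''.join(chunks)  # identical to the original buffer
```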


In the main thread, we start the thread, and read the encoded "OGG" data from the stdout pipe (in chunks):

thread = threading.Thread(target=writer, args=(ffmpeg_process, wav_bytes_array))
thread.start()

while thread.is_alive():
    ogg_chunk = ffmpeg_process.stdout.read(1024)  # Read a chunk of arbitrary size from the stdout pipe.
    out_stream.write(ogg_chunk)  # Write the encoded chunk to the "in-memory file".

To read the remaining data, we can use ffmpeg_process.communicate():

# Read the last encoded chunk.
ogg_chunk = ffmpeg_process.communicate()[0]
out_stream.write(ogg_chunk)  # Write the encoded chunk to the "in-memory file".

Complete code sample:

import ffmpeg
import base64
from io import BytesIO
import threading

# Equivalent shell command:
# cat input.wav | ffmpeg -y -f wav -i pipe: -acodec libopus -f ogg pipe: > test.ogg

# Writer thread - write the WAV data to FFmpeg's stdin pipe in small chunks of 1KB.
def writer(ffmpeg_proc, wav_bytes_arr):
    chunk_size = 1024  # Chunk size of 1024 bytes (the exact size is not important).
    n_chunks = len(wav_bytes_arr) // chunk_size  # Number of full chunks (excluding the smaller remainder chunk at the end).
    remainder_size = len(wav_bytes_arr) % chunk_size  # Remainder bytes (the total size may not be a multiple of chunk_size).
    for i in range(n_chunks):
        ffmpeg_proc.stdin.write(wav_bytes_arr[i*chunk_size:(i+1)*chunk_size])  # Write a chunk of bytes to FFmpeg's stdin pipe.
    if remainder_size > 0:
        ffmpeg_proc.stdin.write(wav_bytes_arr[chunk_size*n_chunks:])  # Write the remaining bytes to FFmpeg's stdin pipe.
    ffmpeg_proc.stdin.close()  # Closing stdin finishes encoding the data and lets the FFmpeg sub-process exit.

# The example reads the data from a file; assume instead: wav_bytes_array = base64.b64decode(audioString)
with open('input.wav', 'rb') as f:
    wav_bytes_array = f.read()

# Encode as Base64 and decode the Base64 - the encoded and decoded data are bytes objects (not UTF-8 strings).
dat = base64.b64encode(wav_bytes_array)  # Encode as Base64 (used for testing - not part of the solution).
wav_bytes_array = base64.b64decode(dat)  # wav_bytes_array matches "decode_string" (from the question).

# Execute the FFmpeg sub-process with the stdin pipe as input and the stdout pipe as output.
ffmpeg_process = (
    ffmpeg
    .input('pipe:', format='wav')
    .output('pipe:', format='ogg', acodec='libopus')
    .run_async(pipe_stdin=True, pipe_stdout=True)
)

# Open an in-memory file for storing the encoded OGG data.
out_stream = BytesIO()

# Start a thread that writes the WAV data in small chunks.
# We need the thread because writing too much data to the stdin pipe at once causes a deadlock.
thread = threading.Thread(target=writer, args=(ffmpeg_process, wav_bytes_array))
thread.start()

# Read encoded OGG data from FFmpeg's stdout pipe, and write it to out_stream.
while thread.is_alive():
    ogg_chunk = ffmpeg_process.stdout.read(1024)  # Read a chunk of arbitrary size from the stdout pipe.
    out_stream.write(ogg_chunk)  # Write the encoded chunk to the "in-memory file".

# Read the last encoded chunk.
ogg_chunk = ffmpeg_process.communicate()[0]
out_stream.write(ogg_chunk)  # Write the encoded chunk to the "in-memory file".
out_stream.seek(0)  # Seek to the beginning of out_stream.

ffmpeg_process.wait()  # Wait for the FFmpeg sub-process to end.

# Write out_stream to a file - just for testing:
with open('test.ogg', 'wb') as f:
    f.write(out_stream.getbuffer())

You can do this with TorchAudio as follows.

A few caveats:

  1. OPUS support is available via libsox (not available on Windows) or ffmpeg (available on Linux/macOS/Windows).
  2. In the latest stable release (v0.13), torchaudio.save can encode the OPUS format with libsox. However, the underlying libsox implementation is buggy, so torchaudio.save is not recommended for OPUS.
  3. Instead, StreamWriter from torchaudio.io, available since v0.13, is recommended. (You need ffmpeg>=4.1,<5 installed.)
  4. OPUS only supports 48 kHz.
  5. OPUS only supports monaural audio. Specifying a num_channels other than 1 does not raise an error, but produces incorrect audio data.
import io
import base64

from torchaudio.io import StreamReader, StreamWriter

# 0. Generate test data
with open("foo.wav", "rb") as file:
    data = file.read()
data = base64.b64encode(data)

# 1. Decode base64
data = base64.b64decode(data)

# 2. Load with torchaudio
reader = StreamReader(io.BytesIO(data))
reader.add_basic_audio_stream(
    frames_per_chunk=-1,  # Decode all the data at once
    format="s16p",  # Use signed 16-bit integer
)
reader.process_all_packets()  # Decode all the data
waveform, = reader.pop_chunks()  # Get the waveform

# 3. Save as OPUS
writer = StreamWriter("output.opus")
writer.add_audio_stream(
    sample_rate=48000,  # OPUS only supports 48000 Hz
    num_channels=1,  # OPUS only supports monaural
    format="s16",
    encoder_option={"strict": "experimental"},
)
with writer.open():
    writer.write_audio_chunk(0, waveform)
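Following caveat 5, multi-channel audio should be downmixed to a single channel before writing. On a real tensor you would average over the channel dimension with torch.mean; below is a dependency-free sketch of the same averaging idea using plain Python lists (the helper name and frame layout are illustrative):

```python
def downmix_to_mono(frames):
    """Average the channels of each frame: [[L, R], ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]

# Three stereo frames as (left, right) sample pairs.
stereo = [[0.5, -0.5], [1.0, 0.0], [-1.0, 1.0]]
mono = downmix_to_mono(stereo)  # [0.0, 0.5, 0.0]
```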
