How to convert PyAudio bytes into a virtual file in Python



TL;DR

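Is there a way to convert raw audio data (obtained with the PyAudio module) into the form of a virtual file (like what the Python open() function returns), without saving it to disk and reading it back from disk? Details follow.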

What I'm doing

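I record audio with PyAudio and then feed it into a TensorFlow model to get predictions. At the moment this works when I first save the recorded sound to disk as a .wav file and then read it back to feed it to the model. Here is the code for recording and saving: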

import pyaudio
import wave
CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)
print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)]  # here is the recorded data, in the form of list of bytes
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()

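Once I have the raw audio data (the variable frames), I can save it with the Python wave module as shown below. Note that when saving, some metadata has to be written by calling functions such as wf.setxxx.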

import os
from datetime import datetime
output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

Below is the code that runs inference on the TensorFlow model using the saved file. It simply reads the file as binary, and the model handles the rest.

import classifier  # my tensorflow model
with open(output_path, 'rb') as f:
    w = f.read()
    classifier.run_graph(w, labels, 5)

The problem

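To meet real-time requirements, I need to keep the audio stream open and feed it to the model at regular intervals. Saving the file to disk and reading it back over and over seems unreasonable, since that would spend a lot of time on I/O.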
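The "keep the stream and feed it at intervals" requirement can be pictured as a sliding window over the most recent chunks, held in memory. The sketch below is only an illustration: `push_chunk`, `WINDOW_SECONDS`, and the fake chunks are assumptions made for the example, and in a real loop each chunk would presumably come from `stream.read(CHUNK_LENGTH)`.

```python
import math
from collections import deque

RATE = 16000          # samples per second, as in the recording code
SAMPLE_WIDTH = 2      # bytes per sample for paInt16
CHUNK_LENGTH = 1024   # samples per read
WINDOW_SECONDS = 1    # hypothetical window length

# Number of chunks needed to cover at least WINDOW_SECONDS of audio.
chunks_per_window = math.ceil(RATE * WINDOW_SECONDS / CHUNK_LENGTH)
# A deque with maxlen drops the oldest chunk automatically when full.
window = deque(maxlen=chunks_per_window)


def push_chunk(chunk):
    """Add one chunk; return the window's raw bytes once it is full, else None."""
    window.append(chunk)
    if len(window) == window.maxlen:
        return b"".join(window)
    return None


# Stand-in for chunks read from the stream: silence of the right size.
fake_chunk = b"\x00" * (CHUNK_LENGTH * SAMPLE_WIDTH)
results = [push_chunk(fake_chunk) for _ in range(20)]
```

The bytes returned here are still raw PCM without any header, so they are not yet in the form the model expects.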

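I want to keep the data in memory and use it directly instead of repeatedly saving and reading it. However, the Python wave module does not support reading and writing at the same time (see here).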

If I feed the data directly, without the metadata (such as channels and frame rate) that the wave module adds when saving, like this:

w = b''.join(frames)
classifier.run_graph(w, labels, 5)

I get the following error:

2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found 
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Header mismatch: Expected RIFF but found

The TensorFlow model I use is available here: ML-KWS-for-MCU, in case that helps. Here is the code that produces the error (classifier.run_graph()):

def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        #   Feed the audio data as input to the graph.
        #   predictions  will contain a two-dimensional array, where one
        #   dimension represents the input image count, and the other has
        #   predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})
        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
        return 0

You should be able to use io.BytesIO instead of a physical file. The two share the same interface, but a BytesIO object lives only in memory:

import io
import wave
container = io.BytesIO()
wf = wave.open(container, 'wb')
# Placeholder values; use your real channel count, sample width and rate
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')
# Read the data up to this point (WAV header plus the first frames)
container.seek(0)
data_package = container.read()
# add some more data...
wf.writeframes(b'ghijk')
# read the data added since last (raw frames only, no header)
container.seek(len(data_package))
data_package = container.read()

This should let you keep streaming data into the file while your TensorFlow code reads out whatever has been added since the last read.
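Applied to the recording code from the question, the same idea might look like the sketch below. The helper name `pcm_to_wav_bytes` and its parameters are assumptions made for illustration; they are not part of PyAudio or the `wave` module. Each call builds a complete WAV (header plus frames) in memory, which is what the `Expected RIFF` error suggests the graph's `wav_data:0` input is looking for.

```python
import io
import wave


def pcm_to_wav_bytes(frames, channels=1, sample_width=2, rate=16000):
    """Wrap raw PCM chunks in a WAV container held entirely in memory."""
    buffer = io.BytesIO()
    # wave.open accepts any file-like object, so no path on disk is needed.
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample for paInt16
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    # getvalue() returns the whole buffer regardless of the current position.
    return buffer.getvalue()


# Stand-in for `frames` from stream.read(): one second of 16-bit mono silence.
frames = [b"\x00\x00" * 16000]
wav_bytes = pcm_to_wav_bytes(frames)
print(wav_bytes[:4], len(wav_bytes))
```

The result could then be passed as `classifier.run_graph(wav_bytes, labels, 5)` in place of the bytes read from disk. Building a fresh container per window keeps the header consistent with the frames that follow it, at the cost of rewriting a small header each time.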
