torchaudio load for PCM file - EfficientConformer

I'm having trouble getting the audio length of a PCM file.

EfficientConformer uses LibriSpeechDataset, whose audio files are FLAC, but in my case I'm using PCM files. EfficientConformer extracts the audio length with torchaudio like this:

import torchaudio
audio_length = torchaudio.load(DATASET_PATH)[0].size(1)  # waveform is (channels, frames)

In my case, however, this doesn't work for PCM files, so I tried a different approach.

What I did

First, I get the signal with the following code:

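import numpy as np

def load_signal(audio_path):  # placeholder name; the original def line was omitted
    # Read headerless 16-bit signed PCM as a 1-D float32 array
    signal = np.memmap(audio_path, dtype='h', mode='r').astype('float32')
    if np.abs(signal).sum() <= 80:
        raise ValueError('[WARN] Silence file in {0}'.format(audio_path))
    return signal / 32767  # normalize audio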

Then I build the waveform:

from torch import Tensor
waveform = Tensor(signal).unsqueeze(0).t()  # (1, N) after unsqueeze, then (N, 1) after .t()

Finally, I take the size along dim 1:

audio_length = waveform.size(1)

But it keeps printing 1 in the terminal.
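A likely explanation, based only on the shapes involved: `Tensor(signal)` is 1-D with N samples, `.unsqueeze(0)` makes it `(1, N)`, and `.t()` transposes that to `(N, 1)`, so `size(1)` is the single column. The sketch below reproduces the shape arithmetic with NumPy as a stand-in for the torch calls; the zero array is a placeholder for the real signal.

```python
import numpy as np

# Placeholder for the loaded PCM signal: 16000 samples (1 s at 16 kHz), 1-D
signal = np.zeros(16000, dtype=np.float32)

as_row = signal[np.newaxis, :]  # like Tensor(signal).unsqueeze(0): shape (1, 16000)
transposed = as_row.T           # like .t(): shape (16000, 1)

print(as_row.shape[1])      # 16000, the number of samples
print(transposed.shape[1])  # 1, the single column left after transposing
```

In torch terms, dropping `.t()` keeps the `(1, N)` layout that `torchaudio.load` uses, so `waveform.size(1)` returns the sample count; alternatively `size(0)` after `.t()` gives the same number.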

Here is the information about my PCM dataset:

- Headerless PCM files
- Sample rate: 16000 Hz
- Mono
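Because the files are headerless, 16-bit (the `dtype='h'` used above), mono and 16 kHz, the length can also be derived from the file size alone, without decoding. A minimal sketch under exactly those assumptions; the helper names are hypothetical:

```python
import os

BYTES_PER_SAMPLE = 2   # 16-bit PCM, matching dtype='h' in the question
NUM_CHANNELS = 1       # mono
SAMPLE_RATE = 16000    # Hz

def num_samples_from_bytes(num_bytes: int) -> int:
    """Samples per channel contained in num_bytes of headerless PCM (hypothetical helper)."""
    return num_bytes // (BYTES_PER_SAMPLE * NUM_CHANNELS)

def pcm_length(path: str) -> int:
    """Audio length in samples, computed from the file size only (hypothetical helper)."""
    return num_samples_from_bytes(os.path.getsize(path))

def pcm_duration_seconds(path: str) -> float:
    """Audio duration in seconds at SAMPLE_RATE (hypothetical helper)."""
    return pcm_length(path) / SAMPLE_RATE
```

For example, a 64,000-byte file holds 32,000 samples, i.e. 2.0 seconds. For mono files this equals `len(np.memmap(path, dtype='h', mode='r'))`.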

How can I get the audio length of a PCM file?

If you are using TorchAudio v0.12 or later, you can load PCM directly with torchaudio.io.StreamReader.

Reference: https://pytorch.org/audio/main/tutorials/streamreader_basic_tutorial.html#headerless-media

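from torchaudio.io import StreamReader

s = StreamReader(src=PATH, format="s16le", option={"sample_rate": "16000"})
s.add_basic_audio_stream(frames_per_chunk=-1)  # -1: return all frames as one chunk
s.process_all_packets()
waveform, = s.pop_chunks()  # audio chunks are (frames, channels)
audio_length = waveform.size(0)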
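If upgrading to TorchAudio 0.12+ is not an option, a NumPy-only alternative is to reproduce the `(channels, frames)` layout that `torchaudio.load` returns, so length lookups along axis 1 keep working. A sketch, assuming little-endian 16-bit interleaved samples; `pcm_bytes_to_waveform` and `load_headerless_pcm` are hypothetical names:

```python
import numpy as np

def pcm_bytes_to_waveform(data: bytes, num_channels: int = 1) -> np.ndarray:
    """Convert interleaved s16le PCM bytes to float32 of shape (channels, frames)."""
    samples = np.frombuffer(data, dtype='<i2')       # little-endian int16
    frames = samples.reshape(-1, num_channels)       # (frames, channels), interleaved order
    return frames.T.astype(np.float32) / 32768.0     # (channels, frames), in [-1.0, 1.0)

def load_headerless_pcm(path: str, sample_rate: int = 16000, num_channels: int = 1):
    """Hypothetical loader mirroring torchaudio.load's (waveform, sample_rate) return value."""
    with open(path, 'rb') as f:
        data = f.read()
    return pcm_bytes_to_waveform(data, num_channels), sample_rate
```

With `waveform, sr = load_headerless_pcm(path)`, the audio length is `waveform.shape[1]`, and `torch.from_numpy(waveform).size(1)` gives the same value in the EfficientConformer style. Scaling by 32768 maps -32768 exactly to -1.0; the question's code divides by 32767, and the length is unaffected either way.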
