Determining the bit depth of a wav file

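I'm looking for a fast, preferably standard-library mechanism to determine the bit depth of a wav file, e.g. "16-bit" or "24-bit".

I'm using a subprocess call to SoX to get a lot of audio metadata, but subprocess calls are very slow, and the only piece of information I currently have to rely on SoX for is the bit depth.

The built-in wave module has no function like "getbitdepth()" and is also not compatible with 24-bit wav files. I could use a "try except" to access the file metadata with the wave module (and, if that works, manually record the file as 16-bit), then in the except branch call sox instead (sox performs the analysis and records the bit depth accurately). My concern is that this approach feels like guesswork. What if an 8-bit file is read? I would be manually assigning 16-bit when it is not.

scipy.io.wavfile is also not compatible with 24-bit audio, so it creates a similar problem.

This tutorial is really interesting and even includes some very low-level (low-level for Python, at least) scripting examples that extract information from wav file headers - unfortunately these scripts don't work for 16-bit audio.

Is there any way to simply (and without calling sox) determine the bit depth of the wav file I'm checking?

The wave header parser script I'm using is as follows: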

import struct

def print_wave_header(f):
    '''
    Function takes an audio file path as a parameter and
    returns a dictionary of metadata parsed from the header
    '''
    r = {}  # the results of the header parse
    r['path'] = f
    # "rb" opens the file for binary reading; the with block closes it even on errors.
    # All struct formats use '<' because RIFF fields are little-endian, and
    # struct.unpack always returns a tuple, hence the [0].
    with open(f, "rb") as fin:
        r["ChunkID"] = fin.read(4)  # First four bytes must be b"RIFF"
        ChunkSize = struct.unpack('<I', fin.read(4))[0]  # File size in bytes, minus 8
        r["TotalSize"] = ChunkSize + 8
        r["DataSize"] = r["TotalSize"] - 44  # Data bytes, assuming a canonical 44-byte header
        r["Format"] = fin.read(4)  # b"WAVE"
        r["SubChunk1ID"] = fin.read(4)  # b"fmt "
        r["SubChunk1Size"] = struct.unpack('<I', fin.read(4))[0]  # Should be 16 for PCM
        r["AudioFormat"] = struct.unpack('<H', fin.read(2))[0]  # Should be 1 (PCM)
        r["NumChannels"] = struct.unpack('<H', fin.read(2))[0]  # 1 for mono, 2 for stereo
        r["SampleRate"] = struct.unpack('<I', fin.read(4))[0]  # e.g. 44100 (CD sampling rate)
        r["ByteRate"] = struct.unpack('<I', fin.read(4))[0]  # SampleRate * NumChannels * BitsPerSample / 8
        r["BlockAlign"] = struct.unpack('<H', fin.read(2))[0]  # NumChannels * BitsPerSample / 8
        r["BitsPerSample"] = struct.unpack('<H', fin.read(2))[0]  # e.g. 16 for CD audio
        r["SubChunk2ID"] = fin.read(4)  # b"data", but only if no other chunk comes first
        r["SubChunk2Size"] = struct.unpack('<I', fin.read(4))[0]  # Number of data bytes
        # First five 16-bit values of the data chunk, each between -32768 and 32767
        for i in range(1, 6):
            r["S%d" % i] = struct.unpack('<h', fin.read(2))[0]
    return r
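
A note on the parser above: it reads every field at a fixed offset, so it only works when the "fmt " chunk immediately follows the RIFF header and is exactly 16 bytes long. Files that carry other chunks first, or a longer "fmt " chunk, will be misread. The following is a minimal sketch, not a complete WAV parser, that walks the chunk list until it finds "fmt " and returns the bits-per-sample field. The function name read_bits_per_sample is made up for this illustration. For files that use the extensible format tag, this field is the container size, and the number of valid bits is stored in the extension, which this sketch does not read.

```python
import struct

def read_bits_per_sample(path):
    """Return the bits-per-sample field from the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be a RIFF/WAVE file")
        riff, _size, wave_id = struct.unpack("<4sI4s", header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(16)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                # Fields: format tag, channels, sample rate, byte rate,
                # block align, bits per sample
                return struct.unpack("<HHIIHH", fmt)[5]
            # Skip this chunk; chunks are padded to an even number of bytes
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Because only the header chunks are read and the audio data is never touched, this should stay fast even for large files.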

Same as Matthias' answer, but with copy-pastable code.

Requirements

pip install soundfile

Code

import soundfile as sf
ob = sf.SoundFile('example.wav')
print('Sample rate: {}'.format(ob.samplerate))
print('Channels: {}'.format(ob.channels))
print('Subtype: {}'.format(ob.subtype))

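Explanation

Channels: usually 2, meaning you have a left speaker and a right speaker.
Sample rate: Audio signals are analog, but we want to represent them digitally. This means we want to discretize them in value and in time. The sample rate gives how many times per second we take a value. The unit is Hz. The sample rate needs to be at least double the highest frequency in the original sound, otherwise you get aliasing. Human hearing ranges from ~20 Hz to ~20 kHz, so you can cut off anything above 20 kHz. This means a sample rate much above 40 kHz does not make a lot of sense.
Bit depth: The higher the bit depth, the more dynamic range can be captured. Dynamic range is the difference between the quietest and the loudest volume of an instrument, part or piece of music. Typical values seem to be 16 bit or 24 bit. A bit depth of 16 bit has a theoretical dynamic range of 96 dB, whereas 24 bit has a dynamic range of 144 dB (source).
Subtype: PCM_16 means 16-bit depth, where PCM stands for Pulse-Code Modulation.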
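
The 96 dB and 144 dB figures follow from the ratio between the largest and smallest representable amplitude, 20 * log10(2^bits), which is about 6.02 dB per bit. A small sketch of that arithmetic (the function name dynamic_range_db is made up for this illustration):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range in dB of linear PCM with the given bit depth."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print("{} bit: {:.1f} dB".format(bits, dynamic_range_db(bits)))
```

This gives roughly 96.3 dB for 16 bit and 144.5 dB for 24 bit, which matches the rounded values quoted above.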

Alternative

If you are only looking for a command-line tool, then I can recommend MediaInfo:

$ mediainfo example.wav
General
Complete name                            : example.wav
Format                                   : Wave
File size                                : 83.2 MiB
Duration                                 : 8 min 14 s
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 411 kb/s
Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 8 min 14 s
Bit rate mode                            : Constant
Bit rate                                 : 1 411.2 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 kHz
Bit depth                                : 16 bits
Stream size                              : 83.2 MiB (100%)

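I highly recommend the soundfile module (but mind you, I'm very biased because I wrote a large part of it).

There you can open your file as a soundfile.SoundFile object, which has a subtype attribute that holds the information you are looking for.

In your case that's probably 'PCM_16' or 'PCM_24'.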
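
If a number is needed rather than the subtype string, a small lookup table is one way to get it. This is only a sketch: the names SUBTYPE_BITS, subtype_to_bits and bit_depth are made up for this illustration, the table covers only the uncompressed subtypes, and soundfile.info() is used so that just the header is read.

```python
# Bits per sample for the uncompressed subtypes that can occur in WAV files.
# Compressed subtypes (for example 'ULAW' or 'MS_ADPCM') are left out on
# purpose, because "bit depth" is not well defined for them.
SUBTYPE_BITS = {
    "PCM_U8": 8,
    "PCM_S8": 8,
    "PCM_16": 16,
    "PCM_24": 24,
    "PCM_32": 32,
    "FLOAT": 32,
    "DOUBLE": 64,
}

def subtype_to_bits(subtype):
    """Map a soundfile subtype string to a bit depth, or None if unknown."""
    return SUBTYPE_BITS.get(subtype)

def bit_depth(path):
    """Read the file header with soundfile and translate its subtype."""
    import soundfile as sf  # imported here so the table is usable without it
    return subtype_to_bits(sf.info(path).subtype)
```

Returning None for unknown subtypes, instead of guessing, avoids the "manually assigning 16 bit when it is not" problem from the question.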

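Not sure when this was updated, but the built-in wave module appears to be compatible with 24-bit wav files now. I'm using Python 3.10.5.

The documentation for Wave_read.getsampwidth() states that it returns the sample width in bytes. I'm fairly sure that taking this value and multiplying it by 8 gives us the bit depth. For example: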

import wave

with wave.open(path, 'rb') as wav:  # path is the location of your wav file
    bit_depth = wav.getsampwidth() * 8

For 16-bit files getsampwidth() returns 2, and for 24-bit files it returns 3. No extra modules or subprocesses needed!
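
To check this behaviour on your own Python version, the sketch below writes short 8-, 16- and 24-bit test files with the wave module and reads their bit depth back. The helper names wav_bit_depth and write_zero_frames are made up for this illustration, and returning None when wave cannot parse a file is a design choice of the sketch, so that a caller can fall back to another tool for those files.

```python
import os
import tempfile
import wave

def wav_bit_depth(path):
    """Return the bit depth of a WAV file, or None if wave cannot parse it."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getsampwidth() * 8
    except (wave.Error, EOFError):
        return None

def write_zero_frames(path, sampwidth, nframes=10):
    """Write a short mono file of zero bytes with the given sample width in bytes."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sampwidth)
        wav.setframerate(44100)
        wav.writeframes(b"\x00" * sampwidth * nframes)

with tempfile.TemporaryDirectory() as tmp:
    for width in (1, 2, 3):
        path = os.path.join(tmp, "test_{}.wav".format(width * 8))
        write_zero_frames(path, width)
        print(os.path.basename(path), wav_bit_depth(path))
```

Files written by other software may use header variants that the wave module rejects, depending on the Python version, which is the case the None return value is meant to surface.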
