Get the maximum amplitude of an audio file per second



I know there are some similar questions here, but most of them are about generating waveform images, which is not what I want.

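My goal is to generate a waveform visualization for an audio file, similar to SoundCloud, but not as an image. I want the maximum amplitude for each second (or half second) of the audio clip in an array. I can then use that data to create a CSS-based visualization.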

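Ideally, I would get an array holding the amplitude value for each second, expressed as a fraction of the maximum amplitude of the whole audio file. Here is an example: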

[
    0.0,  # Relative max amplitude of first second of audio clip (0%)
    0.04,  # Relative max amplitude of second second of audio clip (4%)
    0.15,  # Relative max amplitude of third second of audio clip (15%)
    # Some more
    1.0,  # The highest amplitude of the whole audio clip will be 1.0 (100%)
]

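I assume I will need at least numpy and Python's wave module, but I'm not sure how to get the data I want. I would like to use Python, but I'm not entirely opposed to using some kind of command-line tool.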
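A minimal sketch of what the numpy and wave route might look like, assuming Python 3 and an uncompressed 16-bit PCM WAV file (other sample widths and compressed formats would need decoding first). The function name `relative_peaks` and its `window_seconds` parameter are made up for illustration.

```python
import wave

import numpy as np


def relative_peaks(path, window_seconds=1.0):
    """Return each window's max amplitude as a fraction of the file's overall max."""
    with wave.open(path, 'rb') as wf:
        if wf.getsampwidth() != 2:
            # Assumption of this sketch: 16-bit samples only
            raise ValueError('this sketch only handles 16-bit PCM')
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    # Interleaved little-endian 16-bit samples -> array of shape (frames, channels)
    samples = np.frombuffer(raw, dtype='<i2').reshape(-1, channels)
    # Widen to int32 so abs(-32768) does not overflow, then keep the
    # loudest channel for every frame
    amplitude = np.abs(samples.astype(np.int32)).max(axis=1)

    # Number of frames per window (1 second by default, 0.5 for half seconds)
    window = max(1, int(rate * window_seconds))
    peaks = [int(amplitude[i:i + window].max())
             for i in range(0, len(amplitude), window)]

    overall = max(peaks) if peaks else 0
    if overall == 0:
        # Empty or completely silent file: nothing to scale against
        return [0.0 for _ in peaks]
    return [p / overall for p in peaks]
```

Calling `relative_peaks('clip.wav', 0.5)` would give one value per half second; the file name here is only a placeholder.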

If gstreamer is allowed, here is a small script that does this. It accepts any audio file that gstreamer can handle.

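  • Build the gstreamer pipeline, using audioconvert to reduce the channels to 1 and the level element to get the peaks
  • Run the pipeline until EOS is reached
  • Normalize the peaks using the minimum and maximum values found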

The code:

import os, sys, pygst
pygst.require('0.10')
import gst, gobject
gobject.threads_init()
def get_peaks(filename):
    global do_run
    pipeline_txt = (
        'filesrc location="%s" ! decodebin ! audioconvert ! '
        'audio/x-raw-int,channels=1,rate=44100,endianness=1234,'
        'width=32,depth=32,signed=(bool)True !'
        'level name=level interval=1000000000 !'
        'fakesink' % filename)
    pipeline = gst.parse_launch(pipeline_txt)
    level = pipeline.get_by_name('level')
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    peaks = []
    do_run = True
    def show_peak(bus, message):
        global do_run
        if message.type == gst.MESSAGE_EOS:
            pipeline.set_state(gst.STATE_NULL)
            do_run = False
            return
        # filter only on level messages
        if (message.src is not level or
                not message.structure.has_key('peak')):
            return
        peaks.append(message.structure['peak'][0])
    # connect the callback
    bus.connect('message', show_peak)
    # run the pipeline until we got eos
    pipeline.set_state(gst.STATE_PLAYING)
    ctx = gobject.gobject.main_context_default()
    while ctx and do_run:
        ctx.iteration()
    return peaks
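def normalize(peaks):
    # rescale linearly so the smallest peak maps to 0.0 and the largest to 1.0
    _min = min(peaks)
    _max = max(peaks)
    d = _max - _min
    if d == 0:
        # all peaks are equal; avoid dividing by zero
        return [0.0 for x in peaks]
    return [(x - _min) / d for x in peaks]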
if __name__ == '__main__':
    filename = os.path.realpath(sys.argv[1])
    peaks = get_peaks(filename)
    print 'Sample is %d seconds' % len(peaks)
    print 'Minimum is', min(peaks)
    print 'Maximum is', max(peaks)
    peaks = normalize(peaks)
    print peaks

And an example of the output:

$ python gstreamerpeak.py "01 Tron Legacy Track 1.mp3"
Sample is 182 seconds
Minimum is -349.999999922
Maximum is -2.10678956719
[0.0, 0.0, 0.9274581631597019, 0.9528318436488018, 0.9492396611762614,
0.9523404330322813, 0.9471685835966183, 0.9537281219301242, 0.9473486577135167,
0.9479292126411365, 0.9538221105563514, 0.9483845795252251, 0.9536790832823281,
0.9477264933378022, 0.9480077366961968, ...
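
The level element reports its peak values in decibels, which is why the minimum and maximum above are negative, and normalize() rescales them linearly on that dB scale. If fractions of the loudest linear amplitude are wanted instead, as in the question, something along these lines might work. It is only a sketch: `db_to_relative` is a hypothetical helper, not part of gstreamer, and it assumes the usual amplitude conversion 10 ** (dB / 20).

```python
def db_to_relative(peaks_db):
    """Convert dB peaks to linear amplitudes scaled so the loudest one is 1.0."""
    # dB -> linear amplitude; very low values such as -350 dB become ~0
    linear = [10 ** (db / 20.0) for db in peaks_db]
    loudest = max(linear) if linear else 0.0
    if loudest == 0.0:
        # no data to scale against
        return [0.0 for _ in linear]
    return [x / loudest for x in linear]
```

With this, a second that peaks 20 dB below the loudest second comes out at roughly 0.1 rather than a value close to 1.0.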
