Swift 4: Detecting the strongest frequency, or the presence of a frequency, in an audio stream



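I'm writing an app that needs to detect frequencies in an audio stream. I've read about a million articles and am having trouble getting across the finish line. I get my audio data in the function below, via Apple's AVFoundation framework.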

I'm using Swift 4.2 and have tried the FFT functions, but at the moment they're a bit overwhelming.

Any ideas?

// Called by AVFoundation with each captured audio buffer
// (AVCaptureAudioDataOutputSampleBufferDelegate; requires `import AVFoundation`).
public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Print the whole sample buffer; its description includes the audio format
    // (sample rate, channel count, bits per sample), which tells us how to read the data.
    print(sampleBuffer)
    // Create a buffer list and let CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer
    // fill it; the retained block buffer keeps the sample memory alive.
    var blockBuffer: CMBlockBuffer? = nil
    var audioBufferList = AudioBufferList(mNumberBuffers: 1,
                                          mBuffers: AudioBuffer(mNumberChannels: 1, mDataByteSize: 0, mData: nil))
    let status = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
        sampleBuffer,
        bufferListSizeNeededOut: nil,
        bufferListOut: &audioBufferList,
        bufferListSize: MemoryLayout<AudioBufferList>.size,
        blockBufferAllocator: nil,
        blockBufferMemoryAllocator: nil,
        flags: UInt32(kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment),
        blockBufferOut: &blockBuffer)
    guard status == noErr else {
        print("CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer failed: \(status)")
        return
    }
    var sumOfSquares: Int64 = 0
    var count = 0
    var maxSample = Int64.min
    var minSample = Int64.max
    // Loop through the samples (assumed to be signed 16-bit PCM) and track the min and max.
    // The list pointer is only valid inside this closure, so all reading happens here.
    withUnsafeMutablePointer(to: &audioBufferList) { listPointer in
        for buffer in UnsafeMutableAudioBufferListPointer(listPointer) {
            guard let data = buffer.mData else { continue }
            let samples = UnsafeBufferPointer(start: data.assumingMemoryBound(to: Int16.self),
                                              count: Int(buffer.mDataByteSize) / MemoryLayout<Int16>.size)
            for sample in samples {
                let s = Int64(sample)
                sumOfSquares += s * s
                count += 1
                maxSample = max(maxSample, s)
                minSample = min(minSample, s)
            }
        }
    }
    // Root-mean-square level from the accumulated sum of squares.
    let rms = count > 0 ? (Double(sumOfSquares) / Double(count)).squareRoot() : 0
    let text = "min = \(minSample), max = \(maxSample), rms = \(rms)"
    // Debug output.
    print(text)
    // Update the interface on the main thread.
    DispatchQueue.main.async {
        self.frequencyDataOutLabel.text = text
    }
    // Stop the capture session, so only the first buffer is analyzed.
    self.captureSession.stopRunning()
}
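The loop above reinterprets the buffer as signed 16-bit integers and accumulates a sum of squares. Before handing samples to an FFT it is usually convenient to convert them to floating point. The following is a minimal sketch, assuming 16-bit signed PCM (check the format that `print(sampleBuffer)` reports); `normalizedSamples` and `rms` are hypothetical helper names, not part of AVFoundation.

```swift
import Foundation

// Hypothetical helper: converts signed 16-bit PCM to Double in [-1, 1).
// For interleaved multi-channel data, pass the channel count and the channel to keep.
func normalizedSamples(from pcm: [Int16], channelCount: Int = 1, channel: Int = 0) -> [Double] {
    precondition(channelCount > 0 && (0..<channelCount).contains(channel), "invalid channel")
    // 32768 = 2^15 is the magnitude of Int16.min, so results stay within [-1, 1).
    return stride(from: channel, to: pcm.count, by: channelCount).map { Double(pcm[$0]) / 32768.0 }
}

// Hypothetical helper: root-mean-square level, i.e. the sum of squares
// divided by the sample count, then square-rooted. Returns 0 for no samples.
func rms(_ samples: [Double]) -> Double {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(0.0) { $0 + $1 * $1 }
    return (sumOfSquares / Double(samples.count)).squareRoot()
}
```

Inside the sample loop, something like `normalizedSamples(from: Array(samples))` would yield a `[Double]` ready for spectral analysis. Dividing by 32768 rather than 32767 keeps `Int16.min` mapped exactly to -1.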

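After a lot of research, I found that the answer is to use an FFT (Fast Fourier Transform). It takes the raw input from the iPhone code above and converts it into a set of values, each representing the magnitude of one frequency band (bin) across the spectrum.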
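To make the FFT step concrete, here is a minimal, dependency-free iterative radix-2 Cooley-Tukey FFT that returns the magnitude of each bin and then picks the strongest one (the "strongest frequency" from the title). It is an illustrative sketch under stated assumptions: `magnitudeSpectrum` and `dominantFrequency` are hypothetical names, the input length must be a power of two, and no window function is applied, so off-bin tones will leak into neighboring bins. On Apple platforms, the Accelerate framework's vDSP FFT routines are the faster choice for real-time use.

```swift
import Foundation

// One-sided magnitude spectrum via an iterative radix-2 FFT.
// samples.count must be a power of two; returns count/2 + 1 magnitudes,
// where bin k corresponds to k * sampleRate / count Hz.
func magnitudeSpectrum(_ samples: [Double]) -> [Double] {
    let n = samples.count
    precondition(n > 0 && n & (n - 1) == 0, "FFT length must be a power of two")
    var re = samples
    var im = [Double](repeating: 0, count: n)

    // Bit-reversal permutation so the butterflies can run in place.
    var j = 0
    for i in 1..<n {
        var bit = n >> 1
        while j & bit != 0 {
            j ^= bit
            bit >>= 1
        }
        j ^= bit
        if i < j {
            re.swapAt(i, j)
            im.swapAt(i, j)
        }
    }

    // Butterfly stages: combine sub-transforms of size length/2 into size length.
    var length = 2
    while length <= n {
        let angle = -2.0 * Double.pi / Double(length)
        let half = length / 2
        for start in stride(from: 0, to: n, by: length) {
            for k in 0..<half {
                let wr = cos(angle * Double(k))
                let wi = sin(angle * Double(k))
                let a = start + k
                let b = a + half
                // t = w * x[b] (complex multiply)
                let tr = re[b] * wr - im[b] * wi
                let ti = re[b] * wi + im[b] * wr
                re[b] = re[a] - tr
                im[b] = im[a] - ti
                re[a] += tr
                im[a] += ti
            }
        }
        length <<= 1
    }
    return (0...n / 2).map { (re[$0] * re[$0] + im[$0] * im[$0]).squareRoot() }
}

// Frequency in Hz of the strongest bin, skipping bin 0 (the DC offset).
func dominantFrequency(_ samples: [Double], sampleRate: Double) -> Double {
    let mags = magnitudeSpectrum(samples)
    guard mags.count > 1 else { return 0 }
    var best = 1
    for k in 2..<mags.count where mags[k] > mags[best] {
        best = k
    }
    return Double(best) * sampleRate / Double(samples.count)
}
```

Bin k covers k × sampleRate / N Hz, so with N = 1024 at 44.1 kHz each bin is about 43 Hz wide; a longer FFT gives finer resolution at the cost of latency.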

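The open-source code at https://github.com/jscalo/tempi-fft helped a lot: it builds a visualizer that captures the data and displays it. From there, it was a matter of manipulating the data to fit my needs. In my case, I was looking for frequencies above the range of human hearing (around 20 kHz). By scanning the upper half of the array in the tempi-fft code, I could determine whether the frequency I was looking for was present and strong enough.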
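The "scan the upper part of the array" step can be sketched as a threshold test over the bins above a cutoff. This assumes a one-sided magnitude spectrum where bin k is k × sampleRate / fftSize Hz; the function name `hasEnergy`, the 20 kHz cutoff and the threshold are hypothetical and would need tuning against real recordings. Note that the highest representable frequency is half the sample rate (22.05 kHz at 44.1 kHz, 24 kHz at 48 kHz), so only a narrow band above 20 kHz is observable at all, and the microphone itself may attenuate it further.

```swift
import Foundation

// Hypothetical helper: true if any bin at or above cutoffHz reaches threshold.
// magnitudes is a one-sided spectrum (fftSize/2 + 1 bins).
func hasEnergy(in magnitudes: [Double], above cutoffHz: Double,
               threshold: Double, sampleRate: Double, fftSize: Int) -> Bool {
    precondition(sampleRate > 0 && fftSize > 0)
    let binWidth = sampleRate / Double(fftSize)
    // First bin whose center frequency is >= cutoffHz.
    let firstBin = max(0, Int((cutoffHz / binWidth).rounded(.up)))
    // A cutoff above the Nyquist frequency has no bins to check.
    guard firstBin < magnitudes.count else { return false }
    return magnitudes[firstBin...].contains { $0 >= threshold }
}
```

Checking a maximum against a fixed threshold is the simplest option; comparing the upper-band peak to the overall spectrum level would be less sensitive to input gain.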
