How to reduce noise in signal overlap-add in Java



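I have been using a Java program (developed by someone else) for text-to-speech synthesis. The synthesis is done by concatenating "diphones". In the original version there was no signal processing: the diphones were simply collected and concatenated to produce the output. To improve the output, I tried to perform "phase matching" of the concatenated speech signals. The changes I made are summarized here: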

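  • The audio data is collected from an AudioInputStream into a byte array. Since the audio data is 16-bit, I convert the byte array into a short array.
  • The "signal processing" is done on the short array.
  • To output the audio data, the short array is converted back into a byte array.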
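Both conversions in the list depend on the stream's byte order: a 16-bit PCM sample occupies two bytes, and which of them is the high byte is given by the stream's AudioFormat. A minimal sketch of both directions using java.nio.ByteBuffer, assuming signed 16-bit PCM; the class name PcmConvert and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class PcmConvert {
    // Decode signed 16-bit PCM bytes into samples, honoring the stream's byte order.
    static short[] toShorts(byte[] bytes, boolean bigEndian) {
        short[] samples = new short[bytes.length / 2];
        ByteBuffer.wrap(bytes)
                  .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    // Encode samples back into bytes using the same byte order.
    static byte[] toBytes(short[] samples, boolean bigEndian) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * 2)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        buf.asShortBuffer().put(samples);
        return buf.array();
    }

    public static void main(String[] args) {
        // Two little-endian samples: 0x0180 (384) and 0xFFFF (-1).
        byte[] le = { (byte) 0x80, 0x01, (byte) 0xFF, (byte) 0xFF };
        short[] s = toShorts(le, false);
        System.out.println(s[0] + " " + s[1]);
        System.out.println(Arrays.equals(le, toBytes(s, false)));
    }
}
```

ByteBuffer handles sign and byte order in one place, so the read path and the write path cannot disagree with each other.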

Here is the part of the existing program that I changed:

Audio input
This segment is called for each diphone.

Original version

audioInputStream = AudioSystem.getAudioInputStream(sound);
while ((cnt = audioInputStream.read(byteBuffer, 0, byteBuffer.length)) != -1) {
    if (cnt > 0) {
        byteArrayPlayStream.write(byteBuffer, 0, cnt);
    }
}
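The loop above only collects raw bytes; how those bytes form 16-bit samples is described by the stream's AudioFormat. A sketch that reads a stream with the same loop and prints the format fields that matter for the byte-to-short conversion. The in-memory stream and its format (16 kHz, 16-bit, mono, signed, little-endian) are assumptions standing in for a real diphone file, and the class name ReadPcm is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

public class ReadPcm {
    // Read every byte of an AudioInputStream, using the same loop as the question.
    static byte[] readAll(AudioInputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int cnt;
        while ((cnt = in.read(buffer, 0, buffer.length)) != -1) {
            if (cnt > 0) {
                out.write(buffer, 0, cnt);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for AudioSystem.getAudioInputStream(sound).
        AudioFormat fmt = new AudioFormat(16000f, 16, 1, true, false);
        byte[] pcm = { 0x10, 0x00, (byte) 0xF0, (byte) 0xFF };
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(pcm), fmt, pcm.length / fmt.getFrameSize());

        byte[] data = readAll(in);
        AudioFormat f = in.getFormat();
        // Encoding, sample size and byte order decide how two bytes map to one sample.
        System.out.println(f.getEncoding() + ", " + f.getSampleSizeInBits()
                + " bit, bigEndian=" + f.isBigEndian());
        System.out.println(data.length + " bytes = " + data.length / f.getFrameSize() + " frames");
    }
}
```

Checking isBigEndian() on the real diphone files, rather than assuming an order, tells which byte of each pair is the high byte.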

My version

// public variable declarations
byte    byteSoundFile[];                             // byteSoundFile will contain a whole word or the diphones of a whole word
short   shortSoundFile[]    = new short[5000000];    // sound contents are taken in a short[] array for signal processing
short   shortBuffer[];
int     pos                 = 0;
int     previousPM          = 0;
boolean isWord              = false;
public static HashMap<String, Integer> peakMap1 = new HashMap<String, Integer>(); 
public static HashMap<String, Integer> peakMap2 = new HashMap<String, Integer>();
// code for receiving and processing audio data
if(pos == 0) {
    // a new word is going to be processed.
    // so reset the shortSoundFile array
    Arrays.fill(shortSoundFile, (short)0);
}
audioInputStream = AudioSystem.getAudioInputStream(sound);
while ((cnt = audioInputStream.read(byteBuffer, 0, byteBuffer.length)) != -1) {
    if (cnt > 0) {
        byteArrayPlayStream.write(byteBuffer, 0, cnt);
    }
}
byteSoundFile = byteArrayPlayStream.toByteArray();
int nSamples = byteSoundFile.length;
byteArrayPlayStream.reset();
if(nSamples > 80000) {   // it is a word
    pos     = nSamples;
    isWord  = true;
}
else {              // it is a diphone
    // audio data is converted from byte to short, so nSamples is halved
    nSamples /= 2;
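    // transfer byteSoundFile contents to shortBuffer using byte-to-short conversion.
    // The low byte is masked with 0xFF; without the mask it is sign-extended and overwrites the high byte.
    // This assumes big-endian samples; check audioInputStream.getFormat().isBigEndian() (WAV data is usually little-endian).
    shortBuffer = new short[nSamples];
    for(int i=0; i<nSamples; i++) {
        shortBuffer[i] = (short)((byteSoundFile[i<<1] << 8) | (byteSoundFile[(i<<1)+1] & 0xFF));
    }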
    /************************************/
    /**** phase-matching starts here ****/
    /************************************/
    int pm1 = 0;
    int pm2 = 0;
    String soundStr = sound.toString();
    if(soundStr.contains("\\") && soundStr.contains(".")) {
        soundStr = soundStr.substring(soundStr.lastIndexOf("\\")+1, soundStr.lastIndexOf("."));
    }                    
    if(peakMap1.containsKey(soundStr)) {
        // perform overlap and add
        System.out.println("we are here");
        pm1 = peakMap1.get(soundStr);
        pm2 = peakMap2.get(soundStr);
        /*
        Idea:
        If pm1 is located after more than one third of the samples,
        then there will be too much overlapping.
        If pm2 is located before two thirds of the samples,
        then there will also be extra overlapping for the next diphone.
        In both cases, we will not perform the peak-matching operation.
        */
        int idx1 = (previousPM == 0) ? pos : previousPM - pm1;
        if((idx1 < 0) || (pm1 > (nSamples/3))) {
            idx1 = pos;
        }
        int idx2 = idx1 + nSamples - 1;
        for(int i=idx1, j=0; i<=idx2; i++, j++) {
            if(i < pos) {
                shortSoundFile[i] = (short) ((shortSoundFile[i] >> 1) + (shortBuffer[j] >> 1));
            }
            else {
                shortSoundFile[i] = shortBuffer[j];
            }
        }
        previousPM = (pm2 < (nSamples/3)*2) ? 0 : idx1 + pm2;
        pos = idx2 + 1;
    }
    else {
        // no peak found. simply concatenate the audio data
        for(int i=0; i<nSamples; i++) {
            shortSoundFile[pos++] = shortBuffer[i];
        }
        previousPM = 0;
    }
}
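In Java a byte is sign-extended when it is promoted to short or int. In a conversion of the form `(short)((short)hi << 8 | (short)lo)`, any low byte of 0x80 or above therefore becomes 0xFF80 to 0xFFFF after promotion, and the OR overwrites the high byte with 0xFF. A small demo; the class and method names are hypothetical.

```java
public class SignExtensionDemo {
    // Unmasked form: the low byte is sign-extended before the OR.
    static short buggy(byte hi, byte lo) {
        return (short) ((short) hi << 8 | (short) lo);
    }

    // Masking the low byte with 0xFF keeps only its 8 bits, so the high byte survives.
    static short fixed(byte hi, byte lo) {
        return (short) ((hi << 8) | (lo & 0xFF));
    }

    public static void main(String[] args) {
        byte hi = 0x01, lo = (byte) 0x80; // intended sample 0x0180 = 384
        System.out.println(buggy(hi, lo)); // -128: high byte replaced by 0xFF
        System.out.println(fixed(hi, lo)); // 384
    }
}
```

Because the output loop writes the corrupted high byte back unchanged, the unmasked form damages every sample whose low byte is 0x80 or above, not only the overlapped ones. That is consistent with the noise disappearing once the byte/short round trip was dropped.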

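The peak-map key is derived from the file path by splitting on a backslash, which only matches Windows paths. Assuming `sound` is a java.io.File (the question does not show its type), a sketch that takes the file name without directory or extension in a platform-independent way; DiphoneKey, keyOf and the example path are hypothetical.

```java
import java.io.File;

public class DiphoneKey {
    // Take the file name without its directory, then drop the extension.
    // File.getName() splits on the platform separator, so no hard-coded "\\" is needed.
    static String keyOf(File sound) {
        String name = sound.getName();
        int dot = name.lastIndexOf('.');
        return (dot > 0) ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        File sound = new File(new File("diphones"), "a-b.wav"); // hypothetical path
        System.out.println(keyOf(sound)); // a-b
    }
}
```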
Audio output
After all the diphones of a word have been collected, this segment is called to play the audio output.
Original version

byte audioData[] = byteArrayPlayStream.toByteArray();
... code for writing audioData to output stream

My version

byte audioData[];
if(isWord) {
    audioData = Arrays.copyOf(byteSoundFile, pos);
    isWord = false;
}
else {
    audioData = new byte[pos*2];
    for(int i=0; i<pos; i++) {
        audioData[(i<<1)]   = (byte) (shortSoundFile[i] >>> 8);
        audioData[(i<<1)+1] = (byte) (shortSoundFile[i]);
    }
}
pos = 0;
... code for writing audioData to output stream

But after these modifications, the output got worse: there is a lot of noise in it.

Here is a sample of the audio with the modifications: modified output

Here is a sample of the audio from the original version: original output

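I would be grateful if someone could point out what is causing the noise and how to remove it. What am I doing wrong in the code? I have already tested my algorithm in MATLAB.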

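The problem has been solved for now. It turns out that the conversion between the byte array and the short array is not needed; the required signal processing operations can be performed directly on the byte array.
I would like to keep this question open in case someone finds the bug in the given code.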
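Working on the byte array directly still means combining each pair of bytes into a sample for any arithmetic, such as the averaging over the overlap region. A minimal sketch of that averaging (each sample shifted right by one, as in the question) on 16-bit little-endian byte arrays; the byte order is an assumption to be checked with AudioFormat.isBigEndian(), and ByteOverlapAdd and its methods are hypothetical.

```java
public class ByteOverlapAdd {
    // Read the 16-bit little-endian sample starting at byte offset off.
    static int sampleAt(byte[] b, int off) {
        return (short) ((b[off + 1] << 8) | (b[off] & 0xFF));
    }

    // Write a 16-bit little-endian sample at byte offset off.
    static void putSample(byte[] b, int off, int s) {
        b[off] = (byte) s;
        b[off + 1] = (byte) (s >> 8);
    }

    // Mix `next` into `out` starting at sample index `start`. Samples before outEnd
    // become a/2 + b/2; later ones are copied. Returns the new end, in samples.
    static int overlapAdd(byte[] out, int outEnd, byte[] next, int start) {
        int n = next.length / 2;
        for (int j = 0; j < n; j++) {
            int i = start + j;
            int b = sampleAt(next, 2 * j);
            int s = (i < outEnd) ? (sampleAt(out, 2 * i) >> 1) + (b >> 1) : b;
            putSample(out, 2 * i, s);
        }
        return Math.max(outEnd, start + n);
    }

    public static void main(String[] args) {
        byte[] out = new byte[8];             // room for 4 samples
        putSample(out, 0, 1000);
        putSample(out, 2, -1000);             // 3 samples written so far: 1000, -1000, 0
        byte[] next = new byte[4];
        putSample(next, 0, 2000);
        putSample(next, 2, 2000);
        int end = overlapAdd(out, 3, next, 2); // overlap one sample
        for (int i = 0; i < end; i++) {
            System.out.print(sampleAt(out, 2 * i) + " ");
        }
        System.out.println();
    }
}
```

The halving keeps the sum inside the 16-bit range; a gradual cross-fade over the overlap is a common alternative to a fixed 50/50 mix, but that is a different technique from the one in the question.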

Latest update