Regex problem with special characters (e.g. #, /) in a chord dictionary (Python)



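I am writing a chord dictionary, and for that I need to sort the different types of chords into smaller groups.

However, I ran into problems with chords that contain # (such as C# and C#m) and with slash chords such as D7/F# and A/B, which I would like to fit into the other variants.

I believe this comes down to some regular expression parameter, which I admit I am not very familiar with.

This is the code so far: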

import pandas as pd

triadeMaior = pd.DataFrame({'triadeMaior': ['C','C#','Db','D','D#','Eb','E','F','F#','Gb','G','G#','Ab','A','A#','Bb','B']
})
triadeMenor = pd.DataFrame({'triadeMenor': ['Cm','C#m','Dbm','Dm','D#m','Ebm','Em','Fm','F#m','Gbm','Gm','G#m','Abm','Am','A#m','Bbm','Bm']
})
triadeDiminuta = pd.DataFrame({'triadeDiminuta':['Cdim','C#dim','Dbdim', 'Ddim', 'D#dim', 'Ebdim', 'Edim', 'Fdim', 'F#dim', 'Gbdim','Gdim', 
'G#dim', 'Abdim', 'Adim', 'A#dim', 'Bbdim', 'Bdim']
})
triadeAumentada = pd.DataFrame({'triadeAumentada':['Caug','C#aug','Dbaug','Daug','D#aug','Ebaug','Eaug','Faug','F#aug','Gbaug','Gaug','G#aug','Abaug','Aaug','A#aug','Bbaug','Baug' ]
})
setima = pd.DataFrame({'setima':['C7','C#7','Db7','D7','D#7','Eb7','E7','F7','F#7','Gb7','G7','G#7','Ab7','A7','A#7','Bb7','B7']
})
setimaMenor = pd.DataFrame({'setimaMenor':['Cm7','C#m7','Dbm7','Dm7','D#m7','Ebm7','Em7','Fm7','F#m7','Gbm7','Gm7','G#m7','Abm7','Am7','A#m7','Bbm7','Bm7']
})
setimaMaior = pd.DataFrame({'setimaMaior':['Cmaj7', 'C#maj7', 'Dbmaj7', 'Dmaj7', 'D#maj7', 'Ebmaj7', 'Emaj7', 'Fmaj7', 'F#maj7','Gbmaj7','Gmaj7', 'G#maj7','Abmaj7','Amaj7','A#maj7','Bbmaj7','Bmaj7']
})
setimaMenorQuinta = pd.DataFrame({'setimaMenorQuinta':['Cm7b5','C#m7b5', 'Dbm7b5', 'Dm7b5', 'D#m7b5', 'Ebm7b5','Em7b5', 'Fm7b5', 'F#m7b5', 'Gbm7b5', 'Gm7b5', 'G#m7b5', 'Abm7b5', 'Am7b5', 'A#m7b5', 'Bbm7b5', 'Bm7b5']
})
sexta= pd.DataFrame({'sexta':['C6','C#6','Db6','D6','D#6','Eb6','E6','F6','F#6','Gb6','G6','G#6','Ab6','A6','A#6','Bb6','B6']
})
sextaMenor = pd.DataFrame({'sextaMenor': ['Cm6','C#m6','Dbm6','Dm6','D#m6','Ebm6','Em6','Fm6','F#m6','Gbm6','Gm6','G#m6','Abm6','Am6','A#m6',
'Bbm6','Bm6']
})
triadeMaior_pat = fr"\b({'|'.join(triadeMaior['triadeMaior'])})\b"
triadeMenor_pat = fr"\b({'|'.join(triadeMenor['triadeMenor'])})\b"
triadeDiminuta_pat = fr"\b({'|'.join(triadeDiminuta['triadeDiminuta'])})\b"
triadeAumentada_pat = fr"\b({'|'.join(triadeAumentada['triadeAumentada'])})\b"
setima_pat = fr"\b({'|'.join(setima['setima'])})\b"
setimaMenor_pat = fr"\b({'|'.join(setimaMenor['setimaMenor'])})\b"
setimaMaior_pat = fr"\b({'|'.join(setimaMaior['setimaMaior'])})\b"
setimaMenorQuinta_pat = fr"\b({'|'.join(setimaMenorQuinta['setimaMenorQuinta'])})\b"
sexta_pat = fr"\b({'|'.join(sexta['sexta'])})\b"
sextaMenor_pat = fr"\b({'|'.join(sextaMenor['sextaMenor'])})\b"
df['chordType'] = df['chords'].replace({triadeMaior_pat: 'triadeMaj',
                                        triadeMenor_pat: 'triadeMen',
                                        triadeDiminuta_pat: 'triadeDim',
                                        triadeAumentada_pat: 'triadeAug',
                                        setima_pat: 'setima',
                                        setimaMenor_pat: 'setimaMen',
                                        setimaMaior_pat: 'setimaMaj',
                                        setimaMenorQuinta_pat: 'setimaMenQui',
                                        sexta_pat: 'sexta',
                                        sextaMenor_pat: 'sextaMen',
                                        r'\b(?!triadeMaj|triadeMen|triadeDim|triadeAug|setima|setimaMen|setimaMen|setimaMaj|sexta|sextaMen\b)\w+': 'outros'},
                                       regex=True)

Here are some examples of the results:

chords                                           chordType
C#,E7,Abm,Amaj7,E,Abm,C#m,E                      triadeMaj#
E,A7,G6,D/F#,F6,E,Em,D7/F#Fmaj7,E,A7,G6,D7/F#    triadeMaj,setima,sexta,setima/triadeMaj#,sexti,triadeMen,triadeMaj,setimaMen,triadeMaj
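
The stray # in these results is consistent with how \b works: it only matches between a word character and a non-word character, and # is not a word character, so the alternation can stop after C and leave the # behind. Below is a minimal sketch of one possible workaround, assuming chord names are separated by commas or spaces: escape each name with re.escape, list longer names first, and swap \b for lookarounds that also treat # and / as part of a chord. The names naive and strict, and the three-chord subset, are illustrative only.

```python
import re

chords = ['C', 'C#', 'Db']  # small illustrative subset of the major triads

# \b needs a word character on exactly one side. '#' is not a word
# character, so the regex matches just 'C' and leaves '#' in place.
naive = re.compile(r"\b(" + "|".join(chords) + r")\b")
print(naive.sub("triadeMaj", "C#"))   # triadeMaj#

# Escape each name, put longer names first, and use lookarounds that
# refuse to start or stop a match next to a word character, '#' or '/'.
alts = "|".join(re.escape(c) for c in sorted(chords, key=len, reverse=True))
strict = re.compile(rf"(?<![\w#/])(?:{alts})(?![\w#/])")
print(strict.sub("triadeMaj", "C#"))  # triadeMaj
```

With these lookarounds a slash chord such as C/C# is left untouched rather than partially replaced, so it can still be caught by a separate rule.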

Actually, this might be cleaner without regex.

This example uses only a small part of the data, but you can fill the chord_types dict with all the mappings.

import pandas as pd
chord_types = {'C': 'triadeMaj', 'C#': 'triadeMaj', 'C7': 'setima'} # Add as required
df = pd.DataFrame(['C, C7', 'C, C#'], columns=('chords',)) # Toy example
map_fn = lambda cs: ', '.join((chord_types.get(c, 'outros') for c in cs))
df['chordType'] = df['chords'].str.replace(' ', '').str.split(',').apply(map_fn)
print(df)

This gives:

chords             chordType
0  C, C7     triadeMaj, setima
1  C, C#  triadeMaj, triadeMaj
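
If typing every mapping by hand is impractical, the dictionary could be generated from the root notes and suffixes listed in the question. The following is only a sketch under that assumption: roots, suffixes, and classify are hypothetical names, the labels are copied from the question's replace call, and sending slash chords to 'outros' is one possible choice, not something the original code settles.

```python
# Root notes and suffix -> label pairs, taken from the groups in the question.
roots = ['C', 'C#', 'Db', 'D', 'D#', 'Eb', 'E', 'F', 'F#',
         'Gb', 'G', 'G#', 'Ab', 'A', 'A#', 'Bb', 'B']
suffixes = {'': 'triadeMaj', 'm': 'triadeMen', 'dim': 'triadeDim',
            'aug': 'triadeAug', '7': 'setima', 'm7': 'setimaMen',
            'maj7': 'setimaMaj', 'm7b5': 'setimaMenQui',
            '6': 'sexta', 'm6': 'sextaMen'}

# Every root combined with every suffix: 17 * 10 = 170 exact chord names.
chord_types = {root + suffix: label
               for root in roots
               for suffix, label in suffixes.items()}

def classify(cell):
    """Map a comma-separated chord string to comma-separated labels."""
    names = [c.strip() for c in cell.split(',')]
    # Exact dictionary lookup; anything unknown (assumption: including
    # slash chords such as D7/F#) falls back to 'outros'.
    return ', '.join(chord_types.get(c, 'outros') for c in names)

print(classify('C#, E7, Abm, Amaj7'))  # triadeMaj, setima, triadeMen, setimaMaj
print(classify('E, D7/F#, A/B'))       # triadeMaj, outros, outros
```

On the DataFrame from the example above, this would presumably be applied with df['chordType'] = df['chords'].apply(classify). Because the lookup is an exact string match, characters such as # and / need no escaping.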
