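I am trying to classify 30-second music samples into one of four genres: ["electronic", "hip hop", "jazz", "rock"], and I need some help.
I have built my own dataset from mp3 files. In my "dataset" directory I arranged 100 songs by genre, with 25 songs in a subdirectory for each genre (that is, I have an "electronic" subdirectory, a "hip hop" subdirectory, and so on).
So far, I have extracted a 30-second sample from each of these MP3 files, normalized the signal so that no sample exceeds -32 or -18 dB, mixed them down to mono, and converted them to WAV (using Pydub).
Next, I used librosa to extract MFCCs (mel-frequency cepstral coefficients) for the 1292 frames in each song. I then scaled the data with sklearn's preprocessing module so that it has zero mean and unit variance.
I took those values and saved them to a CSV file of MFCCs, where each row is a frame and each column is one of the 12 coefficients.
To give you an idea, here are the first 20 frames of the song Julio Bashmore -au seve.mp3: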
1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
-1.6792870551627723,-0.3842983399875271,-0.4027844785642914,0.7165034707424635,-0.6823681099880697,0.8160136728323858,-1.5263184054951733,1.1145290823984928,-0.21784328023531235,0.047527570975473235,0.32866412875434237,1.869661743729989
2.4841536110972022,-0.5831476248573247,0.37058328670683277,-1.4599220579565508,-0.35449671920732007,-1.1326787825224918,0.5880762356956317,-0.8108172607843107,0.010134004741811507,-0.14931018884055094,0.8707111843072819,0.1667143116197902
2.0939135826765907,-0.4778720879089441,0.26530387765936375,-1.7076132053582773,0.11305806361775678,-1.310823349961563,1.1669240812438573,-0.8627333493359391,-0.19252158214293175,-0.039523355794829566,0.6658161856594883,0.2860711396454278
1.0127547898943148,-0.6547501371081066,-0.202002065081406,-1.7889468252345162,1.1632837017143651,-0.9288351063974712,2.070078331574107,-0.7601750354687623,-0.27909671541985936,-0.13713166210030908,0.2267359005199065,0.27808482310773774
0.06572087004052392,-1.8740496505118946,-0.9604185325425617,-1.0163364869696865,1.5840872642483552,-0.16659361108422382,1.7806813371087853,0.055159751832777354,0.6842054675590546,0.42350598071605017,-0.3324771084186967,-0.24348528197848257
0.7159690101152768,-2.6235135217332606,-0.9099658866643047,-0.19653348650619468,1.0348534863167884,-0.6771927176675163,1.0703663687805878,0.3981886714210787,0.8503521825769755,0.4055860454830591,-0.11841556456925736,0.05030541244676532
0.8810398765824345,-2.7727001749452045,-0.8484274387283207,0.14839104995756489,0.9124992899968386,-0.5987705973726993,0.6471053665081234,0.43190059553550836,0.9028748015921237,0.3425604687141461,-0.20209176692016032,0.15561852907964296
1.5217565976091,-2.5946551685896044,-0.3924558895014341,0.36743931340001096,0.9126773048246598,-0.7581004315396501,0.4463892360730688,0.42969123923287,0.7276796949470707,-0.0079165602005986,-0.580154306587985,-0.07235102966750707
2.08861621898524,-1.2804976691396324,0.46640912894919145,0.14007051920782673,0.9100754665932002,-1.51168507329552,0.7161640071116147,0.34780954351977644,0.30123647629161765,-1.103443008391695,-0.7900432022174468,-0.2847124076141728
1.300078728794466,-0.6136862665584394,0.5321920666343034,-0.25881789165042973,1.2648582642185016,-1.7504670292559645,1.4050993480861744,0.354988549965
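I tried simply concatenating the 1292 frame vectors of every song into one large vector per genre and using that as input to scikit-learn's KNN algorithm. My results were very poor: I just get a vector full of "rock".
I'm fairly sure I'm not handling this correctly at all, but here are the two functions I have. The first creates this feature vector for each genre. The second just trains the classifier using each genre vector and a vector containing the genre labels.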
def create_np_vector(db_read, start_row):
    num_frames = 1292
    num_songs = 25
    num_coefs = 12
    #vector for all features of every song/sample of that genre
    genre_vec = np.empty([(num_frames * num_songs), num_coefs])
    db_reader = csv.reader(db_read)
    for row in itertools.islice(db_reader, start_row, start_row + 25):
        id = row[0]
        path = row[2]
        mfcc_file = path + "/csv/" + id + ".csv"
        mfcc_reader = csv.reader(open(mfcc_file, 'r'))
        frame_num = 0
        for mfcc_row in mfcc_reader:
            frame_vec = np.array(mfcc_row)
            genre_vec[frame_num] = frame_vec
            frame_num = frame_num + 1
    return genre_vec

def train_knn(db_read, knn):
    genres = ["electronic", "hip hop", "jazz", "rock"]
    line_num = [0, 26, 51, 76]
    x = 0
    for genre in genres:
        vec = create_np_vector(db_read, line_num[x])
        x = x + 1
        print(genre)
        knn.fit(vec, [genre for x in range(25*1292)])
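What exactly should I be doing here? I have been trying to use this as a resource: http://modelai.gettysburg.edu/2012/music/, but I am still lost. Should I compute a mean vector and a covariance matrix for each audio file?
Even if I do that, what would I do with those two vectors for each file?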
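MFCCs are a good way to start. However, having 12 coefficients for every frame will likely overwhelm your classifier. Average them over the frames and your performance should improve. A complete solution is described in Building Machine Learning Systems with Python by Luis Pedro Coelho and Willi Richert.
You can get a free trial from the website. It is enough to read the relevant chapter: Music Genre Classification.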
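As a rough illustration of the averaging idea (this is a sketch, not the book's code), each song can be collapsed to a single 12-value vector and all songs passed to the classifier in one call. scikit-learn's `fit` discards any earlier fit, so calling it once per genre leaves a model that only knows the last genre. The helper names `song_mean_vector` and `build_dataset` are made up for this example, and the random data only stands in for the real CSV files. If each song was standardised on its own, its per-coefficient mean is already zero, so the averaging would have to be done on the unscaled MFCCs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

GENRES = ["electronic", "hip hop", "jazz", "rock"]

def song_mean_vector(mfcc_frames):
    """Collapse an (n_frames, n_coefs) MFCC matrix into one (n_coefs,) vector."""
    return np.asarray(mfcc_frames, dtype=float).mean(axis=0)

def build_dataset(songs_by_genre):
    """songs_by_genre maps a genre name to a list of (n_frames, n_coefs) arrays."""
    X, y = [], []
    for genre, songs in songs_by_genre.items():
        for frames in songs:
            X.append(song_mean_vector(frames))  # one row per song
            y.append(genre)                     # one label per song
    return np.vstack(X), np.array(y)

# Synthetic stand-in for the real CSV data (assumption: random numbers with a
# different offset per genre, only so that the example runs on its own).
rng = np.random.default_rng(0)
songs_by_genre = {
    genre: [rng.normal(loc=i, size=(1292, 12)) for _ in range(25)]
    for i, genre in enumerate(GENRES)
}

X, y = build_dataset(songs_by_genre)   # X has shape (100, 12), y has shape (100,)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)                          # a single fit call with all four genres
predictions = knn.predict(X[:3])       # predicted genre labels for three songs
```

With real data you would hold out some songs (or use cross-validation) instead of predicting on the training rows.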
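With only 25 samples per class and 1292 features per sample, it is hard to find any pattern; the data is too noisy. To reduce the complexity, you could try computing summary statistics of the MFCC data, such as the mean/std/min/max/kurtosis of each band. To capture some of the variation over time, you might also want to include summaries of the MFCC frame deltas.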
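A minimal sketch of such a summary, assuming each song is available as a (1292, 12) array of MFCCs. The function names `summarize` and `song_features` are hypothetical, the deltas are plain first differences between consecutive frames (not a smoothed delta), and the kurtosis is the excess kurtosis computed directly with NumPy.

```python
import numpy as np

def summarize(frames):
    """Per-coefficient mean, std, min, max and excess kurtosis of (n_frames, n_coefs)."""
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    safe_std = np.where(std > 0, std, 1.0)  # avoid dividing by zero for constant columns
    kurt = (((frames - mean) / safe_std) ** 4).mean(axis=0) - 3.0
    return np.concatenate([mean, std, frames.min(axis=0), frames.max(axis=0), kurt])

def song_features(frames):
    """Summary statistics of the MFCCs and of their frame-to-frame deltas."""
    frames = np.asarray(frames, dtype=float)
    deltas = np.diff(frames, axis=0)        # simple first difference between frames
    return np.concatenate([summarize(frames), summarize(deltas)])

rng = np.random.default_rng(0)
example_song = rng.normal(size=(1292, 12))  # stand-in for one song's MFCC CSV
features = song_features(example_song)      # 2 * 5 * 12 = 120 values per song
```

This turns each song into one fixed-length row, so 100 songs give a 100 x 120 matrix that can be scaled and fed to a classifier.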
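MFCC features were originally designed for speech. You might want to try some features that are better tailored to music. Both libessentia and librosa have feature extractors for things like tempo, musical key, and so on.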