Voice Biometric Based Forensic Speaker Identification Using Machine Learning

GEBEYEHU, SILESHI AWEKE

dc.contributor.author	GEBEYEHU, SILESHI AWEKE
dc.date.accessioned	2022-02-16T11:46:07Z
dc.date.available	2022-02-16T11:46:07Z
dc.date.issued	2021-11-13
dc.identifier.uri	https://repository.ju.edu.et//handle/123456789/6313
dc.description.abstract	Following the progress of general-purpose speaker recognition technology, application-specific systems based on voice biometrics are emerging. Forensic speaker recognition is one core application area of speaker recognition. Its chief purpose is to identify the actual criminal among a given set of suspects, relying on traced voice evidence. This thesis adopts text-independent speaker identification for forensic speaker recognition. It also examines how the level of utterance in the training and testing speech corpora affects establishing the identity of the actual criminal among the suspects. The proposed system is built on two indispensable, consecutive approaches: front-end feature extraction and back-end feature classification. The front-end approach extracts speaker-specific features using digital signal processing, specifically Mel Frequency Cepstral Coefficients (MFCC). The back-end approach performs feature classification, which covers suspected criminal speaker modeling and actual criminal identification. A state-of-the-art Machine Learning (ML) based Gaussian Mixture Model (GMM), trained with the Expectation-Maximization (EM) algorithm, is used to build a reference model for each suspected criminal speaker. The Maximum Log-Likelihood (MLL) score technique is then employed to identify the actual criminal. In addition, preprocessing is applied before the feature extraction stage to enhance the quality of the speech corpora and to reduce the computational complexity of the feature extraction and feature classification stages. This preprocessing consists of spectral noise gate based background noise removal and short-time energy based voice activity detection (VAD) silence truncation. To evaluate the performance of the proposed system, we carried out a simulation-based implementation in Python using the PyCharm environment.
A self-collected and prepared Amharic speech corpus was used for the implementation. The proposed system was evaluated experimentally on 20 speakers who acted as the suspects. They were recorded in two settings: ongoing mobile phone conversations, captured at the callee side using a smartphone, and an interview room, captured using a recorder microphone, in the form of rehearsed read speech. The system was trained and tested on the speech corpora at three levels of utterance: word-level (WLU), sentence-level (SLU), and paragraph-level (PLU). On the mobile-phone-recorded corpus, the system achieved identification rates (IDRs) of 84.29%, 95.00%, and 97.50% for WLU, SLU, and PLU, respectively. On the microphone-recorded corpus, it achieved 85.00%, 96.25%, and 97.50%. These observations indicate that the level of utterance of a corpus plays a significant role in recognition performance, alongside the choice of suitable feature extraction and modeling approaches. A corpus with longer utterances is more favorable for attaining better performance. However, the proposed system still performs poorly in cross-level-of-utterance and multi-modal recording training-testing scenarios. Improving performance in these scenarios is therefore a direction for future research.	en_US
dc.language.iso	en_US	en_US
dc.subject	EM	en_US
dc.subject	Feature Extraction	en_US
dc.subject	FSR	en_US
dc.subject	GMM	en_US
dc.subject	Level of Utterance	en_US
dc.subject	MFCC	en_US
dc.subject	MLL score	en_US
dc.subject	Speech Corpora	en_US
dc.subject	Suspects	en_US
dc.title	Voice Biometric Based Forensic Speaker Identification Using Machine Learning	en_US
dc.type	Thesis	en_US
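The abstract names short-time energy based VAD silence truncation as a preprocessing step. The sketch below illustrates that idea in Python with NumPy. It assumes mono audio as a float array sampled at 16 kHz, non-overlapping 20 ms frames, and a threshold set at 1% of the maximum frame energy. The function name `truncate_silence` and its parameters are hypothetical and do not come from the thesis. The spectral noise gate step is not shown.

```python
import numpy as np


def truncate_silence(x, sr=16000, frame_ms=20.0, rel_threshold=0.01):
    """Hypothetical short-time-energy VAD (assumed settings, not the thesis's):
    split the signal into non-overlapping frames, keep the frames whose energy
    exceeds rel_threshold times the largest frame energy, and concatenate them.
    A trailing partial frame is dropped."""
    x = np.asarray(x, dtype=float)
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(x) // frame_len
    if n_frames == 0:
        return x  # too short to frame; return unchanged
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)  # short-time energy of each frame
    if energy.max() == 0.0:
        return np.empty(0)  # the whole signal is digital silence
    voiced = energy > rel_threshold * energy.max()
    return frames[voiced].reshape(-1)


# Usage: 0.5 s of silence, 1 s of a 220 Hz tone standing in for speech, 0.5 s of silence
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
silence = np.zeros(sr // 2)
signal = np.concatenate([silence, tone, silence])
trimmed = truncate_silence(signal, sr=sr)
print(len(signal), len(trimmed))  # 32000 16000
```

A fixed relative threshold is the simplest choice. With real mobile phone recordings, the threshold would usually need tuning or noise removal beforehand, because background noise raises the energy of silent frames.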
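The front-end described in the abstract uses Mel Frequency Cepstral Coefficients. The following NumPy-only sketch shows the usual MFCC chain: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. It assumes 16 kHz audio, 25 ms frames with a 10 ms hop, a 512-point FFT, 26 mel filters, and 13 coefficients. These settings and the helper names (`mel_filterbank`, `dct_ii_matrix`, `mfcc`) are illustrative assumptions, not the thesis's configuration. A tested library implementation may be preferable in practice.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale -> (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis (one row per output coefficient)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat


def mfcc(x, sr=16000, frame_ms=25.0, hop_ms=10.0, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix under the assumed settings."""
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - preemph * x[:-1])  # pre-emphasis filter
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    pad = max(0, (n_frames - 1) * hop + frame_len - len(x))
    x = np.pad(x, (0, pad))  # zero-pad only if the signal is shorter than one frame
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)  # overlapping windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T  # mel band energies
    log_e = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return log_e @ dct_ii_matrix(n_ceps, n_filters).T  # decorrelate with DCT-II


# Usage: 1 s of random noise as a stand-in for 16 kHz speech
rng = np.random.default_rng(0)
one_second = rng.normal(0.0, 0.1, size=16000)
feats = mfcc(one_second)
print(feats.shape)  # (98, 13)
```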
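The back-end in the abstract builds one GMM per suspect with the EM algorithm. It then selects the suspect whose model gives the maximum log-likelihood for the evidence. A minimal NumPy sketch of that scheme follows. It assumes that feature frames, such as MFCC vectors, are already extracted as (n_frames, n_dims) arrays. It also assumes diagonal covariances, 4 components, 50 EM iterations, and a variance floor. All function names and settings are hypothetical. The usage data are synthetic Gaussian vectors rather than speech.

```python
import numpy as np


def _log_gauss_diag(X, means, variances):
    """log N(x | mu_k, diag(var_k)) for every frame and component -> (n_frames, K)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]  # (N, K, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)  # (K,)
    return -0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + maha)


def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def train_gmm(X, n_components=4, n_iter=50, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to frames X (N, D) with the EM algorithm."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities P(component k | frame n)
        log_joint = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
        resp = np.exp(log_joint - _logsumexp(log_joint, axis=1)[:, None])
        # M-step: re-estimate weights, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ (X ** 2)) / nk[:, None] - means ** 2
        variances = np.maximum(variances, var_floor)  # keep components from collapsing
    return weights, means, variances


def gmm_avg_loglik(X, model):
    """Average per-frame log-likelihood of X under a trained GMM."""
    weights, means, variances = model
    log_joint = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
    return float(np.mean(_logsumexp(log_joint, axis=1)))


def identify(test_frames, models):
    """Maximum log-likelihood decision over the suspects' reference models."""
    scores = {name: gmm_avg_loglik(test_frames, m) for name, m in models.items()}
    return max(scores, key=scores.get), scores


# Usage with synthetic 13-dimensional frames standing in for three suspects' MFCCs
rng = np.random.default_rng(42)
centers = {"suspect_A": 0.0, "suspect_B": 3.0, "suspect_C": -3.0}
train = {s: rng.normal(c, 1.0, size=(500, 13)) for s, c in centers.items()}
models = {s: train_gmm(X) for s, X in train.items()}
evidence = rng.normal(3.0, 1.0, size=(200, 13))  # drawn like suspect_B's data
best, scores = identify(evidence, models)
print(best)  # suspect_B
```

This sketch averages the per-frame log-likelihood rather than summing it. The average keeps scores comparable across evidence recordings of different lengths. For a single test recording, every model sees the same frames, so the argmax decision is the same either way.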