Abstract:
This thesis presents a method for characterizing breast cancer using Fourier Transform
Infrared (FTIR) spectroscopy. FTIR spectroscopy is a tool used to analyse the structure and
chemical composition of both organic and inorganic materials. Recently, it has become a key
technique for biomedical applications and has enabled considerable advances in the field of
cancer diagnosis. Analysing characteristic group frequencies in an FTIR spectrum enables qualitative
estimation of the chemical composition of the material examined. In this thesis, FTIR
spectroscopy was combined with multivariate exploratory tools and machine learning classification
models to characterize breast cancer tissues, extract important chemical components from the
tissues, and automatically subtype and grade the breast cancer spectral data. A total of 462 and
126 FTIR spectra were used for breast cancer subtype and grade analysis, respectively. The
results showed that, as breast cancer progresses, changes could be distinguished visually in
the spectra by analysing the peaks in the lipid, nucleic acid, protein and carbohydrate regions of the
normal, malignant, and benign samples. Then, using principal component analysis (PCA) score
and loading plots, and based on the respective sample variance, the wavenumbers associated with
important biochemical components were extracted for each breast cancer subtype and each grade.
Finally, a radial basis function (RBF) kernel support vector machine (SVM) was used to classify
breast cancer tissue spectra into the subtypes adenosis, fibroadenoma, hyperplasia, fibrocystic change,
normal breast, lobular carcinoma and ductal carcinoma, which resulted in a training accuracy of
91.0% and a testing accuracy of 90.7%. For grading, the method classified
the dataset into Grade I, Grade II and Grade III with a training accuracy of 84.0% and a testing
accuracy of 83.7%. This work is significant because it makes it easier for pathologists to identify
breast cancer biomarkers and to classify breast tissue types, subtypes and grades
automatically. It holds great promise for the early detection of breast cancer, providing
accurate diagnoses and reducing time-consuming manual effort.
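
For readers unfamiliar with the RBF kernel named above, the following is a minimal sketch of the standard kernel function, exp(-gamma * ||x - y||^2), applied to two spectra. It is an illustration only, not the implementation used in this thesis: the function name `rbf_kernel`, the `gamma` value, and the absorbance values are all assumptions made for the example.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("spectra must have the same number of points")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical absorbance values at three wavenumbers (illustrative only,
# not taken from the thesis dataset).
spectrum_a = [0.10, 0.40, 0.25]
spectrum_b = [0.12, 0.38, 0.30]

# Values near 1 mean the two spectra are close; values near 0 mean they differ.
similarity = rbf_kernel(spectrum_a, spectrum_b, gamma=0.5)
```

An SVM with this kernel compares each new spectrum with the support vectors selected during training, so spectra with similar absorbance profiles tend to be assigned to the same class.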