Offline Handwritten Text Recognition of Historical Ge’ez Manuscripts Using Deep Learning Techniques

Gurmu, Mesfin Geresu

dc.contributor.author	Gurmu, Mesfin Geresu
dc.date.accessioned	2022-02-03T08:16:26Z
dc.date.available	2022-02-03T08:16:26Z
dc.date.issued	2021-01-26
dc.identifier.uri	https://repository.ju.edu.et//handle/123456789/6208
dc.description.abstract	Handwriting recognition of historical documents is still a largely unsolved problem in the field of pattern recognition. This thesis investigates how state-of-the-art deep learning techniques perform at handwriting recognition in the context of historical Ge’ez manuscripts. Although Ge’ez was the language of literature in Ethiopia until the middle of the 19th century, it is underrepresented in research on document image analysis and recognition. A handwriting recognition system is therefore proposed, based on real-world large-scale digitization scenarios. Its architecture comprises the following tasks: pre-processing (binarization and skew estimation), page layout analysis, the recognition model, and post-processing. An experimental setup was designed for each task. For binarization, four methods (Otsu’s global method, Otsu’s local method, Sauvola’s method, and Gatos’ adaptive method) were evaluated using the FM, p-FM, PSNR, and DRD metrics. Sauvola’s method outperformed all the others on every metric. For document image skew estimation, a Hough transform based method was evaluated over a dataset using the AED, TOP80, and CE criteria, obtaining values of 0.3115, 0.058, and 76.00, respectively. For page layout analysis, Leptonica, an open source C library, was evaluated and achieved a high success rate at the region and text line levels over a wide variety of page layouts from actual historical Ge’ez manuscripts. The final experimental setup was designed for building a recognition model using the Tesseract OCR engine. Because it is difficult to prepare large amounts of training data with ground truth from actual historical documents, a fine-tuning approach was proposed and applied in the context of historical Ge’ez manuscripts.
A total of 257 text line images collected from 15 different pages were prepared and used to build a recognition model with a character error rate of 2.632%. Overall, the experiments performed with the prototyping approach produced encouraging results, suggesting that developing a complete OCR system for historical Ge’ez manuscripts is feasible. The major weakness of the study is optimization; further optimization with a larger training sample is therefore required. Furthermore, future work should consider incorporating post-processing into the recognition process.	en_US
dc.language.iso	en_US	en_US
dc.title	Offline Handwritten Text Recognition of Historical Ge’ez Manuscripts Using Deep Learning Techniques	en_US
dc.type	Thesis	en_US