Jimma University Open access Institutional Repository

Offline Handwritten Text Recognition of Historical Ge’ez Manuscripts Using Deep Learning Techniques


dc.contributor.author Gurmu, Mesfin Geresu
dc.date.accessioned 2022-02-03T08:16:26Z
dc.date.available 2022-02-03T08:16:26Z
dc.date.issued 2021-01-26
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/6208
dc.description.abstract Handwriting recognition of historical documents is still a largely unsolved problem in the field of pattern recognition. This thesis investigates how state-of-the-art deep learning techniques perform handwriting recognition in the context of historical Ge’ez manuscripts. Although Ge’ez was the language of literature in Ethiopia until the middle of the 19th century, it is underrepresented in research on document image analysis and recognition. A handwriting recognition system is therefore proposed, based on real-world, large-scale digitization scenarios. Its architecture comprises the following tasks: pre-processing (binarization and skew estimation), page layout analysis, the recognition model, and post-processing. An experimental setup was designed for each task. For binarization, four methods (Otsu’s global method, Otsu’s local method, Sauvola’s method, and Gatos’s adaptive method) were evaluated using the FM, p-FM, PSNR, and DRD metrics. Sauvola’s method outperformed all other methods on every metric. For document image skew estimation, a Hough transform based method was evaluated experimentally on a dataset. The evaluation criteria AED, TOP80, and CE were used, yielding values of 0.3115, 0.058, and 76.00, respectively. For page layout analysis, the performance of Leptonica, an open-source C library, was investigated; it achieved a high success rate at the region and text-line level over a wide variety of page layouts of actual historical Ge’ez manuscripts. The final experimental setup was designed for building a recognition model using the Tesseract OCR engine. Because it is difficult to prepare large amounts of training data with ground truth from actual historical documents, a fine-tuning approach was proposed and applied in the context of historical Ge’ez manuscripts. 
A total of 257 text line images collected from 15 different pages were prepared and used to build a recognition model with a character error rate of 2.632%. Overall, the experiments performed with the prototyping approach produced encouraging results, indicating that developing a complete OCR system for historical Ge’ez manuscripts is feasible. The major weakness of the study is optimization; further optimization with a larger training sample is therefore required. Furthermore, future work should consider incorporating post-processing into the recognition process. en_US
dc.language.iso en_US en_US
dc.title Offline Handwritten Text Recognition of Historical Ge’ez Manuscripts Using Deep Learning Techniques en_US
dc.type Thesis en_US
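
The abstract reports a character error rate (CER) for the recognition model. As a rough illustration of how such a figure is commonly computed, the sketch below divides the total Levenshtein edit distance between recognized and ground-truth text lines by the total number of ground-truth characters. This is an assumption about the general metric, not the thesis's or Tesseract's exact evaluation procedure; the function names and the sample strings are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str],
                         hypotheses: Sequence[str]) -> float:
    """CER in percent, aggregated over all text lines (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(r, h)
                       for r, h in zip(references, hypotheses))
    return 100.0 * total_errors / total_chars


if __name__ == "__main__":
    # Hypothetical ground-truth and recognized lines, for illustration only.
    truth = ["ሰላም", "abcd"]
    recognized = ["ሰላን", "abcd"]
    print(f"CER: {character_error_rate(truth, recognized):.3f}%")
```

Aggregating errors over all lines before dividing, rather than averaging per-line rates, keeps short lines from dominating the result; other tools may normalize differently.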

