Event And Temporal Information Extraction From Amharic Text

Ephrem Tadesse

URI: https://repository.ju.edu.et//handle/123456789/4662

Date: 2017-09

Abstract:

The rapid growth in the volume of data on the web has made it cumbersome to find relevant information. Many information extraction (IE) tasks have been proposed in the literature to tackle this problem. Event and temporal information extraction is one such task: it identifies important events in a large collection of texts, places them in chronological order, and answers the questions of what happened in a given situation and when it happened. Unlike other IE tasks such as entity extraction, event and temporal information extraction still calls for research, especially for Amharic, for which there has been no prior work on this particular IE task. As the first comprehensive work in this area, we designed a model for event and temporal information extraction from Amharic text. The model comprises four components: common preprocessing, learning and classification, event extraction, and temporal information extraction. A different approach was used for each task. For the event extraction component we first used a machine learning classifier, but the classifier fails to detect deverbal events. To resolve this limitation, since deverbal events are missed because of their ambiguity, we used a rule-based approach built on syntactic features such as POS tags, a morphological analyzer, and lists of gazetteers. In practice it is difficult to stay within the boundary of a single event extraction method. Because both approaches have advantages and disadvantages, we developed a hybrid approach for event extraction that combines their results to gain the advantages of both the machine learning classifier and the rule-based approach. For the temporal information extraction component, regular expressions and lists of temporal gazetteers are used in combination with a set of rules. The preprocessing component prepares and normalizes the input text.
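The hybrid event extraction and the regex-plus-gazetteer temporal extraction described above can be illustrated with a small sketch. It is not the thesis's implementation: the gazetteers, the POS tags `N`/`V`/`NUM`, and the helper names (`ml_event_classifier`, `rule_based_deverbal`, `hybrid_events`, `extract_dates`) are hypothetical. The "classifier" is a stand-in that simply flags verbs, and combining the two detectors by set union is an assumed merging strategy.

```python
import re

# Hypothetical gazetteer: Ethiopian-calendar month names mapped to month numbers (1-13).
ETHIOPIAN_MONTHS = {
    "መስከረም": 1, "ጥቅምት": 2, "ህዳር": 3, "ታህሳስ": 4, "ጥር": 5, "የካቲት": 6,
    "መጋቢት": 7, "ሚያዝያ": 8, "ግንቦት": 9, "ሰኔ": 10, "ሐምሌ": 11, "ነሐሴ": 12, "ጳጉሜ": 13,
}

# Hypothetical gazetteer of deverbal nouns that denote events ("meeting", "election").
DEVERBAL_EVENT_NOUNS = {"ስብሰባ", "ምርጫ"}


def ml_event_classifier(tokens):
    """Stand-in for a trained classifier: flags tokens tagged as verbs ("V")."""
    return {i for i, (_, pos) in enumerate(tokens) if pos == "V"}


def rule_based_deverbal(tokens):
    """Rule: a noun ("N") listed in the deverbal gazetteer is treated as an event."""
    return {
        i for i, (word, pos) in enumerate(tokens)
        if pos == "N" and word in DEVERBAL_EVENT_NOUNS
    }


def hybrid_events(tokens):
    """Union of both detectors, so deverbal events missed by the classifier are kept."""
    return sorted(ml_event_classifier(tokens) | rule_based_deverbal(tokens))


# Pattern: <month name> <day> [ቀን] <4-digit year>, built from the month gazetteer.
MONTH_ALT = "|".join(map(re.escape, ETHIOPIAN_MONTHS))
DATE_RE = re.compile(rf"({MONTH_ALT})\s+(\d{{1,2}})(?:\s+ቀን)?\s+(\d{{4}})")


def extract_dates(text):
    """Return (month, day, year) tuples normalized from matched date mentions."""
    return [(ETHIOPIAN_MONTHS[m], int(d), int(y)) for m, d, y in DATE_RE.findall(text)]


tokens = [("ምርጫ", "N"), ("ሐምሌ", "N"), ("12", "NUM"), ("ተካሄደ", "V")]
print(hybrid_events(tokens))                        # [0, 3]
print(extract_dates("ምርጫው ሐምሌ 12 ቀን 2009 ተካሄደ"))  # [(11, 12, 2009)]
```

In this toy input, the verb-only stand-in classifier finds index 3 but misses the deverbal noun at index 0, which the gazetteer rule recovers. That mirrors the motivation for the hybrid approach. A real system would replace the stand-in with a trained model and the exact-match rule with morphological analysis.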
The event extraction component extracts events, and the temporal information extractor extracts and normalizes temporal expressions. Various experiments were conducted for each approach under different scenarios. The hybrid approach for the event extraction component outperforms the other two approaches, achieving a precision, recall, and F-measure of 97.7%, 96.3%, and 96.99%, respectively. The rule-based approach for temporal information extraction scores a precision, recall, and F-measure of 84.6%, 89.7%, and 87.1%, respectively.
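The reported F-measures are consistent with the balanced F-measure (F1), the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below assumes that definition. The helper name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(97.7, 96.3), 2))  # hybrid event extraction -> 96.99
print(round(f_measure(84.6, 89.7), 1))  # temporal information extraction -> 87.1
```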
