Abstract:
The rapid growth of data on the web makes it cumbersome to find relevant
information. To tackle this problem, many information extraction (IE) tasks have been studied in the
literature. Event and temporal information extraction is one such task: it
helps retrieve important events from large collections of text in their chronological order, answering
the questions of what happened in a given situation and when it happened. Unlike other
IE tasks such as entity extraction, event and temporal information extraction still needs research attention;
for Amharic in particular, there is no prior work on this IE task.
As the first comprehensive work, we designed a model for event and temporal information extraction
from Amharic text. The model comprises several components: common preprocessing,
learning and classification, event extraction, and temporal information extraction. To develop the proposed
model, we used a different approach for each task. For the event extraction component, we used a machine
learning classifier, but the classifier fails to detect deverbal events. To address this limitation, which
stems from the ambiguity of deverbal events, we used a rule-based approach
that draws on POS tags, a morphological analyzer, and gazetteer lists. In practice, it is
difficult to stay within the boundary of a single event extraction method. Because both approaches have
advantages and disadvantages, we developed a hybrid approach for event extraction that combines their
results to gain the benefits of both the machine learning classifier and the rule-based approach. For
temporal information extraction, regular expressions and a list of temporal gazetteers are used in
combination with a set of rules. The preprocessing component prepares and normalizes
input texts, the event extraction component extracts events, and the temporal information
extractor extracts and normalizes temporal expressions. Various experiments were conducted for
each approach under different scenarios. The hybrid approach for the event extraction component
outperforms the other two approaches, yielding a precision, recall, and F-measure of 97.7%, 96.3%, and
96.99%, respectively. The rule-based approach for temporal information
extraction scores a precision, recall, and F-measure of 84.6%, 89.7%, and 87.1%, respectively.
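
The relationship between the reported scores can be checked with a short sketch. It assumes that the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state which variant was used, and the function name `f_measure` is illustrative only. Under that assumption, the reported precision and recall values reproduce the reported F-measures to within rounding.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's F-measure is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
event_f = f_measure(97.7, 96.3)      # hybrid event extraction
temporal_f = f_measure(84.6, 89.7)   # rule-based temporal extraction

print(f"event extraction F-measure: {event_f:.2f}%")
print(f"temporal extraction F-measure: {temporal_f:.1f}%")
```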