Abstract:
The healthcare sector is closely associated with human interaction, so it may seem
counterintuitive that conversational AI applications such as chatbots are becoming more prevalent
in it. Some virtual agents (conversational AI bots) for maternal health care assistance are
established as global standards. However, they are not fully suited to every country because of
differences in language and in the context of users’ living standards. In this study, an Amharic
text-based chatbot system for maternity assistance is proposed, using an ensemble learning
technique, to enable pregnant women to assess their maternal condition in a more human-like way. The proposed
system’s architecture is designed to return relevant answers to the user’s text input. It
accepts the user’s text input through a GUI chat interface and then applies text preprocessing (text
normalization and cleaning, tokenization, and stop-word removal) and word embedding (bag-of-words).
The ensemble model then classifies the intent of the input, and
finally a text response is retrieved. An investigation is carried out to develop the ensemble model for
intent classification. The proposed single MLP model achieved about 100% accuracy on the
training dataset and about 67.1% accuracy on the test dataset. Repeated evaluation of the model,
however, revealed variance in its predictions: across the repeated runs, the mean test accuracy
was about 67.2% with a standard deviation of about 1.3%. Therefore, the model averaging
ensemble learning technique is applied both to reduce the variance of the model and, possibly,
to reduce its generalization error. A sensitivity analysis of the number of ensemble
members is carried out to determine how it affects test accuracy. Performance
improves up to about five members, and the average performance of a five-member ensemble on the
dataset is 67.3%, very close to the average of 67.2% seen for the single model. The
important difference is that the standard deviation shrinks from 1.3% for a single model to 0.6%
with a five-member ensemble. This implies that averaging several instances of the same model
trained on the same dataset narrows the spread of results and improves reliability, a property
often highly desired in a final model to be used operationally. As future work, the investigation
should consider an Amharic lemmatizer or stemmer in text preprocessing, as well as collecting
more and better-quality data.
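
The model averaging step described above can be illustrated with a minimal sketch. It assumes that each ensemble member outputs per-class probabilities and that the ensemble averages them before taking the most probable intent; the abstract does not state the exact combination rule, so this is one common variant rather than the study's implementation. The function names and the toy probabilities are hypothetical and are not taken from the study's data.

```python
from statistics import mean, stdev


def ensemble_predict(member_probs):
    """Average class probabilities across members, then take the argmax per sample.

    member_probs[m][i][c] is member m's probability of class c for sample i.
    """
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    predictions = []
    for i in range(n_samples):
        # Unweighted mean of the members' probabilities for each class.
        averaged = [
            sum(member_probs[m][i][c] for m in range(n_members)) / n_members
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def summarize(scores):
    """Mean and sample standard deviation of accuracies from repeated runs."""
    return mean(scores), stdev(scores)


# Hypothetical toy input: three members, three samples, two intent classes.
member_probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4]],
    [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]],
    [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]],
]
labels = [0, 1, 1]

predictions = ensemble_predict(member_probs)
print(predictions, accuracy(predictions, labels))
```

In this toy input, two of the three members each misclassify one sample on their own, while the averaged probabilities classify all three correctly. Applying `summarize` to the accuracies of repeated single-model runs and of repeated ensemble runs would give the kind of mean and standard deviation comparison reported above.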