Abstract:
Globally, respiratory diseases impose a significant health burden. For instance, chronic
obstructive pulmonary disease (COPD) affects over 200 million individuals, asthma impacts
235 million people, and tuberculosis (TB) afflicts 8.7 million annually. Detecting these
diseases at an early stage is crucial for effective treatment.
However, the availability of experts, especially in developing nations, is often limited.
Therefore, the main objective of this research is to develop a self-learning expert system for
the diagnosis and treatment of respiratory diseases. The research used a design science research
approach to gain insight into the challenges within the field and to create a model as a
solution. The study focused on data mining classification techniques to extract
representative cases from the collected data. To determine the
optimal model and select the most effective data mining classification algorithm, three
experiments were carried out using the J48, PART, and JRIP classification algorithms. The
algorithms achieved accuracies of 94.5%, 92.5%, and 90% for PART, J48, and JRIP,
respectively. This system aims to identify and offer insights into the diagnosis and treatment
of common respiratory diseases. To develop the model, the researcher collected both
implicit and explicit knowledge from various sources, including domain experts, medical
literature, research papers, clinical guidelines, and patient records. The researcher then organized the
acquired knowledge, selected the most important attributes, and discarded unnecessary ones
by analysing the patient records. The relevant information was transferred to an Excel sheet
and saved as a CSV file. The researcher used the WEKA 3.9 data mining software to
preprocess and classify the data using the J48, PART, and JRIP algorithms. The
performance of these algorithms was evaluated using 10-fold cross-validation and
an 80/20 data split. Finally, the PART rules were represented in a structured format
suitable for computer-based systems using rule-based representation techniques and
integrated with a node to run the main interface. The prototype was evaluated using system
testing and user acceptance testing. System testing, measured in terms of recall, precision, and
F-measure, yielded 90.9%, 90.9%, and 90.5%, respectively. User acceptance testing was also
performed with domain experts and achieved an average acceptance of 84%. To
promote wider adoption of knowledge-based systems by the general public, it is
recommended that more local languages be added to reduce language barriers.
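
As an illustration of the rule-based representation and the evaluation measures mentioned above, the following is a minimal sketch, not the system built in this study. It assumes that PART-style rules can be stored as an ordered list in which the first matching rule decides the diagnosis, and that precision, recall, and F-measure are averaged over classes weighted by class size. The attribute names, rule contents, and the helper names diagnose and weighted_scores are hypothetical placeholders, not the rules mined from the patient records and not clinical guidance.

```python
from collections import Counter

# Hypothetical rules for illustration only: not the rules mined in the study
# and not clinical guidance. Each rule is (conditions, diagnosis). The rule
# with empty conditions is the default that fires when nothing else matches.
RULES = [
    ({"wheezing": "yes", "night_cough": "yes"}, "asthma"),
    ({"cough_over_2_weeks": "yes", "night_sweats": "yes"}, "tb"),
    ({"smoker": "yes", "breathlessness": "yes"}, "copd"),
    ({}, "other"),
]


def diagnose(case, rules=RULES):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return label
    return None  # reached only if the rule list has no default rule


def weighted_scores(actual, predicted):
    """Class-size-weighted precision, recall, and F-measure."""
    support = Counter(actual)
    precision = recall = f_measure = 0.0
    for label, n in support.items():
        tp = sum(a == label and p == label for a, p in zip(actual, predicted))
        predicted_n = sum(p == label for p in predicted)
        p_c = tp / predicted_n if predicted_n else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = n / len(actual)
        precision += weight * p_c
        recall += weight * r_c
        f_measure += weight * f_c
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up patient description, used only to exercise the rule list.
    case = {"wheezing": "yes", "night_cough": "yes", "smoker": "no"}
    print(diagnose(case))
```

Keeping the rules as ordered data rather than nested conditionals means a revised rule set exported from the mining step could replace RULES without changing the matching logic.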