Investigating the possibility of developing Tigrinya language interface to database

Hagos Hailemaryam; Getachew Mamo; Teferi Kebebew

dc.contributor.author	Hagos Hailemaryam
dc.contributor.author	Getachew Mamo
dc.contributor.author	Teferi Kebebew
dc.date.accessioned	2021-02-11T07:27:51Z
dc.date.available	2021-02-11T07:27:51Z
dc.date.issued	2020-01
dc.identifier.uri	https://repository.ju.edu.et//handle/123456789/5529
dc.description.abstract	Nowadays, many organizations use databases whose contents are written in local languages to manage their work. These databases hold large amounts of information in electronic form, and users are expected to know SQL to access and manipulate it. Moreover, using SQL to access and manipulate information written in a local language is difficult and tedious. It is therefore preferable to let users access and manipulate database contents in natural language rather than requiring them to know and use SQL. Natural language is simple and comfortable to use, and conjunctive and negated queries can be formed in it easily. Because natural language is simple and comfortable for ordinary users, much research has been carried out on natural language interfaces since 1970. For these reasons, a Tigrinya language interface to database was proposed. Using the developed Tigrinya language interface to database (TLIDB) prototype, the database contents were accessed and manipulated without knowledge of SQL. To achieve this, input Tigrinya sentences were first translated into the corresponding SQL statements, and these SQL statements were then executed against the database. TLIDB was designed and developed using neural machine translation, a robust and effective approach. An encoder-decoder long short-term memory (LSTM) network was used to translate input Tigrinya sentences into the corresponding SQL statements, since the encoder-decoder LSTM is a well-suited technique for sequence-to-sequence problems. In addition, word embeddings were used to estimate the similarity of words and to obtain dense representations, which solved the data sparsity problem of traditional approaches. The developed TLIDB prototype was evaluated on a healthcare database containing patients, diseases and employees tables. The disease records were prepared together with health professionals. 
This prototype handles list queries, conditional queries, aggregate functions, complex queries (joins, unions), updates, deletes, and similar operations. To develop the prototype, 6338 sentences were prepared together with their corresponding SQL statements. Using the percentage split evaluation technique, the model was trained on 80% of the dataset and tested on the remaining 20%, so that it was evaluated on data not included during training. In this evaluation, the model achieved an overall accuracy above 98.5%.	en_US
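The 80/20 percentage split described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the TLIDB implementation. The function names `percentage_split` and `accuracy`, the fixed shuffle seed, and the exact-match definition of accuracy are all hypothetical. The sentence/SQL pairs are placeholders rather than the thesis dataset.

```python
import random


def percentage_split(pairs, train_fraction=0.8, seed=42):
    """Shuffle (sentence, sql) pairs and split them into train/test lists.

    Hypothetical helper: the seed and fraction are assumptions, not TLIDB settings.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(pairs)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]   # test data never overlaps training data


def accuracy(predicted, reference):
    """Assumed metric: fraction of predicted SQL strings identical to the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical placeholder pairs standing in for the 6338 Tigrinya/SQL pairs.
data = [(f"sentence_{i}", f"SELECT * FROM patients WHERE id = {i};") for i in range(6338)]
train, test = percentage_split(data)
print(len(train), len(test))  # 5070 1268
```

Shuffling before the cut prevents the split from simply following the order in which the sentences were written. Fixing the seed keeps the split reproducible across runs.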
dc.language.iso	en	en_US
dc.subject	TLIDB	en_US
dc.subject	Natural Language Interface to Database	en_US
dc.subject	Natural Language Processing	en_US
dc.title	Investigating the possibility of developing Tigrinya language interface to database	en_US
dc.type	Thesis	en_US