Abstract:
Nowadays, many organizations manage their work with databases whose contents are written in a
local language. These databases hold large amounts of information in electronic form, and users
are expected to know SQL to access and manipulate it. Using SQL to access and manipulate
information written in a local language is difficult and tedious. It is therefore preferable to
let users access and manipulate the database contents in natural language, which is simple and
comfortable for them and also makes it easy to form queries with conjunctions and negations.
Because of this simplicity and comfort for ordinary users, much research has been carried out on
natural language interfaces since 1970. For these reasons, a Tigrinya language interface to
database was proposed.
Using the developed Tigrinya language interface to database (TLIDB) prototype, the database
contents were accessed and manipulated without knowledge of SQL. To achieve this, the input
Tigrinya sentences were first translated into the corresponding SQL statements, and the SQL
statements were then executed against the database. The TLIDB was designed and developed using
neural machine translation, a robust and effective approach. An encoder-decoder long short-term
memory (LSTM) network was used to translate each input Tigrinya sentence into the corresponding
SQL statement; the encoder-decoder LSTM is a good technique for sequence-to-sequence problems.
In addition, word embedding was used to estimate the similarity of words and to obtain a dense
representation. This solved the sparse data problem of the traditional approaches.
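The two-step flow described above (translate the sentence to SQL, then execute the SQL) can be
sketched as follows. This is only an illustration under stated assumptions: the lookup table
stands in for the trained encoder-decoder LSTM, the input strings are placeholders rather than
real Tigrinya text, and the table layout and function names are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the trained encoder-decoder LSTM: a lookup table.
# The keys are placeholders, not real Tigrinya sentences.
TOY_TRANSLATIONS = {
    "<Tigrinya: list all patients>": "SELECT name FROM patients ORDER BY name",
    "<Tigrinya: how many patients are there>": "SELECT COUNT(*) FROM patients",
}


def translate_to_sql(sentence: str) -> str:
    """Step 1: map an input sentence to an SQL statement (toy lookup, not a model)."""
    return TOY_TRANSLATIONS[sentence]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed, simplified patients table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO patients (name) VALUES (?)", [("Abel",), ("Sara",)])
    conn.commit()
    return conn


def answer(conn: sqlite3.Connection, sentence: str) -> list:
    """Step 2: execute the translated SQL and return the result rows."""
    sql = translate_to_sql(sentence)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(answer(db, "<Tigrinya: list all patients>"))
```

In a real system the lookup would be replaced by the trained model's decoding step; the
execution step stays the same.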
The developed TLIDB prototype was evaluated on a healthcare database that has patients, diseases
and employees tables. The disease records were prepared with health professionals. The
prototype handles list queries, conditional queries, aggregate functions, complex queries (join,
union), update, delete, etc. To develop the prototype, 6338 sentences were prepared with their
corresponding SQL statements. The model was trained on 80% and tested on 20% of the dataset
using the percentage split evaluation technique, in which the model is evaluated on data that
were not included during training. When evaluated, the model scored an overall accuracy above
98.5%.
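
A minimal sketch of the percentage split evaluation is given below. It makes assumptions the
abstract does not state: the sentence/SQL pairs are shuffled before splitting, and accuracy is
counted as the fraction of predicted SQL statements that exactly match the expected ones. The
function names are hypothetical.

```python
import random


def percentage_split(pairs, train_fraction=0.8, seed=0):
    """Shuffle the (sentence, SQL) pairs and split them into train and test sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def exact_match_accuracy(predicted, expected):
    """Fraction of predictions identical to the expected SQL (assumed metric)."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    correct = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    # Placeholder pairs with the same count as the dataset described above.
    pairs = [(f"sentence {i}", f"SQL {i}") for i in range(6338)]
    train, test = percentage_split(pairs)
    print(len(train), len(test))
```

Because the test pairs are held out before training, the accuracy measured on them reflects
performance on sentences the model has not seen.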