Abstract:
Amharic text-to-image generation using a conditional generative adversarial network
(CGAN) is a novel concept made possible by advances in deep learning. The aim of
this study is to develop a model for Amharic text-to-image generation using the CGAN algorithm. The
study follows an experimental research design. For this research, 2575 images of
clothes and shoes were acquired, and the corresponding Amharic texts were written manually.
For Amharic text preprocessing, stop words and punctuation marks were removed, the text was
tokenized, and word embeddings were created using Word2Vec. For image data
preprocessing, noise removal, image segmentation, resizing, normalization, and conversion
to NumPy arrays were performed. 80% of the paired Amharic text with the corresponding images
was used to train the generator and discriminator networks for 1000 epochs with a batch size of
32. In training, the generator network achieved 100% accuracy, while the discriminator achieved
only 40–50% accuracy and was unable to reliably distinguish the generated images from real ones. Finally,
the generator network trained on the training data was tested with the testing data, producing
fake images that were compared with the real test images. The generator achieved a Fréchet
inception distance (FID) of 4.99e+108 and an inception score of 417.2, which serve as
quantitative measures of the generated image quality. These numbers indicate that the
images produced by the trained generator are not comparable with the real images. Training both the
generator and the discriminator with updated parameter values gave much better testing results than
training with the default values. It should be possible to develop a substantially better model for
Amharic text-to-image generation with a larger dataset, more computational resources, and
other variants of CGAN.
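
To make the training setup above concrete, the following minimal sketch works out what the reported figures (2575 pairs, an 80% training share, batch size 32, 1000 epochs) imply in terms of split sizes and generator/discriminator update steps. It is an illustration only, not the authors' code: the function name `training_budget`, the assumption that the remaining 20% of pairs forms the test set, the floor rounding of the split, and the choice to keep the final partial batch are all assumptions made here.

```python
# Hypothetical helper (not from the study): derives split sizes and step counts
# from the figures reported in the abstract.

TOTAL_PAIRS = 2575    # image/text pairs reported in the abstract
TRAIN_PERCENT = 80    # training share reported in the abstract
BATCH_SIZE = 32       # reported in the abstract
EPOCHS = 1000         # reported in the abstract


def training_budget(total_pairs, train_percent, batch_size, epochs):
    """Return split sizes and update-step counts using integer arithmetic."""
    # Assumption: the split is floored and the remainder becomes the test set.
    n_train = total_pairs * train_percent // 100
    n_test = total_pairs - n_train
    # Assumption: the final partial batch is kept, so round up.
    steps_per_epoch = (n_train + batch_size - 1) // batch_size
    # Size of the last batch in each epoch (a full batch if evenly divisible).
    last_batch = n_train - (steps_per_epoch - 1) * batch_size
    return {
        "n_train": n_train,
        "n_test": n_test,
        "steps_per_epoch": steps_per_epoch,
        "last_batch_size": last_batch,
        "total_steps": steps_per_epoch * epochs,
    }


budget = training_budget(TOTAL_PAIRS, TRAIN_PERCENT, BATCH_SIZE, EPOCHS)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions, the generator and discriminator would each see 2060 training pairs per epoch, with 515 pairs held out for computing the FID and inception score.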