Jimma University Open access Institutional Repository

Character Identification in Multiparty Dialogues; Identifying Mentions of the Characters from Amharic TV show Transcripts

dc.contributor.author Dawod Yimer
dc.contributor.author Wondwesen Mulugeta
dc.contributor.author Efrem Tadese
dc.date.accessioned 2021-02-05T12:17:56Z
dc.date.available 2021-02-05T12:17:56Z
dc.date.issued 2019
dc.identifier.uri https://repository.ju.edu.et//handle/123456789/5393
dc.description.abstract Character identification is an entity linking task that finds the global entity of each personal mention in multiparty dialogue. In this work, we combine coreference resolution and entity linking to accomplish a more complicated task: identifying the characters in multiparty dialogue. The personal mentions are detected from nominals referring to certain characters in a show, and the entities are collected from the list of all characters in the series. To tackle this task, we introduce a novel coreference resolution algorithm that selectively creates clusters to handle both singular and plural mentions, and a convolutional neural network based entity linking model that jointly handles both types of mentions through multitask learning. We model the task as coreference resolution followed by entity linking, which assigns character labels to clusters of named entity mentions. Using an agglomerative convolutional neural network that takes groups of features and learns mention and mention-pair embeddings vastly improved the cluster purity scores for coreference resolution. By integrating the two tasks, a deep learning model was designed to identify the global personal mentions that refer to human characters. Adjusted evaluation metrics are also proposed for these tasks to handle the uniqueness of mentions. Three standard evaluation metrics, B-cubed, BLANC, and CEAFe, are used, and each experiment shows significant gains for the new coreference resolution and entity linking models. To the best of our knowledge, this is the first time that dialogue mentions are thoroughly analyzed for resolution tasks. Transcripts of TV shows are collected as a corpus and manually annotated with mentions using linguistically motivated rules. These mentions are manually linked to their referents.
The dataset used in this work follows the format of [10] and [15] and consists of dialogue from two Amharic TV shows, Gemena and Sewlesew, in transcribed text form. In total, 25 episodes of the shows are annotated, comprising 164 dialogues, 155 scenes, 1840 mentions, and 146 entities. We evaluate our models on this transcribed dataset using common evaluation metrics and achieve a character identification accuracy of 80.65% and an F1-score of 77.2% on the held-out episodes of the annotated test set, and an accuracy of 87.2% and an F1-score of 63.2% on the overall dataset used in this research. en_US
dc.language.iso en en_US
dc.subject Character Identification from Amharic multiparty dialogue en_US
dc.subject Coreference resolution en_US
dc.subject Entity Linking en_US
dc.subject Deep learning approach for entity linking en_US
dc.subject Convolutional neural network approach for character identification en_US
dc.title Character Identification in Multiparty Dialogues; Identifying Mentions of the Characters from Amharic TV show Transcripts en_US
dc.type Thesis en_US


