Abstract:
Background: Menarche, the first occurrence of menstruation in girls, is an important milestone in the development of female adolescents. Time to menarche is the duration from an individual's birth to the onset of her first menstrual cycle. Such time-to-event data are often clustered (correlated) by geographic location. Standard survival models estimate covariate effects and standard errors under the assumption that event times within the same cluster are independent of each other, which leads to invalid results when correlation and/or heterogeneity in the data is ignored. Hence, in this thesis we applied various clustered (multivariate) survival models to the analysis of age at menarche.

Methods: Parametric frailty models with exponential, Weibull, lognormal, and log-logistic baseline hazards were fitted in combination with gamma, inverse Gaussian, lognormal, and positive stable frailty distributions, and the selected parametric frailty model was compared with the commonly used shared gamma frailty model. AIC, model adequacy, and the standardized variability of coefficients were used to compare the clustered survival models.

Results: The median age at menarche was about 14 years. The estimated heterogeneity parameter for menarcheal age across villages was found to be significant in all models except the exponential-based frailty models. The comparison shows that the log-logistic-gamma frailty model has the smallest AIC and fits the age at menarche data better than the alternatives. Mother’s education level, household income, BMI-for-age, and height-for-age are important prognostic factors of age at menarche.

Conclusions: The log-logistic-gamma frailty model was found to be a good time-to-event model that fits the data better than the other frailty models used in this thesis. The estimated heterogeneity parameter was found to be significant, indicating that there is clustering (heterogeneity) in the timing of menarche across villages of Jimma zone. Hence, it is appropriate to employ a multivariate survival model that takes the clustering or heterogeneity in the data into account.
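
As an illustration of the kind of model summarized above, the following is a minimal sketch (not the thesis's actual code) of a log-logistic baseline hazard combined with a shared gamma frailty, together with the AIC used for model comparison. It assumes the common parametrization in which the frailty multiplies the hazard, the gamma frailty has mean 1 and variance theta, and the baseline cumulative hazard is log(1 + exp(alpha) * t^kappa). The function names and all parameter values are hypothetical and are not estimates from the menarche data.

```python
import math


def loglogistic_cum_hazard(t, alpha, kappa):
    """Log-logistic cumulative baseline hazard H0(t) = log(1 + exp(alpha) * t**kappa)."""
    return math.log1p(math.exp(alpha) * t ** kappa)


def marginal_survival_gamma(t, linear_predictor, theta, alpha, kappa):
    """Population-averaged survival under a shared gamma frailty.

    Assumes the frailty u has mean 1 and variance theta and acts
    multiplicatively on the hazard: h(t | u) = u * h0(t) * exp(x'beta).
    Integrating u out gives S(t) = (1 + theta * H0(t) * exp(x'beta)) ** (-1 / theta).
    """
    cum = loglogistic_cum_hazard(t, alpha, kappa) * math.exp(linear_predictor)
    if theta == 0:
        # No heterogeneity: the model reduces to the ordinary survival function.
        return math.exp(-cum)
    return (1.0 + theta * cum) ** (-1.0 / theta)


def aic(log_likelihood, n_params):
    """Akaike information criterion; smaller values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to make the example run.
    alpha, kappa, theta = -20.0, 8.0, 0.5
    for age in (12, 14, 16):
        s = marginal_survival_gamma(age, 0.0, theta, alpha, kappa)
        print(f"age {age}: P(menarche not yet reached) = {s:.3f}")
```

In this parametrization a larger theta means more between-village heterogeneity, and theta = 0 recovers a model with independent event times, which is why a significant heterogeneity parameter argues for the clustered model.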