
Please use this identifier to cite or link to this item: http://cimat.repositorioinstitucional.mx/jspui/handle/1008/727

Title:	PROBLEMS IN STATISTICAL GENETICS: CLASSIFICATION AND TESTING FOR NETWORK CHANGES
Author:	ADOLPHUS WAGALA
Access level:	Open access
License:	Attribution-NonCommercial
Subject:	STATISTICAL INTEGRATION OF MOLECULAR DATA
Abstract or description:	This thesis addresses two problems: the classification of microarray data, and the statistical integration of molecular data to test for network changes. For the classification problem, we consider both unpreprocessed and preprocessed microarray data sets. We implement extensions of the partial least squares generalized linear regression (PLSGLR) of Bastien et al. (2005), obtained by combining it with logistic regression, which gives the partial least squares generalized linear regression-logistic regression model (PLSGLR-log), and with linear discriminant analysis, which gives partial least squares generalized linear regression-linear discriminant analysis (PLSGLRDA). These two classification methods are then compared with classical methods, namely k-nearest neighbours (KNN), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), ridge partial least squares (RPLS) and the support vector machine (SVM). We also implement a recent algorithm by Dalmau et al. (2015), the kernel multilogit algorithm (KMA). The results indicate that for the noisy unpreprocessed data, KMA emerged as the clear “winner”, based on its low misclassification error rates. For the preprocessed, normalized data there was no clear “winner”, since no single method performed markedly better than the rest. KNN emerged as a clear “loser”, since it consistently had a relatively high misclassification rate on both the unpreprocessed and the preprocessed data sets. The statistical integration of molecular data to test for network changes considers an experiment involving two main groups, namely healthy (H) subjects and subjects with acute rheumatic fever (ARF).
For each group, each specimen is divided into two portions: one portion is stimulated with group A streptococcus (GAS) and the other is left unstimulated, giving four subgroups: healthy GAS-stimulated, healthy unstimulated, ARF GAS-stimulated and ARF unstimulated. As a result, there is dependence within each group (the stimulated and unstimulated portions come from the same specimens) and independence between the groups. For all groups, the expression of p genes is measured. We identify a prior network from the curated literature and online sources. The genes considered in the experiment are then matched with those in the prior network, so that the prior network is reduced to only the genes found in the experimental data. We then construct two networks, one for the healthy and the
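The classification results above are judged by misclassification error rates. As a minimal illustration of that criterion, the sketch below computes a leave-one-out misclassification error rate for a k-nearest-neighbours classifier, one of the methods compared. It is a toy sketch, not the thesis's implementation: the Euclidean distance, k = 3, the leave-one-out scheme, the two-cluster data and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training points (Euclidean distance). Hypothetical helper."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_misclassification_rate(X, y, k=3):
    """Leave-one-out misclassification error rate: the fraction of samples
    that are misclassified when each one is predicted from all the others."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # drop sample i from the training set
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)


if __name__ == "__main__":
    # Toy data (assumed): two well-separated clusters in two dimensions.
    X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
    y = ["A", "A", "A", "B", "B", "B"]
    print(loo_misclassification_rate(X, y, k=3))  # prints 0.0
```

The same leave-one-out loop can wrap the prediction step of any classifier, which is one way error rates from several methods could be placed side by side; a lower rate is better.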
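Reducing the prior network to the genes found in the experimental data amounts to taking the subgraph of the prior network induced by the measured genes. A minimal sketch, assuming the prior network is stored as an undirected edge list of gene identifiers; the function name and the labels GENE1 to GENE9 are hypothetical placeholders, not genes or code from the thesis.

```python
def reduce_prior_network(prior_edges, measured_genes):
    """Restrict a prior network to genes present in the experimental data.

    prior_edges: iterable of (gene_a, gene_b) pairs, treated as undirected.
    measured_genes: iterable of gene identifiers measured in the experiment.
    Returns (kept_nodes, kept_edges): the prior genes that were also measured,
    and the prior edges whose two endpoints were both measured.
    """
    edges = list(prior_edges)  # materialise so the edges can be scanned twice
    measured = set(measured_genes)
    prior_nodes = {gene for edge in edges for gene in edge}
    kept_nodes = prior_nodes & measured
    kept_edges = [(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes]
    return kept_nodes, kept_edges


if __name__ == "__main__":
    prior = [("GENE1", "GENE2"), ("GENE2", "GENE3"), ("GENE3", "GENE4"), ("GENE4", "GENE5")]
    measured = ["GENE1", "GENE2", "GENE4", "GENE5", "GENE9"]
    nodes, kept = reduce_prior_network(prior, measured)
    print(sorted(nodes))  # prints ['GENE1', 'GENE2', 'GENE4', 'GENE5']
    print(kept)           # prints [('GENE1', 'GENE2'), ('GENE4', 'GENE5')]
```

In this example a gene measured but absent from the prior network (GENE9) does not enter the reduced network, and every edge touching an unmeasured gene (GENE3) is dropped. Prior genes with no edges at all would need a separate node list under this edge-list representation.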
Publication date:	07-03-2018
Publication type:	Doctoral thesis
Field of knowledge:	OTHER
Publication version:	Accepted version
Appears in collections:	Tesis del CIMAT (CIMAT theses)

Files:

File	Description	Size	Format
TE 658.pdf		2.45 MB	Adobe PDF