Huang, Yang. Computational approaches to identifying molecular associations in high-throughput biological data. Retrieved from https://doi.org/doi:10.7282/T3CZ37JC
Description
Biomolecules, such as proteins and nucleic acids, are the building blocks of living organisms.
Their complex interactions and associations are key to understanding the basic mechanisms of life. Recent high-throughput biological experiments make it possible to study thousands of biomolecules simultaneously, yielding large amounts of data that may reveal essential molecular associations. This dissertation focuses on analyzing protein-protein interaction and gene-expression data obtained from these experiments.
To identify temporal associations among proteins in pathways, the order in which proteins enter and exit the pathways must be determined. For this purpose, an interval graph model of molecular pathways is presented, based on protein-protein interactions.
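The idea behind an interval graph model can be illustrated with a small sketch. Assuming each protein is assigned a hypothetical time interval [entry, exit] in the pathway, two proteins are joined by an edge exactly when their intervals overlap, and a set of intervals is taken to be consistent with the observed protein-protein interactions when this overlap graph equals the interaction graph. The function names and the exact-equality consistency criterion are assumptions for illustration, not details taken from the dissertation.

```python
from itertools import combinations


def interval_graph(intervals):
    """Return the edges of the interval graph defined by `intervals`.

    `intervals` maps a protein name to a hypothetical (entry, exit) pair with
    entry <= exit. Two proteins are adjacent when their closed intervals
    overlap, i.e. when both are present in the pathway at the same time.
    """
    edges = set()
    for (p, (a1, b1)), (q, (a2, b2)) in combinations(sorted(intervals.items()), 2):
        if a1 <= b2 and a2 <= b1:  # [a1, b1] and [a2, b2] intersect
            edges.add(frozenset((p, q)))
    return edges


def is_consistent(intervals, interactions):
    """Assumed criterion: the overlap graph must equal the interaction graph exactly."""
    observed = {frozenset(pair) for pair in interactions}
    return interval_graph(intervals) == observed


if __name__ == "__main__":
    ivs = {"A": (0, 3), "B": (2, 5), "C": (4, 6)}
    print(sorted(tuple(sorted(e)) for e in interval_graph(ivs)))  # [('A', 'B'), ('B', 'C')]
    print(is_consistent(ivs, [("A", "B"), ("B", "C")]))           # True
```

Here A and C never coexist, so a reported interaction between them would make these intervals inconsistent.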
Based on this model, a tool, XRONOS, is developed to compute possible orderings of proteins in pathways. We then apply XRONOS to the yeast ribosome assembly pathway and develop several tests, based on graph theory, statistics, and biological knowledge, to validate the computed orderings.
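One classical way to recover possible orderings from an interaction graph uses the Gilmore–Hoffman characterization: a graph is an interval graph exactly when its maximal cliques can be arranged in a line so that the cliques containing each vertex are consecutive. The sketch below enumerates all such arrangements by brute force (factorial time, so only for tiny graphs) and reads off each protein's entry and exit positions. It is a stand-in for illustration only; the abstract does not describe how XRONOS computes its orderings, and the function names are hypothetical.

```python
from itertools import permutations


def maximal_cliques(vertices, adj):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting; small graphs only)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(vertices), set())
    return cliques


def _consecutive(flags):
    """True if the True entries of `flags` form one contiguous run."""
    idx = [i for i, f in enumerate(flags) if f]
    return idx[-1] - idx[0] + 1 == len(idx)


def clique_orderings(edges):
    """Yield every linear order of maximal cliques in which the cliques
    containing each vertex are consecutive (brute force over permutations)."""
    vertices = {v for e in edges for v in e}
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cliques = maximal_cliques(vertices, adj)
    for order in permutations(cliques):
        if all(_consecutive([v in c for c in order]) for v in vertices):
            yield order


def to_intervals(order):
    """Map each protein to the (first, last) position of the cliques containing it."""
    iv = {}
    for i, clique in enumerate(order):
        for v in clique:
            first, _ = iv.get(v, (i, i))
            iv[v] = (first, i)
    return iv


if __name__ == "__main__":
    path = [("A", "B"), ("B", "C")]
    for order in clique_orderings(path):
        print(to_intervals(order))
    square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    print(len(list(clique_orderings(square))))  # 0: a 4-cycle is not an interval graph
```

For the path in which A interacts with B and B with C, the two orderings found are mirror images of each other, which illustrates that interaction data alone cannot fix the direction of time; additional knowledge is needed to choose between them.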
In a gene-expression matrix, rows correspond to genes and columns correspond to measurement conditions. An association coefficient is defined for each pair of genes in a discretized gene-expression matrix, and these coefficients are then used to define a dissimilarity measure between two discretized gene-expression matrices. This dissimilarity can be computed effectively using concept lattices. With this dissimilarity measure, a tool, LABSTER, is developed to cluster a set of gene-expression matrices for class discovery. LABSTER is successfully applied to simulated and clinical gene-expression data sets to discover different cell phenotypes.
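The abstract does not give the formulas for the association coefficient or the dissimilarity measure, so the following is only a hypothetical stand-in showing the general shape of the computation. It assumes discretized values are small integers (for example -1, 0, 1 for down, unchanged, up), takes the association coefficient of two genes to be the fraction of conditions in which they share the same level, and defines the dissimilarity of two matrices over the same genes as the mean absolute difference of their coefficients. It computes the coefficients directly rather than through concept lattices as LABSTER does.

```python
from itertools import combinations


def association(matrix, i, j):
    """Hypothetical coefficient: fraction of conditions (columns) in which
    genes i and j have the same discretized level."""
    row_i, row_j = matrix[i], matrix[j]
    return sum(a == b for a, b in zip(row_i, row_j)) / len(row_i)


def dissimilarity(m1, m2):
    """Hypothetical dissimilarity: mean absolute difference of association
    coefficients over all gene pairs. Both matrices must list the same genes
    in the same row order; their numbers of columns may differ."""
    if len(m1) != len(m2):
        raise ValueError("matrices must have the same genes (rows)")
    pairs = list(combinations(range(len(m1)), 2))
    return sum(abs(association(m1, i, j) - association(m2, i, j)) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    m1 = [[1, 1, 0], [1, 1, 0], [-1, 0, 1]]  # 3 genes x 3 conditions
    m2 = [[1, 0], [1, 0], [1, 0]]            # same 3 genes x 2 conditions
    print(dissimilarity(m1, m2))
```

Because each coefficient is a fraction of conditions, two matrices with different numbers of columns can still be compared, as long as their rows list the same genes in the same order. The resulting pairwise dissimilarities could be passed to a standard clustering method; the abstract does not say which one LABSTER uses.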
Because concept lattices have proven useful in many areas and their size can be exponential in the size of the input, constructing them efficiently has become important.
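To make the terms concrete: a binary matrix is read as a formal context with objects G (rows) and attributes M (columns), and a formal concept is a pair (A, B) in which B is exactly the set of attributes shared by all objects in A, and A is exactly the set of objects having all attributes in B. The naive sketch below finds every concept by closing each subset of attributes, which takes time exponential in |M|. Running it on the n×n matrix with zeros on the diagonal and ones elsewhere (the contranominal scale) gives 2^n concepts, showing that the lattice itself can be exponentially larger than the input. Function names are illustrative only.

```python
from itertools import combinations


def extent(context, attrs):
    """B': the objects (row indices) that have every attribute in `attrs`."""
    return frozenset(g for g, row in enumerate(context) if all(row[m] for m in attrs))


def intent(context, objs, n_attrs):
    """A': the attributes shared by every object in `objs` (all attributes if `objs` is empty)."""
    return frozenset(m for m in range(n_attrs) if all(context[g][m] for g in objs))


def all_concepts(context):
    """Return every formal concept (extent, intent) by closing all 2^|M| attribute subsets."""
    n_attrs = len(context[0])
    concepts = set()
    for k in range(n_attrs + 1):
        for attrs in combinations(range(n_attrs), k):
            a = extent(context, attrs)
            concepts.add((a, intent(context, a, n_attrs)))
    return concepts


if __name__ == "__main__":
    n = 3
    contranominal = [[int(g != m) for m in range(n)] for g in range(n)]
    print(len(all_concepts(contranominal)))  # 8 == 2**3
```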
An algorithm is designed with delay-time complexity O(|G||M|) for an input binary matrix of size |G|×|M|. Based on a characterization of irregular concepts, the algorithm improves on the previous best delay-time complexity of O(|G||M|^2). In addition, a compact representation of concept lattices is proposed, which can save storage space compared with the full representation normally used. The algorithm for generating the full representation of concept lattices is modified to generate the compact representation.
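For comparison, a classical algorithm commonly cited with delay O(|G||M|^2) is Ganter's NextClosure, which lists the closed attribute sets (concept intents) in lectic order: each step tries up to |M| candidate attributes, and each candidate closure costs O(|G||M|). The sketch below is that baseline, written for readability; it is not the dissertation's O(|G||M|) algorithm, whose characterization of irregular concepts the abstract does not spell out.

```python
def closure(context, attrs):
    """B'': attributes shared by all objects that have every attribute in `attrs`."""
    n_attrs = len(context[0])
    objs = [g for g, row in enumerate(context) if all(row[m] for m in attrs)]
    return frozenset(m for m in range(n_attrs) if all(context[g][m] for g in objs))


def next_closure(context):
    """Yield every concept intent once, in lectic order (Ganter's NextClosure)."""
    n_attrs = len(context[0])
    current = closure(context, ())
    yield current
    while True:
        # Try attributes from largest to smallest; the first one whose closure
        # adds no attribute smaller than itself gives the lectic successor.
        for i in reversed(range(n_attrs)):
            if i in current:
                continue
            prefix = {m for m in current if m < i}
            candidate = closure(context, prefix | {i})
            if {m for m in candidate if m < i} == prefix:
                current = candidate
                yield current
                break
        else:
            return  # current is the full attribute set M: enumeration is complete


if __name__ == "__main__":
    context = [[1, 1], [1, 0]]
    print(list(next_closure(context)))  # [frozenset({0}), frozenset({0, 1})]
```

Each intent determines its concept, since the extent is recovered as the set of objects having all of its attributes; only the current intent needs to be kept between steps, which is what makes the delay bound meaningful.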