Gene regulation is a complex mechanism that controls the spatial and temporal expression of genes in a living cell. My dissertation focuses on two problems: first, predicting tissue-specific gene expression from DNA sequence and chromatin state, and second, accurately discovering small over-represented regulatory circuits in gene regulatory networks.
Predicting tissue-specific gene expression is a challenging problem. In this dissertation, I introduce a generalizable machine learning method for predicting tissue-specific gene expression in the model plant Arabidopsis thaliana. The results show that transcription factor binding sites and chromatin states accurately characterize the tissue of expression.
In the context of gene regulatory networks, I introduce IndeCut, the first method that evaluates the performance of network discovery algorithms. Genomic networks represent complex maps of molecular interactions that describe the biological processes occurring in living cells. Identifying the small over-represented circuitry patterns in these networks helps generate hypotheses about the functional basis of such complex processes. IndeCut aids the accurate detection of such over-represented patterns.