Current Research and Scholarly Interests
Statistical methods to analyze large data matrices in bioinformatics
In this work, we present a method for the differential analysis of gene co-expression networks and apply it to look for large-scale transcriptional changes in aging. We derived synonymous gene co-expression networks from AGEMAP expression data for 16-month-old and 24-month-old mice. We identified a number of functional gene groups whose co-expression changes with age. Among these changing groups we found a trend towards declining correlation with age. In particular, we identified a modular (as opposed to uniform) decline in general correlation with age. We identified potential transcriptional mechanisms that may contribute to this modular decline in correlation. We found that computationally identified targets of the NF-kappaB transcription factor decrease in expression correlation with age. Finally, we found that genes that are prone to declining co-expression tend to be co-located on the chromosome. We conclude that there is a modular decline in co-expression with age in mice. Our results also indicate that factors relating to both chromosome domains and specific transcription factors may contribute to the decline.
View details for DOI 10.1371/journal.pgen.1000776
View details for Web of Science ID 000273469700026
View details for PubMedID 20019809
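As a rough illustration of what a differential co-expression comparison looks at, the sketch below computes, for each gene pair, the change in Pearson correlation between two age groups. It is only a toy under stated assumptions, not the procedure used in the paper: the function names (`pearson`, `correlation_change`), the gene labels, and the expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_change(young, old, genes):
    """For every gene pair, return corr(old group) - corr(young group)."""
    changes = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            changes[(g, h)] = pearson(old[g], old[h]) - pearson(young[g], young[h])
    return changes


# Hypothetical toy data: one expression value per animal, per gene.
young = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0]}
old = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [8.0, 6.0, 4.0, 2.0]}

delta = correlation_change(young, old, ["g1", "g2"])
```

A negative entry in `delta` means the pair is less positively correlated in the older group than in the younger one. Deciding which changes are significant, and whether they cluster into modules, would need additional steps that this sketch does not attempt.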
This paper takes a close look at balanced permutations, a recently developed sample reuse method with applications in bioinformatics. It turns out that balanced permutation reference distributions do not have the correct null behavior, which can be traced to their lack of a group structure. We find that they can give p-values that are too permissive to varying degrees. In particular, the observed test statistic can be larger than that of all B balanced permutations of a data set with a probability much higher than 1/(B + 1), even under the null hypothesis.
View details for DOI 10.1089/cmb.2008.0144
View details for Web of Science ID 000265551400007
View details for PubMedID 19361331
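For context on the 1/(B + 1) figure in the abstract above, the following sketch shows an ordinary Monte Carlo permutation test with uniformly random relabelings, where the smallest attainable p-value is 1/(B + 1). It does not implement balanced permutations; the function names, the one-sided mean-difference statistic, and the data are assumptions chosen for illustration.

```python
import random


def mean_diff(x, y):
    """Test statistic: difference of group means."""
    return sum(x) / len(x) - sum(y) / len(y)


def permutation_p_value(x, y, B=999, seed=0):
    """One-sided Monte Carlo permutation p-value using B random relabelings."""
    rng = random.Random(seed)
    observed = mean_diff(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(B):
        rng.shuffle(pooled)  # uniformly random relabeling of the pooled data
        if mean_diff(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # The observed labeling counts as one of the B + 1 arrangements,
    # so the p-value can never fall below 1 / (B + 1).
    return (1 + exceed) / (B + 1)


# Hypothetical, clearly separated groups.
p = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4])
```

The abstract's point is that the analogous calculation with balanced permutations can report the minimum value far more often than 1/(B + 1) of the time under the null hypothesis.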
We present the AGEMAP (Atlas of Gene Expression in Mouse Aging Project) gene expression database, a resource that catalogs changes in gene expression as a function of age in mice. The AGEMAP database includes expression changes for 8,932 genes in 16 tissues as a function of age. We found great heterogeneity in the amount of transcriptional change with age in different tissues. Some tissues displayed large transcriptional differences in old mice, suggesting that these tissues may contribute strongly to organismal decline. Other tissues showed few or no changes in expression with age, indicating strong levels of homeostasis throughout life. Based on the pattern of age-related transcriptional changes, we found that tissues could be classified into one of three aging processes: (1) a pattern common to neural tissues, (2) a pattern for vascular tissues, and (3) a pattern for steroid-responsive tissues. We observed that different tissues age in a coordinated fashion in individual mice, such that certain mice exhibit rapid aging, whereas others exhibit slow aging for multiple tissues. Finally, we compared the transcriptional profiles for aging in mice to those from humans, flies, and worms. We found that genes involved in the electron transport chain show common age regulation in all four species, indicating that these genes may be exceptionally good markers of aging. However, we saw no overall correlation of age regulation between mice and humans, suggesting that aging processes in mice and humans may be fundamentally different.
View details for DOI 10.1371/journal.pgen.0030201
View details for Web of Science ID 000251310200024
View details for PubMedID 18081424
We analyzed gene expression in 81 normal muscle samples from humans of varying ages and identified a molecular profile for aging consisting of 250 age-regulated genes. This molecular profile correlates not only with chronological age but also with a measure of physiological age. We compared the transcriptional profile of muscle aging to previous transcriptional profiles of aging in the kidney and the brain, and found a common signature for aging in these diverse human tissues. The common aging signature consists of six genetic pathways; four pathways increase expression with age (genes in the extracellular matrix, genes involved in cell growth, genes encoding factors involved in complement activation, and genes encoding components of the cytosolic ribosome), while two pathways decrease expression with age (genes involved in chloride transport and genes encoding subunits of the mitochondrial electron transport chain). We also compared transcriptional profiles of aging in humans to those of the mouse and fly, and found that the electron transport chain pathway decreases expression with age in all three organisms, suggesting that this may be a public marker for aging across species.
View details for DOI 10.1371/journal.pgen.0020115
View details for Web of Science ID 000239494800016
View details for PubMedID 16789832
This work presents a version of the Metropolis-Hastings algorithm using quasi-Monte Carlo inputs. We prove that the method yields consistent estimates in some problems with finite state spaces and completely uniformly distributed inputs. In some numerical examples, the proposed method is much more accurate than ordinary Metropolis-Hastings sampling.
View details for DOI 10.1073/pnas.0409596102
View details for Web of Science ID 000230049500012
View details for PubMedID 15956207
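To make the role of the "inputs" in the abstract above concrete, here is a minimal sketch of Metropolis-Hastings on a finite state space, written so that every random decision is read from an explicit stream of numbers in [0, 1). The stream here is ordinary pseudo-random output; the paper's variant would substitute a completely uniformly distributed sequence, which this sketch does not construct. The function names, the uniform proposal, and the target weights are illustrative assumptions.

```python
import random


def metropolis_hastings(weights, uniforms, n_steps, start=0):
    """Estimate the distribution proportional to `weights` on states 0..k-1.

    Each step consumes two numbers from `uniforms`: one to pick a proposal
    uniformly over all states, one for the accept/reject decision.
    """
    k = len(weights)
    state = start
    counts = [0] * k
    for _ in range(n_steps):
        u_prop = next(uniforms)
        u_acc = next(uniforms)
        proposal = min(int(u_prop * k), k - 1)
        # Symmetric proposal: accept with probability min(1, w[proposal] / w[state]).
        if u_acc * weights[state] < weights[proposal]:
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]


def iid_uniforms(seed=0):
    """Ordinary pseudo-random driver (assumption: stands in for the input stream)."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


estimate = metropolis_hastings([1.0, 2.0, 3.0, 4.0], iid_uniforms(), 100_000)
```

Keeping the sampler a deterministic function of its input stream is the design choice that makes swapping in a different driving sequence possible without touching the update rule.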
In this study, we found 985 genes that change expression in the cortex and the medulla of the kidney with age. Some of the genes whose transcripts increase in abundance with age are known to be specifically expressed in immune cells, suggesting that immune surveillance or inflammation increases with age. The age-regulated genes show a similar aging profile in the cortex and the medulla, suggesting a common underlying mechanism for aging. Expression profiles of these age-regulated genes mark not only age, but also the relative health and physiology of the kidney in older individuals. Finally, the set of aging-regulated kidney genes suggests specific mechanisms and pathways that may play a role in kidney degeneration with age.
View details for DOI 10.1371/journal.pbio.0020427
View details for Web of Science ID 000226099600020
View details for PubMedID 15562319
One of the most important uses of whole-genome expression data is for the discovery of new genes with similar function to a given list of genes (the query) already known to have closely related function. We have developed an algorithm, called the gene recommender, that ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated. We used the gene recommender to find other genes coexpressed with several sets of query genes, including genes known to function in the retinoblastoma complex. Genetic experiments confirmed that one gene (JC8.6) identified by the gene recommender acts with lin-35 Rb to regulate vulval cell fates, and that another gene (wrm-1) acts antagonistically. We find that the gene recommender returns lists of genes with better precision, for fixed levels of recall, than lists generated using the C. elegans expression topomap.
View details for DOI 10.1101/gr.1125403
View details for Web of Science ID 000184530900005
View details for PubMedID 12902378
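The ranking idea described in the abstract above can be sketched in a heavily simplified form: pick the experiments in which the query genes move together, then rank the remaining genes by correlation with the query genes over those experiments only. This is not the published gene recommender; the coherence score, the use of Pearson correlation, the function names, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def recommend(expr, query, n_experiments):
    """Rank non-query genes by correlation with the query in selected experiments."""
    n = len(next(iter(expr.values())))

    def coherence(j):
        # Assumed score: a strong, consistent query signal has a large
        # mean (in absolute value) relative to its spread.
        vals = [expr[g][j] for g in query]
        return abs(mean(vals)) / (pstdev(vals) + 1e-9)

    ranked_experiments = sorted(range(n), key=coherence, reverse=True)
    chosen = sorted(ranked_experiments[:n_experiments])
    profile = [mean(expr[g][j] for g in query) for j in chosen]
    scores = {
        g: pearson([values[j] for j in chosen], profile)
        for g, values in expr.items()
        if g not in query
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: q1 and q2 agree in experiments 0-2 and disagree in 3-5.
expr = {
    "q1": [2.0, -2.0, 1.5, 1.0, -1.0, 0.5],
    "q2": [2.1, -1.9, 1.4, -1.0, 1.0, -0.5],
    "a": [1.0, -1.0, 0.8, 5.0, -3.0, 2.0],
    "b": [-1.0, 1.0, -0.7, 0.0, 0.0, 0.0],
    "c": [0.3, 0.2, -0.4, 1.0, 2.0, 3.0],
}
ranking = recommend(expr, ["q1", "q2"], n_experiments=3)
```

Restricting the comparison to the selected experiments is what keeps gene `a` at the top of `ranking` even though its values in the remaining experiments are unrelated to the query.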
Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.
View details for DOI 10.1073/pnas.0832373100
View details for Web of Science ID 000184222500057
View details for PubMedID 12826619