Assistant Professor, Institute for Immunity, Transplantation and Infection (2014 - Present)
Active pulmonary tuberculosis is difficult to diagnose and treatment response is difficult to effectively monitor. A WHO consensus statement has called for new non-sputum diagnostics. The aim of this study was to use an integrated multicohort analysis of samples from publicly available datasets to derive a diagnostic gene set in the peripheral blood of patients with active tuberculosis. We searched two public gene expression microarray repositories and retained datasets that examined clinical cohorts of active pulmonary tuberculosis infection in whole blood. We compared gene expression in patients with either latent tuberculosis or other diseases versus patients with active tuberculosis using our validated multicohort analysis framework. Three datasets were used as discovery datasets and meta-analytical methods were used to assess gene effects in these cohorts. We then validated the diagnostic capacity of the three-gene set in the remaining 11 datasets. A total of 14 datasets containing 2572 samples from 10 countries from both adult and paediatric patients were included in the analysis. Of these, three datasets (N=1023) were used to discover a set of three genes (GBP5, DUSP3, and KLF2) that are highly diagnostic for active tuberculosis. We validated the diagnostic power of the three-gene set to separate active tuberculosis from healthy controls (global area under the ROC curve (AUC) 0·90 [95% CI 0·85-0·95]), latent tuberculosis (0·88 [0·84-0·92]), and other diseases (0·84 [0·80-0·95]) in eight independent datasets composed of both children and adults from ten countries. Expression of the three-gene set was not confounded by HIV infection status, bacterial drug resistance, or BCG vaccination. Furthermore, in four additional cohorts, we showed that the tuberculosis score declined during treatment of patients with active tuberculosis. Overall, our integrated multicohort analysis yielded a three-gene set in whole blood that is robustly diagnostic for active tuberculosis, that was validated in multiple independent cohorts, and that has potential clinical application for diagnosis and monitoring treatment response. Prospective laboratory validation will be required before it can be used in a clinical setting. Funding: National Institute of Allergy and Infectious Diseases, National Library of Medicine, the Stanford Child Health Research Institute, the Society for University Surgeons, and the Bill and Melinda Gates Foundation.
View details for DOI 10.1016/S2213-2600(16)00048-5
View details for PubMedID 26907218
Respiratory viral infections are a significant burden to healthcare worldwide. Many whole genome expression profiles have identified different respiratory viral infection signatures, but these have not translated to clinical practice. Here, we performed two integrated, multi-cohort analyses of publicly available transcriptional data of viral infections. First, we identified a common host signature across different respiratory viral infections that could distinguish (1) individuals with viral infections from healthy controls and from those with bacterial infections, and (2) symptomatic from asymptomatic subjects prior to symptom onset in challenge studies. Second, we identified an influenza-specific host response signature that (1) could distinguish influenza-infected samples from those with bacterial and other respiratory viral infections, (2) was a diagnostic and prognostic marker in influenza-pneumonia patients and influenza challenge studies, and (3) was predictive of response to influenza vaccine. Our results have applications in the diagnosis, prognosis, and identification of drug targets in viral infections.
View details for DOI 10.1016/j.immuni.2015.11.003
View details for Web of Science ID 000366846600022
View details for PubMedID 26682989
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.
View details for DOI 10.1093/jamia/ocv048
View details for PubMedID 26112029
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal human cancers and shows resistance to any therapeutic strategy used. Here we tested small-molecule inhibitors targeting chromatin regulators as possible therapeutic agents in PDAC. We show that JQ1, an inhibitor of the bromodomain and extraterminal (BET) family of proteins, suppresses PDAC development in mice by inhibiting both MYC activity and inflammatory signals. The histone deacetylase (HDAC) inhibitor SAHA synergizes with JQ1 to augment cell death and more potently suppress advanced PDAC. Finally, using a CRISPR-Cas9-based method for gene editing directly in the mouse adult pancreas, we show that de-repression of p57 (also known as KIP2 or CDKN1C) upon combined BET and HDAC inhibition is required for the induction of combination therapy-induced cell death in PDAC. SAHA is approved for human use, and molecules similar to JQ1 are being tested in clinical trials. Thus, these studies identify a promising epigenetic-based therapeutic strategy that may be rapidly implemented in fatal human tumors.
View details for DOI 10.1038/nm.3952
View details for PubMedID 26390243
Deregulation of lysine methylation signalling has emerged as a common aetiological factor in cancer pathogenesis, with inhibitors of several histone lysine methyltransferases (KMTs) being developed as chemotherapeutics. The largely cytoplasmic KMT SMYD3 (SET and MYND domain containing protein 3) is overexpressed in numerous human tumours. However, the molecular mechanism by which SMYD3 regulates cancer pathways and its relationship to tumorigenesis in vivo are largely unknown. Here we show that methylation of MAP3K2 by SMYD3 increases MAP kinase signalling and promotes the formation of Ras-driven carcinomas. Using mouse models for pancreatic ductal adenocarcinoma and lung adenocarcinoma, we found that abrogating SMYD3 catalytic activity inhibits tumour development in response to oncogenic Ras. We used protein array technology to identify the MAP3K2 kinase as a target of SMYD3. In cancer cell lines, SMYD3-mediated methylation of MAP3K2 at lysine 260 potentiates activation of the Ras/Raf/MEK/ERK signalling module and SMYD3 depletion synergizes with a MEK inhibitor to block Ras-driven tumorigenesis. Finally, the PP2A phosphatase complex, a key negative regulator of the MAP kinase pathway, binds to MAP3K2 and this interaction is blocked by methylation. Together, our results elucidate a new role for lysine methylation in integrating cytoplasmic kinase-signalling cascades and establish a pivotal role for SMYD3 in the regulation of oncogenic Ras signalling.
View details for DOI 10.1038/nature13320
View details for PubMedID 24847881
Lung cancer remains the most common cause of cancer-related death worldwide and it continues to lack effective treatment. The increasingly large and diverse public databases of lung cancer gene expression constitute a rich source of candidate oncogenic drivers and therapeutic targets. To define novel targets for lung adenocarcinoma, we conducted a large-scale meta-analysis of genes specifically overexpressed in adenocarcinoma. We identified an 11-gene signature that was overexpressed consistently in adenocarcinoma specimens relative to normal lung tissue. Six genes in this signature were specifically overexpressed in adenocarcinoma relative to other subtypes of non-small cell lung cancer (NSCLC). Among these genes was the little studied protein tyrosine kinase PTK7. Immunohistochemical analysis confirmed that PTK7 is highly expressed in primary adenocarcinoma patient samples. RNA interference-mediated attenuation of PTK7 decreased cell viability and increased apoptosis in a subset of adenocarcinoma cell lines. Further, loss of PTK7 activated the MKK7-JNK stress response pathway and impaired tumor growth in xenotransplantation assays. Our work defines PTK7 as a highly and specifically expressed gene in adenocarcinoma and a potential therapeutic target in this subset of NSCLC. Cancer Res; 74(10); 2892-902. ©2014 AACR.
View details for DOI 10.1158/0008-5472.CAN-13-2775
View details for Web of Science ID 000336720700024
View details for PubMedID 24654231
We propose and discuss a method for gene expression meta-analysis across multiple datasets and across multiplex measurement modalities that measure the expression of many genes simultaneously (e.g. microarrays and RNAseq). The method uses external control samples and heterogeneity detection to identify, and filter on, comparable gene expression measurements. We demonstrate this approach on publicly available gene expression datasets from samples of medulloblastoma and normal cerebellar tissue and identify some potential new targets in the treatment of medulloblastoma.
View details for PubMedID 24297537
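The abstract above does not spell out its effect-size or heterogeneity statistics, so the following is only a minimal sketch of one common way to pool a gene's effect across datasets and flag heterogeneous genes. It assumes Hedges' g as the per-dataset effect size, DerSimonian-Laird random-effects pooling, and Cochran's Q with I² as the heterogeneity measure. The function names, the `max_i2` cutoff, and the example values are all hypothetical and are not taken from the paper.

```python
import math

def hedges_g(case, control):
    """Bias-corrected standardized mean difference and its variance."""
    n1, n2 = len(case), len(control)
    m1, m2 = sum(case) / n1, sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in case) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d

def pool(effects):
    """DerSimonian-Laird pooling of (effect, variance) pairs for one gene."""
    w = [1 / v for _, v in effects]
    fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviation from the fixed-effect mean.
    q = sum(wi * (g - fixed) ** 2 for wi, (g, _) in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-dataset variance
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * g for wi, (g, _) in zip(w_re, effects)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": math.sqrt(1 / sum(w_re)),
            "q": q, "tau2": tau2, "i2": i2}

def is_comparable(effects, max_i2=0.5):
    """Hypothetical filter: keep a gene only if heterogeneity is modest."""
    return pool(effects)["i2"] <= max_i2

# Hypothetical per-dataset (effect, variance) pairs for a single gene.
gene_effects = [(1.2, 0.10), (0.9, 0.15), (1.1, 0.20)]
summary = pool(gene_effects)
```

Working on standardized effect sizes rather than raw intensities is what lets measurements from different platforms be placed on one scale in this sketch; whether the paper uses the same statistics is an assumption.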
Introduction: Neurodegenerative diseases share common pathologic features including neuroinflammation, mitochondrial dysfunction and protein aggregation, suggesting common underlying mechanisms of neurodegeneration. We undertook a meta-analysis of public gene expression data for neurodegenerative diseases to identify a common transcriptional signature of neurodegeneration. Results: Using 1,270 post-mortem central nervous system tissue samples from 13 patient cohorts covering four neurodegenerative diseases, we identified 243 differentially expressed genes, which were similarly dysregulated in 15 additional patient cohorts of 205 samples including seven neurodegenerative diseases. This gene signature correlated with histologic disease severity. Metallothioneins featured prominently among differentially expressed genes, and functional pathway analysis identified specific convergent themes of dysregulation. MetaCore network analyses revealed various novel candidate hub genes (e.g. STAU2). Genes associated with M1-polarized macrophages and reactive astrocytes were strongly enriched in the meta-analysis data. Evaluation of genes enriched in neurons revealed 70 down-regulated genes, over half not previously associated with neurodegeneration. Comparison with aging brain data (3 patient cohorts, 221 samples) revealed 53 of these to be unique to neurodegenerative disease, many of which are strong candidates to be important in neuropathogenesis (e.g. NDN, NAP1L2). ENCODE ChIP-seq analysis predicted common upstream transcriptional regulators not associated with normal aging (REST, RBBP5, SIN3A, SP2, YY1, ZNF143, IKZF1). Finally, we removed genes common to neurodegeneration from disease-specific gene signatures, revealing uniquely robust immune response and JAK-STAT signaling in amyotrophic lateral sclerosis. Conclusions: Our results implicate pervasive bioenergetic deficits, M1-type microglial activation and gliosis as unifying themes of neurodegeneration, and identify numerous novel genes associated with neurodegenerative processes.
View details for DOI 10.1186/s40478-014-0093-y
View details for PubMedID 25187168
Small cell lung cancer (SCLC) is an aggressive neuroendocrine subtype of lung cancer with high mortality. We used a systematic drug repositioning bioinformatics approach querying a large compendium of gene expression profiles to identify candidate U.S. Food and Drug Administration (FDA)-approved drugs to treat SCLC. We found that tricyclic antidepressants and related molecules potently induce apoptosis in both chemonaïve and chemoresistant SCLC cells in culture, in mouse and human SCLC tumors transplanted into immunocompromised mice, and in endogenous tumors from a mouse model for human SCLC. The candidate drugs activate stress pathways and induce cell death in SCLC cells, at least in part by disrupting autocrine survival signals involving neurotransmitters and their G protein-coupled receptors. The candidate drugs inhibit the growth of other neuroendocrine tumors, including pancreatic neuroendocrine tumors and Merkel cell carcinoma. These experiments identify novel targeted strategies that can be rapidly evaluated in patients with neuroendocrine tumors through the repurposing of approved drugs. Our work shows the power of bioinformatics-based drug approaches to rapidly repurpose FDA-approved drugs and identifies a novel class of molecules to treat patients with SCLC, a cancer for which no effective novel systemic treatments have been identified in several decades. In addition, our experiments highlight the importance of novel autocrine mechanisms in promoting the growth of neuroendocrine tumor cells.
View details for DOI 10.1158/2159-8290.CD-13-0183
View details for Web of Science ID 000328257500023
View details for PubMedID 24078773
Using meta-analysis of eight independent transplant datasets (236 graft biopsy samples) from four organs, we identified a common rejection module (CRM) consisting of 11 genes that were significantly overexpressed in acute rejection (AR) across all transplanted organs. The CRM genes could diagnose AR with high specificity and sensitivity in three additional independent cohorts (794 samples). In another two independent cohorts (151 renal transplant biopsies), the CRM genes correlated with the extent of graft injury and predicted future injury to a graft using protocol biopsies. Inferred drug mechanisms from the literature suggested that two FDA-approved drugs (atorvastatin and dasatinib), approved for nontransplant indications, could regulate specific CRM genes and reduce the number of graft-infiltrating cells during AR. We treated mice with HLA-mismatched mouse cardiac transplant with atorvastatin and dasatinib and showed reduction of the CRM genes, significant reduction of graft-infiltrating cells, and extended graft survival. We further validated the beneficial effect of atorvastatin on graft survival by retrospective analysis of electronic medical records of a single-center cohort of 2,515 renal transplant patients followed for up to 22 yr. In conclusion, we identified a CRM in transplantation that provides new opportunities for diagnosis, drug repositioning, and rational drug design.
View details for DOI 10.1084/jem.20122709
View details for Web of Science ID 000325997600007
View details for PubMedID 24127489
Cancer-associated fibroblasts (CAF) have been reported to support tumor progression by a variety of mechanisms. However, their role in the progression of non-small cell lung cancer (NSCLC) remains poorly defined. In addition, the extent to which specific proteins secreted by CAFs contribute directly to tumor growth is unclear. To study the role of CAFs in NSCLCs, a cross-species functional characterization of mouse and human lung CAFs was conducted. CAFs supported the growth of lung cancer cells in vivo by secretion of soluble factors that directly stimulate the growth of tumor cells. Gene expression analysis comparing normal mouse lung fibroblasts and mouse lung CAFs identified multiple genes that correlate with the CAF phenotype. A gene signature of secreted genes upregulated in CAFs was an independent marker of poor survival in patients with NSCLC. This secreted gene signature was upregulated in normal lung fibroblasts after long-term exposure to tumor cells, showing that lung fibroblasts are "educated" by tumor cells to acquire a CAF-like phenotype. Functional studies identified important roles for CLCF1-CNTFR and interleukin (IL)-6-IL-6R signaling in promoting growth of NSCLCs. This study identifies novel soluble factors contributing to the CAF protumorigenic phenotype in NSCLCs and suggests new avenues for the development of therapeutic strategies.
View details for DOI 10.1158/0008-5472.CAN-12-1097
View details for Web of Science ID 000311141300012
View details for PubMedID 22962265
Monitoring of renal graft status through peripheral blood (PB) rather than invasive biopsy is important as it will lessen the risk of infection and other stresses, while reducing the costs of rejection diagnosis. Blood gene biomarker panels were discovered by microarrays at a single center and subsequently validated and cross-validated by Q-PCR in the NIH SNSO1 randomized study from 12 US pediatric transplant programs. A total of 367 unique human PB samples, each paired with a graft biopsy for centralized, blinded phenotype classification, were analyzed (115 acute rejection (AR), 180 stable (STA) and 72 other causes of graft injury). Of the genes differentially expressed by microarray, Q-PCR analysis of a five-gene set (DUSP1, PBEF1, PSEN1, MAPK9 and NKTR) classified AR with high accuracy. A logistic regression model was built on an independent training set (n = 47) and validated on an independent test set (n = 198) of samples, discriminating AR from STA with 91% sensitivity and 94% specificity and AR from all other non-AR phenotypes with 91% sensitivity and 90% specificity. The five-gene set can diagnose AR, potentially avoiding the need for invasive renal biopsy. These data support the conduct of a prospective study to validate the clinical predictive utility of this diagnostic tool.
View details for DOI 10.1111/j.1600-6143.2012.04253.x
View details for Web of Science ID 000309180000018
View details for PubMedID 23009139
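To make the reported performance measures concrete, here is a minimal sketch of how a fitted logistic model turns five gene measurements into a rejection call, and how sensitivity and specificity are then computed against biopsy-confirmed labels. The coefficients, intercept, threshold, and example values are placeholders chosen for illustration; they are not the fitted model from the study, and no study data are reproduced.

```python
import math

# Gene order used by the hypothetical model below.
GENES = ["DUSP1", "PBEF1", "PSEN1", "MAPK9", "NKTR"]

def predict_probability(expression, weights, intercept):
    """Logistic model: probability of acute rejection for one sample."""
    z = intercept + sum(weights[g] * expression[g] for g in GENES)
    return 1 / (1 + math.exp(-z))

def classify(expression, weights, intercept, threshold=0.5):
    """Return 1 (acute rejection) or 0 (not rejection)."""
    return int(predict_probability(expression, weights, intercept) >= threshold)

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Placeholder coefficients and one made-up sample, for illustration only.
weights = {"DUSP1": 0.8, "PBEF1": 0.5, "PSEN1": -0.3, "MAPK9": -0.6, "NKTR": -0.4}
sample = {"DUSP1": 1.2, "PBEF1": 0.7, "PSEN1": -0.2, "MAPK9": -0.5, "NKTR": 0.1}
call = classify(sample, weights, intercept=-0.5)
```

In practice the threshold trades sensitivity against specificity, which is why both figures are reported together for each comparison.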
Chronic allograft injury (CAI) results from a humoral response to mismatches in immunogenic epitopes between the donor and recipient. Although alloantibodies against HLA antigens contribute to the pathogenesis of CAI, alloantibodies against non-HLA antigens likely contribute as well. Here, we used high-density protein arrays to identify non-HLA antibodies in CAI and subsequently validated a subset in a cohort of 172 serum samples collected serially post-transplantation. There were 38 de novo non-HLA antibodies that significantly associated with the development of CAI (P<0.01) on protocol post-transplant biopsies, with enrichment of their corresponding antigens in the renal cortex. Baseline levels of preformed antibodies to MIG (also called CXCL9), ITAC (also called CXCL11), IFN-γ, and glial-derived neurotrophic factor positively correlated with histologic injury at 24 months. Measuring levels of these four antibodies could help clinicians predict the development of CAI with >80% sensitivity and 100% specificity. In conclusion, pretransplant serum levels of a defined panel of alloantibodies targeting non-HLA immunogenic antigens associate with histologic CAI in the post-transplant period. Validation in a larger, prospective transplant cohort may lead to a noninvasive method to predict and monitor for CAI.
View details for DOI 10.1681/ASN.2011060596
View details for Web of Science ID 000302333300022
View details for PubMedID 22302197
Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.
View details for DOI 10.1371/journal.pcbi.1002375
View details for Web of Science ID 000300729900019
View details for PubMedID 22383865
IgG commonly co-exists with IgA in the glomerular mesangium of patients with IgA nephropathy (IgAN) with unclear clinical relevance. Autoantibody (autoAb) biomarkers to detect and track progression of IgAN are an unmet clinical need. The objective of the study was to identify IgA-specific autoAbs specific to IgAN. High-density protein microarrays were used to evaluate IgG autoAbs in the serum of IgAN patients (n = 22) and controls (n = 10). Clinical parameters, including annual GFR and urine protein measurements, were collected on all patients over 5 years. Bioinformatic data analysis was performed to select targets for further validation by immunohistochemistry (IHC). One hundred seventeen (1.4%) specific antibodies were increased in IgAN. Among the most significant were the autoAbs to the Ig family of proteins. IgAN-specific autoAbs (approximately 50%) were mounted against proteins predominantly expressed in glomeruli and tubules, and selected candidates were verified by IHC. Receiver operating characteristic analysis of our study demonstrated that IgG autoAb levels (matriline 2, ubiquitin-conjugating enzyme E2W, DEAD box protein, and protein kinase D1) might be used in combination with 24-hour proteinuria to improve prediction of the progression of IgAN (area under the curve = 0.86, P = 0.02). IgAN is associated with elevated IgG autoAbs to multiple proteins in the kidney. This first analysis of the repertoire of autoAbs in IgAN identifies novel, immunogenic protein targets that are highly expressed in the kidney glomerulus and tubules that may bear relevance in the pathogenesis and progression of IgAN.
View details for DOI 10.2215/CJN.04600511
View details for Web of Science ID 000297948900009
View details for PubMedID 22157707
The degree of progressive chronic histological damage is associated with long-term renal allograft survival. In order to identify promising molecular targets for timely intervention, we examined renal allograft protocol and indication biopsies from 120 low-risk pediatric and adolescent recipients by whole-genome microarray expression profiling. In data-driven analysis, we found a highly regulated pattern of adaptive and innate immune gene expression that correlated with established or ongoing histological chronic injury, and also with development of future chronic histological damage, even in histologically pristine kidneys. Hence, histologically unrecognized immunological injury at a molecular level sets the stage for the development of chronic tissue injury, while the same molecular response is accentuated during established and worsening chronic allograft damage. Irrespective of the hypothesized immune or nonimmune trigger for chronic allograft injury, a highly orchestrated regulation of innate and adaptive immune responses was found in the graft at the molecular level. This occurred months before histologic lesions appear, and quantitatively below the diagnostic threshold of classic T-cell or antibody-mediated rejection. Thus, measurement of specific immune gene expression in protocol biopsies may be warranted to predict the development of subsequent chronic injury in histologically quiescent grafts and as a means to titrate immunosuppressive therapy.
View details for DOI 10.1038/ki.2011.245
View details for Web of Science ID 000297541900014
View details for PubMedID 21881554
Technological advances in molecular and in silico research have enabled significant progress towards personalized transplantation medicine. It is now possible to conduct comprehensive biomarker development studies of transplant organ pathologies, correlating genomic, transcriptomic and proteomic information from donor and recipient with clinical and histological phenotypes. Translation of these advances to the clinical setting will allow assessment of an individual patient's risk of allograft damage or accommodation. Transplantation biomarkers are needed for active monitoring of immunosuppression, to reduce patient morbidity, and to improve long-term allograft function and life expectancy. Here, we highlight recent pre- and post-transplantation biomarkers of acute and chronic allograft damage or adaptation, focusing on peripheral blood-based methodologies for non-invasive application. We then critically discuss current findings with respect to their future application in routine clinical transplantation medicine. Complement-system-associated SNPs present potential biomarkers that may be used to indicate the baseline risk for allograft damage prior to transplantation. The detection of antibodies against novel, non-HLA, MICA antigens, and the expression of cytokine genes and proteins and cytotoxicity-related genes have been correlated with allograft damage and are potential post-transplantation biomarkers indicating allograft damage at the molecular level, although these do not have clinical relevance yet. Several multi-gene expression-based biomarker panels have been identified that accurately predicted graft accommodation in liver transplant recipients and may be developed into a predictive biomarker assay.
View details for DOI 10.1186/gm253
View details for PubMedID 21658299
Combining the results of studies using highly parallelized measurements of gene expression such as microarrays and RNAseq offers unique challenges in meta-analysis. Motivated by a need for a deeper understanding of organ transplant rejection, we combine the data from five separate studies to compare acute rejection versus stability after solid organ transplantation, and use these data to examine approaches to multiplex meta-analysis. We demonstrate that a commonly used parametric effect size estimate approach and a commonly used non-parametric method give very different results in prioritizing genes. The parametric method providing a meta effect estimate was superior at ranking genes based on our gold standard of identifying immune response genes in the transplant rejection datasets. Different methods of multiplex analysis can give substantially different results. The method that is best for any given application will likely depend on the particular domain, and it remains for future work to see if any one method is consistently better at identifying important biological signal across gene expression experiments.
View details for DOI 10.1186/1471-2105-11-S9-S6
View details for Web of Science ID 000290218700006
View details for PubMedID 21044364
The gene expression changes produced by moderate hypothermia are not fully known, but appear to differ in important ways from those produced by heat shock. We examined the gene expression changes produced by moderate hypothermia and tested the hypothesis that rewarming after hypothermia approximates a heat-shock response. Six sets of human HepG2 hepatocytes were subjected to moderate hypothermia (31 degrees C for 16 h), a conventional in vitro heat shock (43 degrees C for 30 min) or control conditions (37 degrees C), then harvested immediately or allowed to recover for 3 h at 37 degrees C. Expression analysis was performed with Affymetrix U133A gene chips, using analysis of variance-based techniques. Moderate hypothermia led to distinct time-dependent expression changes, as did heat shock. Hypothermia initially caused statistically significant, greater than or equal to twofold changes in expression (relative to controls) of 409 sequences (143 increased and 266 decreased), whereas heat shock affected 71 (35 increased and 36 decreased). After 3 h of recovery, 192 sequences (83 increased, 109 decreased) were affected by hypothermia and 231 (146 increased, 85 decreased) by heat shock. Expression of many heat shock proteins was decreased by hypothermia but significantly increased after rewarming. A comparison of sequences affected by thermal stress without regard to the magnitude of change revealed that the overlap between heat and cold stress was greater after 3 h of recovery than immediately following thermal stress. Thus, while some overlap occurs (particularly after rewarming), moderate hypothermia produces extensive, time-dependent gene expression changes in HepG2 cells that differ in important ways from those induced by heat shock.
View details for DOI 10.1007/s12192-010-0181-2
View details for Web of Science ID 000280781800021
View details for PubMedID 20526826
We describe cell type-specific significance analysis of microarrays (csSAM) for analyzing differential gene expression for each cell type in a biological sample from microarray data and relative cell-type frequencies. First, we validated csSAM with predesigned mixtures and then applied it to whole-blood gene expression datasets from stable post-transplant kidney transplant recipients and those experiencing acute transplant rejection, which revealed hundreds of differentially expressed genes that were otherwise undetectable.
View details for DOI 10.1038/NMETH.1439
View details for Web of Science ID 000276150600017
View details for PubMedID 20208531
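The idea behind the csSAM abstract above can be illustrated with a small linear model: if each sample's measured expression of a gene is approximately the sum of cell-type-specific expression weighted by that sample's cell-type proportions, then the cell-type-specific values can be estimated by least squares within each group and compared between groups. The sketch below assumes exactly that linear mixing model, solves the normal equations in plain Python, and omits the significance testing that a full analysis would need. All function names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def cell_type_expression(freqs, mixed):
    """Least-squares estimate of per-cell-type expression for one gene.

    freqs: one row of cell-type proportions per sample
    mixed: measured (mixed-tissue) expression of the gene in each sample
    """
    k = len(freqs[0])
    # Normal equations: (F^T F) c = F^T x
    ftf = [[sum(row[i] * row[j] for row in freqs) for j in range(k)]
           for i in range(k)]
    ftx = [sum(row[i] * x for row, x in zip(freqs, mixed)) for i in range(k)]
    return solve(ftf, ftx)

def cell_type_difference(freqs_a, mixed_a, freqs_b, mixed_b):
    """Per-cell-type expression difference (group B minus group A)."""
    est_a = cell_type_expression(freqs_a, mixed_a)
    est_b = cell_type_expression(freqs_b, mixed_b)
    return [b - a for a, b in zip(est_a, est_b)]

# Made-up proportions for two cell types in three samples per group.
freqs_stable = [[0.8, 0.2], [0.5, 0.5], [0.3, 0.7]]
mixed_stable = [8.4, 6.0, 4.4]
freqs_reject = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
mixed_reject = [8.8, 7.6, 6.8]
diff = cell_type_difference(freqs_stable, mixed_stable,
                            freqs_reject, mixed_reject)
```

A change confined to a minority cell type can be diluted in the mixed signal, which is one way to read the abstract's point that some differentially expressed genes were otherwise undetectable.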
We have developed NetPath as a resource of curated human signaling pathways. As an initial step, NetPath provides detailed maps of a number of immune signaling pathways, which include approximately 1,600 reactions annotated from the literature and more than 2,800 instances of transcriptionally regulated genes - all linked to over 5,500 published articles. We anticipate NetPath to become a consolidated resource for human signaling pathways that should enable systems biology approaches.
View details for DOI 10.1186/gb-2010-11-1-r3
View details for Web of Science ID 000276433600011
View details for PubMedID 20067622
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here is able to take into consideration the hierarchical structure of the Gene Ontology (GO) and can assign different weights to GO terms situated at different depths. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine such schemes were previously used in other problem domains, while six of them are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Out of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent of them, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions. The n2tn weighting scheme also outperformed the simple binary scheme used in our previous approach.
View details for DOI 10.1109/TCBB.2008.29
View details for Web of Science ID 000274063600008
View details for PubMedID 20150671
In the last decade, microarray technology has revolutionized biological research by allowing the screening of tens of thousands of genes simultaneously. This article reviews recent studies in organ transplantation using microarrays and highlights the issues that should be addressed in order to use microarrays in diagnosis of rejection. Microarrays have been useful in identifying potential biomarkers for chronic rejection in peripheral blood mononuclear cells, novel pathways for induction of tolerance, and genes involved in protecting the graft from the host immune system. Microarray analysis of peripheral blood mononuclear cells from chronic antibody-mediated rejection has identified potential noninvasive biomarkers. The correlation of pathogenesis-based transcripts with histopathologic lesions in a recent study is a promising step towards inclusion of microarrays in clinics for organ transplants. Despite promising results in diagnosis of histopathologic lesions using microarrays, the low dynamic range of microarrays and large measured expression changes within the probes for the same gene continue to cast doubts on their readiness for diagnosis of rejection. More studies must be performed to resolve these issues. Dominating expression of globin genes in whole blood poses another challenge for identification of noninvasive biomarkers. In addition, studies are also needed to demonstrate effects of different immunosuppression therapies and their outcomes.
View details for DOI 10.1097/MOT.0b013e32831e13d0
View details for Web of Science ID 000264312900007
View details for PubMedID 19337144
Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by the observed changes. Most of the existing pathway analysis methods focus on either the number of DE genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat the pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe. We describe a novel signaling pathway impact analysis (SPIA) that combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. A bootstrap procedure is used to assess the significance of the observed total pathway perturbation. Using simulations we show that the evidence derived from perturbations is independent of the pathway enrichment evidence. This allows us to calculate a global pathway significance P-value, which combines the enrichment and perturbation P-values. We illustrate the capabilities of the novel method on four real datasets. The results obtained on these data show that SPIA has better specificity and more sensitivity than several widely used pathway analysis methods. SPIA was implemented as an R package available at http://vortex.cs.wayne.edu/ontoexpress/
View details for DOI 10.1093/bioinformatics/btn577
View details for Web of Science ID 000261996400012
View details for PubMedID 18990722
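The abstract above combines an enrichment P-value and a perturbation P-value into a global P-value on the grounds that the two are independent. A minimal sketch of one standard way to do this follows. It relies on the fact that the product c of two independent uniform P-values satisfies P(product <= c) = c - c ln(c). The function name is hypothetical, and whether the SPIA package uses exactly this rule should be checked against its documentation.

```python
import math


def combine_independent_pvalues(p_enrichment, p_perturbation):
    """Combine two independent P-values through their product.

    Under the null hypothesis both P-values are uniform on (0, 1], and the
    probability that their product is at most c equals c - c * ln(c).
    """
    for p in (p_enrichment, p_perturbation):
        if not 0.0 <= p <= 1.0:
            raise ValueError("P-values must lie in [0, 1]")
    c = p_enrichment * p_perturbation
    if c == 0.0:
        return 0.0  # limit of c - c*ln(c) as c approaches 0
    return c - c * math.log(c)


if __name__ == "__main__":
    # Two moderately small P-values reinforce each other.
    print(combine_independent_pvalues(0.05, 0.05))
```

The combined value is always at least as large as the raw product, which avoids overstating significance, and it can still be smaller than either input when both are small.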
A common challenge in the analysis of genomics data is trying to understand the underlying phenomenon in the context of all complex interactions taking place on various signaling pathways. A statistical approach using various models is universally used to identify the most relevant pathways in a given experiment. Here, we show that the existing pathway analysis methods fail to take into consideration important biological aspects and may provide incorrect results in certain situations. By using a systems biology approach, we developed an impact analysis that includes the classical statistics but also considers other crucial factors such as the magnitude of each gene's expression change, their type and position in the given pathways, their interactions, etc. The impact analysis is an attempt at a deeper level of statistical analysis, informed by more pathway-specific biology than the existing techniques. On several illustrative data sets, the classical analysis produces both false positives and false negatives, while the impact analysis provides biologically meaningful results. This analysis method has been implemented as a Web-based tool, Pathway-Express, freely available as part of the Onto-Tools (http://vortex.cs.wayne.edu).
View details for DOI 10.1101/gr.6202607
View details for Web of Science ID 000249869200015
View details for PubMedID 17785539
Onto-Tools is a freely available web-accessible software suite, composed of an annotation database and nine complementary data-mining tools. This article describes a new tool, Onto-Express-to-go (OE2GO), as well as some new features implemented in Pathway-Express and Onto-Miner over the past year. Pathway-Express (PE) has been enhanced to identify significantly perturbed pathways in a given condition using the differentially expressed genes in the input. OE2GO is a tool for functional profiling using custom annotations. The development of this tool was aimed at the researchers working with organisms for which annotations are not yet available in the public domain. OE2GO allows researchers to use either annotation data from the Onto-Tools database, or their own custom annotations. By removing the necessity to use any specific database, OE2GO makes the functional profiling available for all organisms, with annotations using any ontology. The Onto-Tools are freely available at http://vortex.cs.wayne.edu/projects.htm.
View details for DOI 10.1093/nar/gkm327
View details for Web of Science ID 000255311500039
View details for PubMedID 17584796
Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological entities. As we show in this article, incoherent name spaces between various databases represent a serious impediment to using the existing annotations at their full potential. Navigating between various such name spaces by mapping IDs from one database to another is a very important issue which is not properly addressed at the moment. We have developed a web-based resource, Onto-Translate (OT), which effectively addresses this problem. OT is able to map onto each other different types of biological entities from the following annotation databases: Swiss-Prot, TrEMBL, NREF, PIR, Gene Ontology, KEGG, Entrez Gene, GenBank, GenPept, IMAGE, RefSeq, UniGene, OMIM, PDB, Eukaryotic Promoter Database, HUGO Gene Nomenclature Committee and NetAffx. Currently, OT is able to perform 462 types of mappings between 29 different types of IDs from 17 databases concerning 53 organisms. Among these, over 300 types of translations and 15 types of IDs are not currently supported by any other tool or resource. On average, OT is able to correctly map between 96 and 99% of the biological entities provided as input. In terms of speed, sets of approximately 20 000 IDs can be translated in <30 s, in most cases. OT is a part of Onto-Tools, which is freely available at http://vortex.cs.wayne.edu/Projects.html
View details for DOI 10.1093/bioinformatics/btl372
View details for Web of Science ID 000242246300015
View details for PubMedID 17068090
The Onto-Tools suite is composed of an annotation database and eight complementary, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner, Pathway-Express, Promoter-Express and nsSNPCounter. Promoter-Express is a new tool added to the Onto-Tools ensemble that facilitates the identification of transcription factor binding sites active in specific conditions. nsSNPCounter is another new tool that allows computation and analysis of synonymous and non-synonymous codon substitutions for studying evolutionary rates of protein coding genes. Onto-Translate has also been enhanced to expand its scope and accuracy by fully utilizing the capabilities of the Onto-Tools database. Currently, Onto-Translate allows arbitrary mappings between 28 types of IDs for 53 organisms. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkl213
View details for Web of Science ID 000245650200126
View details for PubMedID 16845086
DNA microarrays enable researchers to monitor the expression of thousands of genes simultaneously. However, the current technology has several limitations. Here we discuss problems related to the sensitivity, accuracy, specificity and reproducibility of microarray results. The existing data suggest that for relatively abundant transcripts the existence and direction (but not the magnitude) of expression changes can be reliably detected. However, accurate measurements of absolute expression levels and the reliable detection of low abundance genes are difficult to achieve. The main problems seem to be the sub-optimal design or choice of probes and some incorrect probe annotations. Well-designed data-analysis approaches can rectify some of these problems.
View details for DOI 10.1016/j.tig.2005.12.005
View details for Web of Science ID 000235576900009
View details for PubMedID 16380191
Independent of the platform and the analysis methods used, the result of a microarray experiment is, in most cases, a list of differentially expressed genes. An automatic ontological analysis approach has been recently proposed to help with the biological interpretation of such results. Currently, this approach is the de facto standard for the secondary analysis of high throughput experiments and a large number of tools have been developed for this purpose. We present a detailed comparison of 14 such tools using the following criteria: scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues and sources of annotation data. This detailed analysis of the capabilities of these tools will help researchers choose the most appropriate tool for a given type of analysis. More importantly, in spite of the fact that this type of analysis has been generally adopted, this approach has several important intrinsic drawbacks. These drawbacks are associated with all tools discussed and represent conceptual limitations of the current state-of-the-art in ontological analysis. We propose these as challenges for the next generation of secondary data analysis tools.
View details for DOI 10.1093/bioinformatics/bti565
View details for Web of Science ID 000231694600001
View details for PubMedID 15994189
The correct interpretation of any biological experiment depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are ubiquitous and used by all life scientists in most experiments. However, it is well known that such databases are incomplete and many annotations may also be incorrect. In this paper we describe a technique that can be used to analyze the semantic content of such annotation databases. Our approach is able to extract implicit semantic relationships between genes and functions. This ability allows us to discover novel functions for known genes. This approach is able to identify missing and inaccurate annotations in existing annotation databases, and thus help improve their accuracy. We used our technique to analyze the current annotations of the human genome. From this body of annotations, we were able to predict 212 additional gene-function assignments. A subsequent literature search found that 138 of these gene-function assignments are supported by existing peer-reviewed papers. An additional 23 assignments have been confirmed in the meantime by the addition of the respective annotations in later releases of the Gene Ontology database. Overall, the 161 confirmed assignments represent 75.95% of the proposed gene-function assignments. Only one of our predictions (0.4%) was contradicted by the existing literature. We could not find any relevant articles for 50 of our predictions (23.58%). The method is independent of the organism and can be used to analyze and improve the quality of the data of any public or private annotation database.
View details for DOI 10.1093/bioinformatics/bti538
View details for Web of Science ID 000231360600012
View details for PubMedID 15955782
The Onto-Tools suite is composed of an annotation database and six seamlessly integrated, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner and Pathway-Express. The Onto-Tools database has been expanded to include various types of data from 12 new databases. Our database now integrates different types of genomic data from 19 sequence, gene, protein and annotation databases. Additionally, our database is also expanded to include complete Gene Ontology (GO) annotations. Using the enhanced database and GO annotations, Onto-Express now allows functional profiling for 24 organisms and supports 17 different types of input IDs. Onto-Translate is also enhanced to fully utilize the capabilities of the new Onto-Tools database with an ultimate goal of providing the users with a non-redundant and complete mapping from any type of identification system to any other type. Currently, Onto-Translate allows arbitrary mappings between 29 types of IDs. Pathway-Express is a new tool that helps the users find the most interesting pathways for their input list of genes. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gki472
View details for Web of Science ID 000230271400156
View details for PubMedID 15980579
Sequences that are present in a given species or strain while absent from or different in any other organisms can be used to distinguish the target organism from other related or un-related species. Such DNA signatures are particularly important for the identification of genetic source of drug resistance of a strain or for the detection of organisms that can be used as biological agents in warfare or terrorism. Most approaches used to find DNA signatures are laboratory based, require a great deal of effort and can only distinguish between two organisms at a time. We propose a more efficient and cost-effective bioinformatics approach that allows identification of genomic fingerprints for a target organism. We validated our approach using a custom microarray, using sequences identified as DNA fingerprints of Bacillus anthracis. Hybridization results showed that the sequences found using our algorithm were truly unique to B. anthracis and were able to distinguish B. anthracis from its close relatives B. cereus and B. thuringiensis.
View details for Web of Science ID 000230169100021
View details for PubMedID 15759631
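The notion of a DNA signature described above, sequence present in the target organism and absent from all others, can be sketched with fixed-length substrings (k-mers) and set difference. This is only an illustration and not the algorithm of the paper, which the abstract does not specify; the function names and toy sequences are hypothetical, and the sketch ignores reverse complements, near matches, and hybridization behavior.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a DNA sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def signature_kmers(target, others, k):
    """Return k-mers found in the target but in none of the other sequences."""
    background = set()
    for other in others:
        background |= kmers(other, k)
    return kmers(target, k) - background


if __name__ == "__main__":
    # Toy sequences standing in for a target genome and one close relative.
    print(sorted(signature_kmers("ACGTAC", ["ACGTTT"], 3)))
```

Because the background set is the union over all comparison sequences, the target is screened against any number of organisms in one pass rather than two at a time.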
The Onto-Tools suite is composed of an annotation database and five seamlessly integrated web-accessible data mining tools: Onto-Express (OE), Onto-Compare (OC), Onto-Design (OD), Onto-Translate (OT) and Onto-Miner (OM). OM is a new tool that provides a unified access point and an application programming interface for most annotations available. Our database has been enhanced with more than 120 new commercial microarrays and annotations for Rattus norvegicus, Drosophila melanogaster and Caenorhabditis elegans. The Onto-Tools have been redesigned to provide better biological insight, improved performance and user convenience. The new features implemented in OE include support for gene names, LocusLink IDs and Gene Ontology (GO) IDs, ability to specify fold changes for the input genes, links to the KEGG pathway database and detailed output files. OC allows comparisons of the functional bias of more than 170 commercial microarrays. The latest version of OD allows the user to specify keywords if the exact GO term is not known as well as providing more details than the previous version. OE, OC and OD now have an integrated GO browser that allows the user to customize the level of abstraction for each GO category. The Onto-Tools are available online at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkh409
View details for Web of Science ID 000222273100090
View details for PubMedID 15215428
Onto-Tools is a set of four seamlessly integrated databases: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Onto-Express is able to automatically translate lists of genes found to be differentially regulated in a given condition into functional profiles characterizing the impact of the condition studied upon various biological processes and pathways. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function and chromosome location. Statistical significance values are calculated for each category. Once the initial exploratory analysis identified a number of relevant biological processes, specific mechanisms of interactions can be hypothesized for the conditions studied. Currently, many commercial arrays are available for the investigation of specific mechanisms. Each such array is characterized by a biological bias determined by the extent to which the genes present on the array represent specific pathways. Onto-Compare is a tool that allows efficient comparisons of any sets of commercial or custom arrays. Using Onto-Compare, a researcher can determine quickly which array, or set of arrays, covers best the hypotheses studied. In many situations, no commercial arrays are available for specific biological mechanisms. Onto-Design is a tool that allows the user to select genes that represent given functional categories. Onto-Translate allows the user to translate easily lists of accession numbers, UniGene clusters and Affymetrix probes into one another. All tools above are seamlessly integrated. The Onto-Tools are available online at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkg624
View details for Web of Science ID 000183832900108
View details for PubMedID 12824416
Microarrays are at the center of a revolution in biotechnology, allowing researchers to screen tens of thousands of genes simultaneously. Typically, they have been used in exploratory research to help formulate hypotheses. In most cases, this phase is followed by a more focused, hypothesis-driven stage in which certain specific biological processes and pathways are thought to be involved. Since a single biological process can still involve hundreds of genes, microarrays are still the preferred approach as proven by the availability of focused arrays from several manufacturers. Because focused arrays from different manufacturers use different sets of genes, each array will represent any given regulatory pathway to a different extent. We argue that a functional analysis of the arrays available should be the most important criterion used in the array selection. We developed Onto-Compare as a database that can provide this functionality, based on the Gene Ontology Consortium nomenclature. We used this tool to compare several arrays focused on apoptosis, oncogenes, and tumor suppressors. We considered arrays from BD Biosciences Clontech, PerkinElmer, Sigma-Genosys, and SuperArray. We showed that among the oncogene arrays, the PerkinElmer MICROMAX oncogene microarray has a better representation of oncogenesis, protein phosphorylation, and negative control of cell proliferation. The comparison of the apoptosis arrays showed that most apoptosis-related biological processes are equally well represented on the arrays considered. However, functional categories such as immune response, cell-cell signaling, cell-surface receptor linked signal transduction, and interleukins are better represented on the Sigma-Genosys Panorama human apoptosis array. At the same time, processes such as cell cycle control, oncogenesis, and negative control of cell proliferation are better represented on the BD Biosciences Clontech Atlas Select human apoptosis array.
View details for Web of Science ID 000181595900009
View details for PubMedID 12664686
The typical result of a microarray experiment is a list of tens or hundreds of genes found to be differentially regulated in the condition under study. Independent of the methods used to select these genes, the common task faced by any researcher is to translate these lists of genes into a better understanding of the biological phenomena involved. Currently, this is done through a tedious combination of searches through the literature and a number of public databases. We developed Onto-Express (OE) as a novel tool able to automatically translate such lists of differentially regulated genes into functional profiles characterizing the impact of the condition studied. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function, and chromosome location. Statistical significance values are calculated for each category. We demonstrate the validity and the utility of this comprehensive global analysis of gene function by analyzing two breast cancer datasets from two separate laboratories. OE was able to identify correctly all biological processes postulated by the original authors, as well as discover novel relevant mechanisms.
View details for DOI 10.1016/S0888-7543(02)00021-6
View details for Web of Science ID 000181532700002
View details for PubMedID 12620386
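The abstract above states that statistical significance values are calculated for each functional category but does not name the model. A minimal sketch, assuming a one-sided hypergeometric test (a common choice for this kind of over-representation analysis), is given below. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, in_category, selected, selected_in_category):
    """One-sided hypergeometric P-value for over-representation.

    population           -- genes on the array (reference set)
    in_category          -- reference genes annotated with the category
    selected             -- differentially regulated genes submitted
    selected_in_category -- submitted genes annotated with the category

    Returns the probability of seeing at least `selected_in_category`
    annotated genes when `selected` genes are drawn at random.
    """
    upper = min(in_category, selected)
    total = comb(population, selected)
    tail = sum(
        comb(in_category, i) * comb(population - in_category, selected - i)
        for i in range(selected_in_category, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 1000 genes on the array, 40 in the category,
    # 50 genes selected, 8 of which fall in the category.
    print(hypergeom_enrichment_p(1000, 40, 50, 8))
```

Because one such test is run per category, the resulting P-values would normally be corrected for multiple comparisons before any category is reported as significant.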
Findings from several studies support the conclusion that spermatozoa contain a complex repertoire of mRNAs. Even though these mRNAs are thought to provide an insight into past events of spermatogenesis, their complexity and function have yet to be established. Our aim was to determine whether we could use spermatozoal mRNAs to generate a genetic fingerprint of normal fertile men. We used a suite of microarrays containing 27016 unique expressed sequence tags (ESTs) to investigate cDNAs from a pool of 19 testes, cDNAs from a pool of nine individual ejaculate spermatozoal mRNAs, and cDNAs constructed from spermatozoal mRNAs from a single ejaculate. We also used ontological data mining to determine the function of the genes identified in each EST profile. The cDNAs from the testes, pooled ejaculate, and single ejaculate hybridised to 7157, 3281, and 2780 ESTs, respectively. The testicular population contained all of the ESTs identified by the cDNAs from the pooled and individual ejaculate. The pooled ejaculate population contained all but four ESTs identified from the individual ejaculate. A subset of the spermatozoal mRNAs was associated with embryo development. The microarray data from testes and spermatozoa (pooled and individual) were concordant, supporting the view that a spermatozoal mRNA fingerprint can be obtained from normal fertile men. Thus, profiling can be used to monitor past events, ie, gene expression of spermatogenesis. Moreover, the data suggest that, in addition to delivering the paternal genome, spermatozoa provide the zygote with a unique suite of paternal mRNAs. Ejaculate spermatozoa can now be used as a non-invasive proxy for investigations of testis-specific infertility.
View details for Web of Science ID 000177933000019
View details for PubMedID 12241836
Gene expression profiles obtained through microarray or data mining analyses often exist as vast data strings. To interpret the biology of these genetic profiles, investigators must analyze this data in the context of other information such as the biological, biochemical, or molecular function of the translated proteins. This is particularly challenging for a human analyst because large quantities of less than relevant data often bury such information. To address this need we implemented an automated routine, called Onto-Express (http://vortex.cs.wayne.edu:8080), to systematically translate genetic fingerprints into functional profiles. Using strings of accession or cluster identification numbers, Onto-Express searches the public databases and returns tables that correlate expression profiles with the cytogenetic locations, biochemical and molecular functions, biological processes, cellular components, and cellular roles of the translated proteins. The profiles created by Onto-Express fundamentally increase the value of gene expression analyses by facilitating the translation of quantitative value sets to records that contain biological implications.
View details for DOI 10.1006/geno.2002.6698
View details for Web of Science ID 000173628100016
View details for PubMedID 11829497