Doctor of Philosophy, Pennsylvania State University (2009)
Michael Snyder, Postdoctoral Faculty Sponsor
I spent most of my Ph.D. identifying and elucidating the functions of cis-regulatory modules (CRMs) through functional genomics and comparative genomics approaches.
My current projects include:
1) Annotate human functional sequences using human ENCODE and mouse ENCODE data.
2) Improve the understanding of personal genomics by combining functional genomics, comparative genomics and population genomics approaches.
3) Study the relationship between the immune system and the microbiome.
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.
View details for DOI 10.1038/nature13668
View details for PubMedID 25164757
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
View details for DOI 10.1038/nature11245
View details for Web of Science ID 000308347000042
View details for PubMedID 22955619
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
View details for DOI 10.1038/nature11247
View details for Web of Science ID 000308347000039
View details for PubMedID 22955616
As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
View details for DOI 10.1101/gr.137323.112
View details for Web of Science ID 000308272800019
View details for PubMedID 22955989
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.
View details for DOI 10.1101/gr.139105.112
View details for Web of Science ID 000308272800020
View details for PubMedID 22955990
Tissue-specific transcription patterns are preserved throughout cell divisions to maintain lineage fidelity. We investigated whether transcription factor GATA1 plays a role in transmitting hematopoietic gene expression programs through mitosis when transcription is transiently silenced. Live-cell imaging revealed that a fraction of GATA1 is retained focally within mitotic chromatin. ChIP-seq of highly purified mitotic cells uncovered that key hematopoietic regulatory genes are occupied by GATA1 in mitosis. The GATA1 coregulators FOG1 and TAL1 dissociate from mitotic chromatin, suggesting that GATA1 functions as a platform for their postmitotic recruitment. Mitotic GATA1 target genes tend to reactivate more rapidly upon entry into G1 than genes from which GATA1 dissociates. Mitosis-specific destruction of GATA1 delays reactivation selectively of genes that retain GATA1 during mitosis. These studies suggest a requirement of mitotic "bookmarking" by GATA1 for the faithful propagation of cell-type-specific transcription programs through cell division.
View details for DOI 10.1016/j.cell.2012.06.038
View details for Web of Science ID 000308002300008
View details for PubMedID 22901805
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
View details for DOI 10.1016/j.cell.2012.02.009
View details for Web of Science ID 000301889500023
View details for PubMedID 22424236
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
View details for PubMedID 22889292
View details for Web of Science ID 000299597100548
Interplays among lineage-specific nuclear proteins, chromatin modifying enzymes, and the basal transcription machinery govern cellular differentiation, but their dynamics of action and coordination with transcriptional control are not fully understood. Alterations in chromatin structure appear to establish a permissive state for gene activation at some loci, but they play an integral role in activation at other loci. To determine the predominant roles of chromatin states and factor occupancy in directing gene regulation during differentiation, we mapped chromatin accessibility, histone modifications, and nuclear factor occupancy genome-wide during mouse erythroid differentiation dependent on the master regulatory transcription factor GATA1. Notably, despite extensive changes in gene expression, the chromatin state profiles (proportions of a gene in a chromatin state dominated by activating or repressive histone modifications) and accessibility remain largely unchanged during GATA1-induced erythroid differentiation. In contrast, gene induction and repression are strongly associated with changes in patterns of transcription factor occupancy. Our results indicate that during erythroid differentiation, the broad features of chromatin states are established at the stage of lineage commitment, largely independently of GATA1. These determine permissiveness for expression, with subsequent induction or repression mediated by distinctive combinations of transcription factors.
View details for DOI 10.1101/gr.125088.111
View details for Web of Science ID 000295407800010
View details for PubMedID 21795386
Acetylation of histones triggers association with bromodomain-containing proteins that regulate diverse chromatin-related processes. Although acetylation of transcription factors has been appreciated for some time, the mechanistic consequences are less well understood. The hematopoietic transcription factor GATA1 is acetylated at conserved lysines that are required for its stable association with chromatin. We show that the BET family protein Brd3 binds via its first bromodomain (BD1) to GATA1 in an acetylation-dependent manner in vitro and in vivo. Mutation of a single residue in BD1 that is involved in acetyl-lysine binding abrogated recruitment of Brd3 by GATA1, demonstrating that acetylation of GATA1 is essential for Brd3 association with chromatin. Notably, Brd3 is recruited by GATA1 to both active and repressed target genes in a fashion seemingly independent of histone acetylation. Anti-Brd3 ChIP followed by massively parallel sequencing in GATA1-deficient erythroid precursor cells and those that are GATA1 replete revealed that GATA1 is a major determinant of Brd3 recruitment to genomic targets within chromatin. A pharmacologic compound that occupies the acetyl-lysine binding pockets of Brd3 bromodomains disrupts the Brd3-GATA1 interaction, diminishes the chromatin occupancy of both proteins, and inhibits erythroid maturation. Together these findings provide a mechanism for GATA1 acetylation and suggest that Brd3 "reads" acetyl marks on nuclear factors to promote their stable association with chromatin.
View details for DOI 10.1073/pnas.1102140108
View details for Web of Science ID 000291106200006
View details for PubMedID 21536911
The transcription factor GATA1 regulates an extensive program of gene activation and repression during erythroid development. However, the associated mechanisms, including the contributions of distal versus proximal cis-regulatory modules, co-occupancy with other transcription factors, and the effects of histone modifications, are poorly understood. We studied these problems genome-wide in a Gata1 knockout erythroblast cell line that undergoes GATA1-dependent terminal maturation, identifying 2616 GATA1-responsive genes and 15,360 GATA1-occupied DNA segments after restoration of GATA1. Virtually all occupied DNA segments have high levels of H3K4 monomethylation and low levels of H3K27me3 around the canonical GATA binding motif, regardless of whether the nearby gene is induced or repressed. Induced genes tend to be bound by GATA1 close to the transcription start site (most frequently in the first intron), have multiple GATA1-occupied segments that are also bound by TAL1, and show evolutionary constraint on the GATA1-binding site motif. In contrast, repressed genes are further away from GATA1-occupied segments, and a subset shows reduced TAL1 occupancy and increased H3K27me3 at the transcription start site. Our data expand the repertoire of GATA1 action in erythropoiesis by defining a new cohort of target genes and determining the spatial distribution of cis-regulatory modules throughout the genome. In addition, we begin to establish functional criteria and mechanisms that distinguish GATA1 activation from repression at specific target genes. More broadly, these studies illustrate how a "master regulator" transcription factor coordinates tissue differentiation through a panoply of DNA and protein interactions.
View details for DOI 10.1101/gr.098921.109
View details for Web of Science ID 000272273400002
View details for PubMedID 19887574
The transcription factor GATA-1 is required for terminal erythroid maturation and functions as an activator or repressor depending on gene context. Yet its in vivo site selectivity and ability to distinguish between activated versus repressed genes remain incompletely understood. In this study, we performed GATA-1 ChIP-seq in erythroid cells and compared it to GATA-1-induced gene expression changes. Bound and differentially expressed genes contain a greater number of GATA-binding motifs, a higher frequency of palindromic GATA sites, and closer occupancy to the transcriptional start site versus nondifferentially expressed genes. Moreover, we show that the transcription factor Zbtb7a occupies GATA-1-bound regions of some direct GATA-1 target genes, that the presence of SCL/TAL1 helps distinguish transcriptional activation versus repression, and that polycomb repressive complex 2 (PRC2) is involved in epigenetic silencing of a subset of GATA-1-repressed genes. These data provide insights into GATA-1-mediated gene regulation in vivo.
View details for DOI 10.1016/j.molcel.2009.11.002
View details for Web of Science ID 000272534800015
View details for PubMedID 19941827
DNA sequence motifs and epigenetic modifications contribute to specific binding by a transcription factor, but the extent to which each feature determines occupancy in vivo is poorly understood. We addressed this question in erythroid cells by identifying DNA segments occupied by GATA1 and measuring the level of trimethylation of histone H3 lysine 27 (H3K27me3) and monomethylation of H3 lysine 4 (H3K4me1) along a 66 Mb region of mouse chromosome 7. While 91% of the GATA1-occupied segments contain the consensus binding-site motif WGATAR, only approximately 0.7% of DNA segments with such a motif are occupied. Using a discriminative motif enumeration method, we identified additional motifs predictive of occupancy given the presence of WGATAR. The specific motif variant AGATAA and occurrence of multiple WGATAR motifs are both strong discriminators. Combining motifs to pair a WGATAR motif with a binding site motif for GATA1, EKLF or SP1 improves discriminative power. Epigenetic modifications are also strong determinants, with the factor-bound segments highly enriched for H3K4me1 and depleted of H3K27me3. Combining primary sequence and epigenetic determinants captures 52% of the GATA1-occupied DNA segments and substantially increases the specificity, to one out of seven segments with the required motif combination and epigenetic signals being bound.
View details for DOI 10.1093/nar/gkp747
View details for Web of Science ID 000272688400011
View details for PubMedID 19767611
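The WGATAR consensus in the abstract above (DOI 10.1093/nar/gkp747) uses IUPAC codes, where W stands for A or T and R stands for A or G. As a minimal illustration of what scanning a sequence for such a motif can look like, the Python sketch below reports matches on both strands. It is not the discriminative motif enumeration method used in the study; the function names and the example sequence are hypothetical.

```python
import re

# IUPAC codes needed for the WGATAR consensus: W = A or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, consensus="WGATAR"):
    """Return sorted (start, strand) pairs for consensus matches.

    Start positions are 0-based and always given in forward-strand
    coordinates, including for matches found on the reverse strand.
    """
    seq = seq.upper()
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    n, k = len(seq), len(consensus)
    rc = reverse_complement(seq)
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical example sequence containing one forward-strand AGATAA site.
print(find_motif("CCAGATAAGG"))
```

As the abstract notes, most segments containing the motif are not occupied, so a scan like this only lists candidate sites; it says nothing about binding in vivo.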
GATA-1 controls hematopoietic development by activating and repressing gene transcription, yet the in vivo mechanisms that specify these opposite activities are unknown. By examining the composition of GATA-1-associated protein complexes in a conditional erythroid rescue system as well as through the use of tiling arrays we detected the SCL/TAL1, LMO2, Ldb1, E2A complex at all positively acting GATA-1-bound elements examined. Similarly, the SCL complex is present at all activating GATA elements in megakaryocytes and mast cells. In striking contrast, at sites where GATA-1 functions as a repressor, the SCL complex is depleted. A DNA-binding defective form of SCL maintains association with a subset of active GATA elements indicating that GATA-1 is a key determinant for SCL recruitment. Knockdown of LMO2 selectively impairs activation but not repression by GATA-1. ETO-2, an SCL-associated protein with the potential for transcription repression, is also absent from GATA-1-repressed genes but, unlike SCL, fails to accumulate at GATA-1-activated genes. Together, these studies identify the SCL complex as a critical and consistent determinant of positive GATA-1 activity in multiple GATA-1-regulated hematopoietic cell lineages.
View details for DOI 10.1182/blood-2008-07-169417
View details for Web of Science ID 000263918400011
View details for PubMedID 19011221
Tissue development and function are exquisitely dependent on proper regulation of gene expression, but it remains controversial whether the genomic signals controlling this process are subject to strong selective constraint. While some studies show that highly constrained noncoding regions act to enhance transcription, other studies show that DNA segments with biochemical signatures of regulatory regions, such as occupancy by a transcription factor, are seemingly unconstrained across mammalian evolution. To test the possible correlation of selective constraint with enhancer activity, we used chromatin immunoprecipitation as an approach unbiased by either evolutionary constraint or prior knowledge of regulatory activity to identify DNA segments within a 66-Mb region of mouse chromosome 7 that are occupied by the erythroid transcription factor GATA1. DNA segments bound by GATA1 were identified by hybridization to high-density tiling arrays, validated by quantitative PCR, and tested for gene regulatory activity in erythroid cells. Whereas almost all of the occupied segments contain canonical WGATAR binding site motifs for GATA1, in only 45% of the cases is the motif deeply preserved (found at the orthologous position in placental mammals or more distant species). However, GATA1-bound segments with high enhancer activity tend to be the ones with an evolutionarily preserved WGATAR motif, and this relationship was confirmed by a loss-of-function assay. Thus, GATA1 binding sites that regulate gene expression during erythroid maturation are under strong selective constraint, while nonconstrained binding may have only a limited or indirect role in regulation.
View details for DOI 10.1101/gr.083089.108
View details for Web of Science ID 000261398900005
View details for PubMedID 18818370
Identification of functional genomic regions using interspecies comparison will be most effective when the full span of relationships between genomic function and evolutionary constraint are utilized. We find that sets of putative transcriptional regulatory sequences, defined by ENCODE experimental data, have a wide span of evolutionary histories, ranging from stringent constraint shown by deep phylogenetic comparisons to recent selection on lineage-specific elements. This diversity of evolutionary histories can be captured, at least in part, by the suite of available comparative genomics tools, especially after correction for regional differences in the neutral substitution rate. Putative transcriptional regulatory regions show alignability in different clades, and the genes associated with them are enriched for distinct functions. Some of the putative regulatory regions show evidence for recent selection, including a primate-specific, distal promoter that may play a novel role in regulation.
View details for DOI 10.1101/gr.5592107
View details for Web of Science ID 000247226900010
View details for PubMedID 17567996
Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
View details for DOI 10.1101/gr.5353806
View details for Web of Science ID 000242482600005
View details for PubMedID 17038566