CCAST: A Model-Based Gating Strategy to Isolate Homogeneous Subpopulations in a Heterogeneous Population of Single Cells
PLOS COMPUTATIONAL BIOLOGY
2014; 10 (7)
Exact likelihood computation in Boolean networks with probabilistic time delays, and its application in signal network reconstruction
2014; 30 (3): 414-419
A model-based gating strategy is developed for sorting cells and analyzing populations of single cells. The strategy, named CCAST, for Clustering, Classification and Sorting Tree, identifies a gating strategy for isolating homogeneous subpopulations from a heterogeneous population of single cells using a data-derived decision tree representation that can be applied to cell sorting. Because CCAST does not rely on expert knowledge, it removes human bias and variability when determining the gating strategy. It combines any clustering algorithm with silhouette measures to identify underlying homogeneous subpopulations, then applies recursive partitioning techniques to generate a decision tree that defines the gating strategy. CCAST produces an optimal strategy for cell sorting by automating the selection of gating markers, the corresponding gating thresholds and gating sequence; all of these parameters are typically manually defined. Even though CCAST is optimized for cell sorting, it can be applied for the identification and analysis of homogeneous subpopulations among heterogeneous single cell data. We apply CCAST on single cell data from both breast cancer cell lines and normal human bone marrow. On the SUM159 breast cancer cell line data, CCAST indicates at least five distinct cell states based on two surface markers (CD24 and EPCAM) and provides a gating sorting strategy that produces more homogeneous subpopulations than previously reported. When applied to normal bone marrow data, CCAST reveals an efficient strategy for gating T-cells without prior knowledge of the major T-cell subtypes and the markers that best define them. On the normal bone marrow data, CCAST also reveals two major mature B-cell subtypes, namely CD123+ and CD123- cells, which were not revealed by manual gating but show distinct intracellular signaling responses. More generally, the CCAST framework could be used on other biological and non-biological high dimensional data types that are mixtures of unknown homogeneous subpopulations.
View details for DOI 10.1371/journal.pcbi.1003664
View details for Web of Science ID 000339890900004
View details for PubMedID 25078380
Modeling the temporal interplay of molecular signaling and gene expression by using dynamic nested effects models
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (16): 6447-6452
For biological pathways, it is common to measure a gene expression time series after various knockdowns of genes that are putatively involved in the process of interest. These interventional time-resolved data are most suitable for the elucidation of dynamic causal relationships in signaling networks. Even with this kind of data it is still a major and largely unsolved challenge to infer the topology and interaction logic of the underlying regulatory network.In this work, we present a novel model-based approach involving Boolean networks to reconstruct small to medium-sized regulatory networks. In particular, we solve the problem of exact likelihood computation in Boolean networks with probabilistic exponential time delays. Simulations demonstrate the high accuracy of our approach. We apply our method to data of Ivanova et al. (2006), where RNA interference knockdown experiments were used to build a network of the key regulatory genes governing mouse stem cell maintenance and differentiation. In contrast to previous analyses of that data set, our method can identify feedback loops and provides new insights into the interplay of some master regulators in embryonic stem cell development.The algorithm is implemented in the statistical language R. Code and documentation are available at Bioinformatics email@example.com or firstname.lastname@example.orgSupplementary Materials are available at Bioinfomatics online.
View details for DOI 10.1093/bioinformatics/btt696
View details for Web of Science ID 000331271100016
View details for PubMedID 24292937
Cellular decision making in differentiation, proliferation, or cell death is mediated by molecular signaling processes, which control the regulation and expression of genes. Vice versa, the expression of genes can trigger the activity of signaling pathways. We introduce and describe a statistical method called Dynamic Nested Effects Model (D-NEM) for analyzing the temporal interplay of cell signaling and gene expression. D-NEMs are Bayesian models of signal propagation in a network. They decompose observed time delays of multiple step signaling processes into single steps. Time delays are assumed to be exponentially distributed. Rate constants of signal propagation are model parameters, whose joint posterior distribution is assessed via Gibbs sampling. They hold information on the interplay of different forms of biological signal propagation. Molecular signaling in the cytoplasm acts at high rates, direct signal propagation via transcription and translation act at intermediate rates, while secondary effects operate at low rates. D-NEMs allow the dissection of biological processes into signaling and expression events, and analysis of cellular signal flow. An application of D-NEMs to embryonic stem cell development in mice reveals a feed-forward loop dominated network, which stabilizes the differentiated state of cells and points to Nanog as the key sensitizer of stem cells for differentiation stimuli.
View details for DOI 10.1073/pnas.0809822106
View details for Web of Science ID 000265506800007
View details for PubMedID 19329492