Featured Software

Lung Adenocarcinoma

Description: The spatial distributions of different cell types can reveal cancer cell growth patterns, the relationship between tumor cells and the tumor microenvironment, and the body's immune response, all of which are key hallmarks of cancer. However, manually recognizing and localizing every cell in a pathology slide is practically impossible. In this study, we developed ConvPath, an automated cell type classification pipeline for lung cancer pathology images that includes nuclei segmentation, convolutional neural network-based classification of tumor cells, stromal cells and lymphocytes, and extraction of tumor microenvironment-related features.

VAMPr

Description: To facilitate understanding of the connections between DNA variation and phenotypic antimicrobial resistance (AMR), we developed VAMPr (variant mapping and prediction of antibiotic resistance). It uses 3,393 sequenced bacterial isolates from 9 species, along with AMR phenotypes for 29 antibiotics defined according to the Clinical & Laboratory Standards Institute (CLSI) guidelines. VAMPr detected 14,615 variant genotypes and provides 93 association and prediction models.
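The description mentions association models between variant genotypes and resistance phenotypes. As a minimal sketch of one such association, assuming a single binary variant (present/absent) and a binary phenotype (resistant/susceptible), a two-sided Fisher's exact test on the 2x2 table can be computed with the standard library alone. The choice of test and the name `fisher_exact_two_sided` are illustrative assumptions, not VAMPr's implementation.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
                  resistant  susceptible
        variant+      a           b
        variant-      c           d
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the resistant isolates are variant+
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For example, `fisher_exact_two_sided(8, 2, 1, 5)` (8 of 10 variant carriers resistant versus 1 of 6 non-carriers) gives p = 400/11440, about 0.035.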

Histology-based Digital (HD) - Staining

Description: Histology-based Digital (HD) - Staining is a newly developed deep-learning algorithm. We trained an HD-Staining model to segment tumor nuclei, stroma nuclei, lymphocyte nuclei, macrophage nuclei, karyorrhexis, and red blood cells in Hematoxylin & Eosin (H&E) stained pathology images. The training dataset, drawn from the National Lung Screening Trial (NLST) cohort, covers around 10,000 cells. The tool aims to dissect the tumor microenvironment at the cell level.

SCINA

Description: Single cell profiling techniques such as single cell sequencing and cytometry are powerful tools for comprehensive, high-resolution characterization of the cellular heterogeneity observed in tumors, brain, and other tissues. Identifying and assigning cell types within the pool of profiled cells is the first step in analyzing scRNA-seq or cytometry data. To achieve this goal, we developed the SCINA algorithm, short for Semi-supervised Category Identification and Assignment. SCINA was originally designed to assign cell types based on single cell RNA-seq data.

DisHet

Description: DisHet estimates the gene expression levels and proportions of the normal, stroma (immune) and tumor components of bulk tumor RNA-seq samples. Although DisHet is designed to dissect bulk tumor samples using matched normal tissue and tumorgraft RNA-seq data, it is broadly applicable to dissecting the gene expression of any mixture of three cell types.
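To illustrate the mixture model behind this kind of deconvolution, here is a minimal sketch assuming that the expression profiles of all three components are already known and that each bulk value is a linear mixture of them with proportions summing to 1. DisHet itself also estimates the component expression levels, so this covers only the proportion step; `estimate_proportions` is a hypothetical helper and does not enforce non-negativity.

```python
def estimate_proportions(bulk, normal, stroma, tumor):
    """Least-squares proportions (pi_N, pi_S, pi_T) for
    bulk[g] ~ pi_N*normal[g] + pi_S*stroma[g] + pi_T*tumor[g],
    constrained so that pi_N + pi_S + pi_T = 1."""
    # Eliminate pi_T = 1 - pi_N - pi_S:
    #   bulk - tumor = pi_N*(normal - tumor) + pi_S*(stroma - tumor)
    y = [b - t for b, t in zip(bulk, tumor)]
    u = [n - t for n, t in zip(normal, tumor)]
    v = [s - t for s, t in zip(stroma, tumor)]
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    suy = sum(p * q for p, q in zip(u, y))
    svy = sum(p * q for p, q in zip(v, y))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("reference profiles are (nearly) collinear")
    # Solve the 2x2 normal equations by Cramer's rule
    pi_n = (suy * svv - svy * suv) / det
    pi_s = (svy * suu - suy * suv) / det
    return pi_n, pi_s, 1.0 - pi_n - pi_s
```

Eliminating one proportion turns the sum-to-one constraint into an ordinary two-variable regression, which keeps the sketch dependency-free.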

Prognostic Model for Non-Small Cell Lung Cancer

Description: Prognostic Model for Predicting Survival in Non-Small Cell Lung Cancer Patients.

Prognostic Model for Small Cell Lung Cancer

Description: Prognostic Model for Predicting Survival in Small Cell Lung Cancer Patients.

GRACE

Description: Genomic Regression Analysis of Coordinated Expression (GRACE) is a method developed to remove the effect of copy number alterations from co-expression analysis, so that the resulting co-expression relationships mainly reflect biological regulation. The GRACE database allows users to perform co-expression analysis on tumor or normal samples from various cancer types profiled by The Cancer Genome Atlas (TCGA).
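One simple way to picture removing copy number effects, assumed here for illustration and not necessarily GRACE's exact regression model: regress each gene's expression on its own copy number across samples, then correlate the residuals. Genes in the same amplified region can look strongly co-expressed only because they share copy number; after adjustment that spurious correlation disappears. All function names are hypothetical.

```python
from statistics import mean

def residualize(expr, cn):
    """Residuals from a simple linear regression of expression on copy number."""
    mx, my = mean(cn), mean(expr)
    sxx = sum((x - mx) ** 2 for x in cn)
    slope = sum((x - mx) * (y - my) for x, y in zip(cn, expr)) / sxx if sxx else 0.0
    return [y - my - slope * (x - mx) for x, y in zip(cn, expr)]

def pearson(a, b):
    """Pearson correlation (assumes neither series is constant)."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def cn_adjusted_coexpression(expr1, cn1, expr2, cn2):
    """Correlate two genes after removing each gene's own copy number effect."""
    return pearson(residualize(expr1, cn1), residualize(expr2, cn2))
```

With a shared copy number profile `[1, 2, 3, 4, 5, 6]` and expression `[3, 3, 5, 9, 10, 12]` and `[4, 7, 7, 10, 16, 19]`, the raw Pearson correlation is about 0.94, while `cn_adjusted_coexpression` returns 0.0 because the variation left after removing copy number is unrelated between the two genes.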

Lung Cancer Explorer

Description: The Lung Cancer Explorer is an online tool for exploring and analyzing gene expression data from dozens of publicly accessible lung cancer datasets.

GeNeCK

Description: GeNeCK (Gene Network Construction Kit) is a comprehensive online toolkit that integrates various statistical methods to construct gene networks from gene expression data and optional hub gene information.
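As a point of reference for what constructing a gene network from expression data means, here is the simplest possible baseline: connect two genes when the absolute Pearson correlation of their expression profiles reaches a threshold, then call the highest-degree genes hubs. This is an illustrative assumption, not one of GeNeCK's methods (the methods integrated in GeNeCK are generally more sophisticated); the 0.8 threshold and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def correlation_network(expr, threshold=0.8):
    """expr maps gene -> expression values over the same samples.
    Returns an undirected adjacency list (gene -> set of neighbours)."""
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj

def hub_genes(adj, top=1):
    """Highest-degree genes (ties broken alphabetically)."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:top]
```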

Drug Combination

Description: We developed a novel Drug-Induced Genomic Residual Effect (DIGRE) computational model to predict drug combination effects by explicitly modeling the drug response dynamics and gene expression changes after individual drug treatments. The DIGRE model won Best Performance in the National Cancer Institute’s DREAM 7 Drug Combination Synergy Prediction Challenge, an international crowdsourcing-based computational challenge for predicting drug combination effects using transcriptome data.

Software Packages

HITS-CLIP Analysis

Description: We developed a model-based approach to detect RNA-protein binding sites in HITS-CLIP data. The two-stage model, inspired by the spatial association between read coverage and the positional bias of mutations, is built on all sequencing reads (including non-clustered reads) to identify binding sites at single base-pair resolution. This toolbox provides the essential MATLAB functions to implement our model, which identifies binding sites using heterogeneous logit models via semi-supervised learning.
Documents: Readme.pdf
Software Package: HITS-CLIP_analysis.zip

PAR-CLIP HMM

Description: Photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) is increasingly used for the global mapping of RNA-protein interaction sites. This package provides an integrative model that establishes a joint distribution of read and mutation counts. To pinpoint interaction sites at single base-pair resolution, we adopt non-homogeneous hidden Markov models that incorporate the nucleotide sequence.
Documents: Readme.pdf
Software Package: PAR-CLIP_HMM.zip

dCLIP

Description: dCLIP is a Perl program for discovering differential binding regions between two CLIP-Seq (HITS-CLIP or PAR-CLIP) experiments. It is appropriate for experiments in which the common binding regions significantly enriched in both conditions tend to have similar binding strength, and when researchers are more interested in differences in binding strength than in the binary question of whether a binding site is shared. For example, dCLIP can be used to identify the differential binding sites of the AGO protein between a wild-type and a miRNA knockdown condition.
Documents: Readme.pdf
Software Package: dCLIP_1.7.tar.gz

Bayesian Joint Analysis

Description: Identifying which genes are differentially expressed (DE) and which gene sets (such as functional categories, biological pathways, or regulatory networks) are altered (i.e., overpopulated with up- or down-regulated genes) between two experimental conditions are both key questions in microarray analysis. Although closely related and seemingly similar, the two questions cannot replace each other, as each makes its own contribution to scientific discovery. Existing approaches address only one of the two questions at a time, not both simultaneously. We developed a Bayesian joint modeling approach that addresses both questions in parallel: it incorporates functional annotations (or other biological grouping information, such as pathways or gene networks) into the expression data analysis while also inferring the enrichment of functional groups (or other types of gene sets).
Software Package: BayesianJointAnalysis.zip
Reference: Wang X, Chen M, Khodursky AB, Xiao G. Bayesian Joint Analysis of Gene Expression Data and Gene Functional Annotations. Statistics in Biosciences. 2012 Nov;4(2):300-318.

DecoRNAi

Description: High-throughput RNAi screening is widely used across biomedical research and has made large-scale functional genomics studies possible. However, a challenge for the authentic biological interpretation of large-scale siRNA- or shRNA-mediated loss-of-function studies is the biological pleiotropy resulting from the multiple modes of action of siRNA and shRNA reagents. A major confounding feature of these reagents is the microRNA-like translational quelling that can result from short regions (~6 nucleotides) of oligonucleotide complementarity to many different mRNAs. To help identify and correct these miRNA-mimic off-target effects, we developed DecoRNAi (deconvolution analysis of RNAi screening data) for automated quantitation and annotation of microRNA-like off-target effects in primary RNAi screening data sets. DecoRNAi can effectively identify and correct off-target effects in primary screening data and provides data visualization for study and publication. DecoRNAi contains pre-computed seed sequence families for three commonly used commercial siRNA libraries. For custom collections, the tool computes seed sequence membership from a user-supplied reagent sequence table. All parameters are tunable, and the output files include global data visualizations, the identified seed family associations, the siRNA pools containing off-target seed families, corrected z-scores and the potential miRNAs with phenotypes of interest.
Documents: Manual.pdf
Software Package: DecoRNAi_1.0.tar.gz
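To illustrate what grouping reagents into seed sequence families involves, here is a minimal sketch. It assumes the seed is nucleotides 2-7 of the antisense (guide) strand read 5' to 3' (a 6-nt hexamer, matching the ~6 nucleotides mentioned above; other conventions use nucleotides 2-8) and summarizes each family by the mean z-score of its members, so that families whose reagents share a strong phenotype regardless of their intended targets stand out. The constants and function names are hypothetical; DecoRNAi's actual statistics and z-score correction are not reproduced.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical convention: 0-based slice [1:7] = nucleotides 2-7 of the guide strand
SEED_START, SEED_END = 1, 7

def seed_of(guide_seq):
    """Seed region of the antisense (guide) strand, read 5'->3', as RNA."""
    return guide_seq.upper().replace("T", "U")[SEED_START:SEED_END]

def seed_family_scores(reagents, min_size=2):
    """reagents: iterable of (reagent_id, guide_sequence, z_score).
    Returns {seed: (family_size, mean_z)} for families with >= min_size members,
    ordered so the most extreme mean z-scores come first."""
    families = defaultdict(list)
    for _reagent_id, seq, z in reagents:
        families[seed_of(seq)].append(z)
    stats = {s: (len(zs), mean(zs)) for s, zs in families.items() if len(zs) >= min_size}
    return dict(sorted(stats.items(), key=lambda kv: -abs(kv[1][1])))
```

Two reagents targeting different genes but sharing the seed `GCUAGG`, with z-scores -3.0 and -2.5, would form a family with mean z of -2.75 and be listed ahead of families whose members have unremarkable scores.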

probemapper

Description: probemapper connects to QBRC’s EntrezToProbe engine to handle mappings between probes and genes and to provide access to information about probes and genes.
Documents & Demo: http://qbrc.swmed.edu/software/probemapper/
R Package: http://cran.r-project.org/web/packages/probemapper/index.html
Reference: Allen JD, Wang S, Chen M, Girard L, Minna J, Xie Y, Xiao G*. Probe mapping across multiple microarray platforms. Briefings in Bioinformatics. 2012 Sep;13(5):547-54. doi: 10.1093/bib/bbr076. PMID: 22199380

SbacHTS

Description: Genome-wide RNAi screening experiments are customarily carried out on hundreds of 96-well or 384-well plates to study gene functions and discover novel drug targets. However, spatial background noise often blurs the interpretation of experimental results by distorting the distinct spatial patterns of different plates. It is therefore important to identify and correct spatial background noise when analyzing RNAi screening data. We developed SbacHTS (Spatial background correction for High-Throughput RNAi Screening), an algorithm for visualizing, estimating and correcting spatial background noise in RNAi screening results. SbacHTS effectively detects and corrects spatial background noise, leading to a higher signal-to-noise ratio and improved hit discovery in RNAi screening experiments. The only input the algorithm requires is the raw reads from the replicate plates.
Documents: Manual.pdf
Software Package: SbacHTS_V5.zip
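SbacHTS's own estimation procedure is not described here, so the following is a stand-in rather than its algorithm: a minimal sketch assuming the spatial background consists of additive row and column effects, which are estimated on the replicate-averaged plate and removed with Tukey's median polish (the idea behind B-score normalization). Function names are hypothetical.

```python
from statistics import mean, median

def average_replicates(replicates):
    """Well-wise mean over replicate plates (each plate is a list of rows)."""
    return [[mean(wells) for wells in zip(*rows)] for rows in zip(*replicates)]

def median_polish(plate, n_iter=10):
    """Tukey's median polish: plate[i][j] ~ overall + row[i] + col[j] + resid[i][j].
    Returns (overall, row_effects, col_effects, residuals); the residuals are the
    readouts with the additive row/column background removed."""
    resid = [[float(x) for x in row] for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row_eff[i] += m
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians out of the residuals into the column effects
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return overall, row_eff, col_eff, resid
```

For the 3 x 4 plate `[[10, 11, 13, 16], [12, 13, 20, 18], [14, 15, 17, 20]]`, which has additive row and column gradients plus one well 5 units above background, the residuals are 5 at that well and 0 everywhere else. Medians are used instead of means so a few genuine hits do not distort the background estimate.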

MBCB

Description: This package provides a model-based background correction (MBCB) method that incorporates the negative control beads to pre-process Illumina BeadArray data.
Software Package: http://www.bioconductor.org/packages//2.10/bioc/html/MBCB.html
References: Xie Y, Wang X, Story M. Statistical Methods of Background Correction for Illumina BeadArray. Bioinformatics. 2009 Mar 15;25(6):751-7. doi: 10.1093/bioinformatics/btp040. PMID: 19193732

Allen JD, Chen M, Xie Y. Model-Based Background Correction (MBCB): R methods and GUI for Illumina Bead-array Data. J Canc Sci Ther. 2009;1:025-027. doi: 10.4172/1948-5956.1000004
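Model-based background correction of this kind rests on a convolution model in which each observed intensity X is a true signal S plus normally distributed background B, with the negative control beads (which carry no target) informing the background. As a minimal sketch, assuming an exponential signal, a background mean and standard deviation taken directly from the negative controls, and a simple method-of-moments estimate of the signal mean, the corrected value is the posterior mean E[S | X = x]. MBCB provides its own parameter estimators; `normexp_signal` is a hypothetical name.

```python
from math import erfc, exp, pi, sqrt
from statistics import mean, stdev

def normexp_signal(intensities, neg_controls):
    """Background-correct bead intensities under X = S + B, with
    S ~ Exponential(mean alpha) and B ~ Normal(mu, sigma^2).
    mu and sigma come from the negative control beads; alpha from the
    method of moments (E[X] = alpha + mu). Returns E[S | X = x] per intensity."""
    mu, sigma = mean(neg_controls), stdev(neg_controls)
    alpha = max(mean(intensities) - mu, 1e-6)  # guard against a non-positive estimate
    corrected = []
    for x in intensities:
        # Posterior of S given X = x is Normal(a, sigma^2) truncated to S >= 0
        a = x - mu - sigma ** 2 / alpha
        z = a / sigma
        cdf = 0.5 * erfc(-z / sqrt(2.0))          # standard normal CDF, stable in the lower tail
        pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal density
        mills = pdf / cdf if cdf > 0.0 else -z    # leading-order asymptote if the CDF underflows
        corrected.append(a + sigma * mills)
    return corrected
```

Unlike simple subtraction of the negative-control mean, the posterior mean is always positive and increases with the observed intensity, so low-intensity beads are not pushed to zero or below.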

Ensemble Network Aggregation (ENA)

Description: Ensemble network aggregation is an approach which leverages the inverse-rank-product (IRP) method to combine networks. This package provides the capabilities to use IRP to bootstrap a dataset using a single method, to aggregate the networks produced by multiple methods, or to aggregate the networks produced on different datasets. Additionally, it offers convenience functions for converting between adjacency lists and matrices, and computing discrete graphs based on the Rank-Product method.
Source: https://github.com/QBRC/ena
Download: http://cran.r-project.org/web/packages/ENA/index.html
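The exact inverse-rank-product formula used by ENA is not reproduced here. To illustrate the general idea of combining networks by edge rank, the following sketch assumes every network scores the same set of edges and uses a plain rank product (the geometric mean of each edge's per-network ranks). Treat it as a stand-in, not the ENA package's API; ENA itself is an R package, and the function names below are hypothetical.

```python
from math import exp, log

def rank_edges(network):
    """network maps an undirected edge (gene1, gene2) to a confidence score.
    Returns {edge: rank} with rank 1 for the largest |score|; ties keep input order."""
    ordered = sorted(network, key=lambda edge: -abs(network[edge]))
    return {edge: i + 1 for i, edge in enumerate(ordered)}

def rank_product_aggregate(networks):
    """Aggregate networks scored over the same edge set by the geometric mean of
    each edge's per-network ranks. Returns (edge, score) pairs, strongest consensus first."""
    ranked = [rank_edges(net) for net in networks]
    k = len(ranked)
    scores = {edge: exp(sum(log(r[edge]) for r in ranked) / k) for edge in ranked[0]}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Working on ranks rather than raw scores lets networks built by different methods, whose confidence scores live on different scales, be combined directly.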