Cancer biomarkers

We have developed gene expression signatures to predict patient prognosis and response to chemotherapy.

We are interested in developing computational models to predict patient outcomes, which will allow clinicians to tailor treatment plans for individual patients. We have developed gene expression signatures to predict patient prognosis and response to chemotherapy. Using an innovative computational and systems biology approach, we identified a set of 12 genes that predicts which patients are most likely to benefit from post-surgery (adjuvant) chemotherapy (Patent #UTSD2627). Working with investigators at UT Southwestern Medical Center and MD Anderson Cancer Center, we are developing a Clinical Laboratory Improvement Amendments (CLIA)-certifiable medical device and designing a prospective clinical study to translate our 12-gene predictive signature into clinical use. Related publications are listed below:
  • Xie Y, Minna JD. Non-small-cell lung cancer mRNA expression signature predicting response to adjuvant chemotherapy. J Clin Oncol. 2010 Oct 10;28(29):4404-7. PMID: 20823415
  • Xie Y, Xiao G, Coombes KR, Behrens C, Solis LM, Raso G, Girard L, Erickson HS, Roth J, Heymach JV, Moran C, Danenberg K, Minna JD, Wistuba II. Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients. Clinical Cancer Research. 2011;17:5705-5714.
  • Tang H, Xiao G, Behrens C, Schiller J, Allen J, Chow CW, Suraokar M, Corvalan A, Mao J, White MA, Wistuba II, Minna JD, Xie Y*. A 12-gene set predicts survival benefits from adjuvant chemotherapy in non-small cell lung cancer patients. Clin Cancer Res. 2013;19:1577-1586.
  • Tang H, Sebti S, Titone R, Zhou Y, Isidoro C, Ross T, Hibshoosh H, Xiao G, Packer M, Xie Y*, Levine B. Decreased BECN1 mRNA expression in human breast cancer is associated with estrogen receptor-negative subtypes and poor prognosis. EBioMedicine. 2015;2(3):255-263.
  • Tang H, Wang S, Xiao G, Schiller J, Papadimitrakopoulou V, Minna J, Wistuba II, Xie Y. Comprehensive evaluation of published gene expression prognostic signatures for biomarker-based lung cancer clinical studies. Annals of Oncology. 2017 Apr 1;28(4):733-740.

Statistical methods for high-dimensional data and integrative analysis

Statistical methodologies for spatial modeling and integrative analysis of different molecular profiling datasets

We are actively developing new bioinformatics tools and computational algorithms for big data, such as genome-wide RNAi screening data and next-generation sequencing data. We are also developing statistical methodologies for spatial modeling and integrative analysis of different molecular profiling datasets.
  • Xie Y*, Wang X, Story M. Statistical methods of background correction for Illumina BeadArray. Bioinformatics. 2009 Feb 4. PMID: 19193732; PMCID: PMC2654805.
  • Xie Y*, JK, Pan W, Xiao G, Khodursky A. A Bayesian approach to joint modeling of protein-DNA binding, gene expression and sequence data. Stat Med. 2010 Feb 20;29(4):489-503. PMID: 20049751.
  • Zhong R, Kim M, White M, Xie Y, Xiao G*. Spatial background noise correction for high-throughput RNAi screening. Bioinformatics. 2013 Sep 1;29(17):2218-20.
  • Zhong R, Kim J, Kim H, Kim M, Lum L, Levine L, Xiao G, White M, Xie Y*. Computational detection and suppression of sequence-specific off-target phenotypes from whole genome RNAi screens. Nucleic Acids Research. 2014 Jul;42(13):8214-22.

Statistical learning and prediction models

My lab has extensive experience in developing predictive models in biomedical research.

Our team won the highly competitive 2012 NCI-DREAM Drug Sensitivity Prediction Challenge (Bansal et al., Nature Biotechnology, 2014) and the 2013 NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge (Eduati et al., Nature Biotechnology, 2015), and co-won the 2014 DREAM-Broad Institute Gene Essentiality Challenge.
  • Xiao G, Ma S, Minna J, Xie Y*. Adaptive prediction model in prospective molecular-signature-based clinical studies. Clinical Cancer Research. 2014 Feb 1;20(3):531-9. PMID: 24323903.
  • Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser L, Realubit R, Mattioli M, Alvarez M, Shen Y, NCI-DREAM community, Gallahan D, Singer D, Saez-Rodriguez J, Xie Y*, Stolovitzky G*, Califano A*. Predicting activity of drug combinations through crowdsourcing. Nature Biotechnology. 2014 Dec;32(12):1213-22.
  • Eduati F#, Mangravite L#, Wang T#, Tang H#, Bare C, Huang R, Norman T, Kellen M, Menden M, Yang J, Zhan X, Zhong R, Xiao G, Xia M, the NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration, Friend S, Dearry A, Simeonov A, Tice R, Rusyn I, Wright F, Stolovitzky G, Xie Y*, Saez-Rodriguez J*. Opportunities and limitations in the prediction of population responses to toxic compounds assessed through a collaborative competition. Nature Biotechnology. 2015;33:933-940.

RNA regulation

We have developed several statistical models to analyze CLIP-seq data.

My research interest in gene regulation began 10 years ago, when ChIP-chip technology was developed to study genome-wide transcription factor binding sites, and I developed several statistical models to analyze such datasets. Motivated by several joint projects on post-transcriptional RNA regulation, I became strongly interested in the role of RNA-binding proteins (RBPs) in RNA regulation. To better analyze genome-wide CLIP data, my lab has developed several bioinformatics tools and analysis algorithms for CLIP-seq data.
  • Han T, Kato M, Xie S, Wu L, Mirzaei H, Pei J, Chen M, Xie Y, Allen J, Xiao G, McKnight S. Cell-free formation of RNA granules: bound RNAs identify features and components of cellular assemblies. Cell. 2012 May 11;149(4):768-779.
  • Kwon I, Xiang S, Kato M, Wu L, Theodoropoulos P, Wang T, Kim J, Yun J, Xie Y, McKnight SL. Poly-dipeptides encoded by the C9ORF72 repeats bind nucleoli, impede RNA biogenesis, and kill cells. Science. 2014 (published online 31 July 2014). doi: 10.1126/science.1254917.
  • Chu Y, Wang T, Dodd D, Xie Y, Janowski BA, Corey DR. Intramolecular circularization increases efficiency of RNA sequencing and enables CLIP-Seq of nuclear RNA from human cells. Nucleic Acids Res. 2015 Mar 26. pii: gkv213. [Epub ahead of print] PMID: 25813040
  • Sei E, Wang T, Hunter OV, Xie Y*, Conrad NK*. HITS-CLIP analysis uncovers a link between the Kaposi's sarcoma-associated herpesvirus ORF57 protein and host pre-mRNA metabolism. PLoS Pathog. 2015 Feb 24;11(2):e1004652. doi: 10.1371/journal.ppat.1004652. eCollection 2015 Feb
  • Chen B, Yun J, Kim MS, Mendell JT, Xie Y*. PIPE-CLIP: a comprehensive online tool for CLIP-seq data analysis. Genome Biology. 2014 Jan 22;15(1):R18. PMID: 24451213.
  • Wang T, Xie Y, Xiao G*. dCLIP: a computational approach for comparative CLIP-seq analyses. Genome Biology. 2014;15(1):R11. doi: 10.1186/gb-2014-15-1-r11.
  • Wang T, Xiao G, Chu Y, Zhang MQ, Corey DR, Xie Y*. Design and bioinformatics analysis of genome-wide CLIP experiments. Nucleic Acids Res. 2015 May 9. pii: gkv439. [Epub ahead of print]
  • Wang T, Chen B, Kim M, Xie Y, Xiao G. A model-based approach to identify binding sites in CLIP-Seq data. PLoS ONE. 2014 Apr 8;9(4):e93248. doi: 10.1371/journal.pone.0093248. eCollection 2014. PMID: 24714572

Tissue imaging data analysis

Currently, my research focuses mainly on tissue image analysis. Our team was among the first to develop and validate computational models that use tissue images collected during routine clinical procedures to refine lung cancer prognosis (Luo et al., 2017; Luo et al., 2019). We have developed deep-learning-based models to detect tumor regions and microvessels and to predict patient outcomes (Wang et al., 2018; Yi et al., 2018; Huang et al., 2017). More recently, we developed algorithms to detect and classify different cell types in tissue images (Wang et al., 2019; Wang et al., 2020), as well as a set of spatial models (Li et al., 2019; Li et al., 2020) to investigate the spatial organization of cells and its implications in disease. I am the PI of a CPRIT (Cancer Prevention and Research Institute of Texas) grant focused on analyzing digital pathology data to improve lung cancer patient care. Our study was featured on the cover of the May 2020 issue of Cancer Research.


  • Yi F, Huang J, Yang L, Xie Y, Xiao G. Automatic extraction of cell nuclei from H&E-stained histopathological images. Journal of Medical Imaging. 2017 Apr;4(2):027502. PMID: 28653017
  • Yi F, Yang L, Wang S, Lei G, Huang C, Xie Y, Xiao G*. Micro-vessel prediction in H&E stained pathology images using fully convolutional neural networks. BMC Bioinformatics. 2018;19:64. PMCID: PMC5828328.
  • Wang S, Wang T, Yang L, Yang DM, Fujimoto J, Yi F, Luo X, Yang Y, Yao B, Lin S, Moran C, Kalhor N, Weissferdt A, Minna J, Xie Y, Wistuba II, Mao Y, Xiao G. ConvPath: a software tool for lung adenocarcinoma digital pathological image analysis aided by convolutional neural network. EBioMedicine. 2019;50:103-110. PMID: 31767541.
  • Wang S, Rong R, Yang DM, Fujimoto J, Yan S, Cai L, Yang L, Luo D, Behrens C, Parra ER, Yao B, Xu L, Wang T, Wistuba II, Minna J, Xie Y, Xiao G*. Computational staining of pathology images to study the tumor microenvironment in lung cancer. Cancer Research. 2020. PMID: 31915129.

Machine learning and deep learning

Our team has developed and validated machine learning algorithms to solve practical biological and clinical problems: (1) In 2007, I applied a machine learning method to identify new blood protein biomarkers that detect very early stages of Alzheimer's disease (AD) with 88 to 96 percent accuracy at much lower cost. This blood-based test is currently being validated as a front-line screening test for AD in several large, ongoing NIH-funded clinical trials. (2) We developed a new algorithm to identify a gene signature that predicts lung cancer patient response to adjuvant chemotherapy (Tang et al., 2013), and validated the signature using a Clinical Laboratory Improvement Amendments (CLIA)-grade assay in an independent cohort (Xie et al., 2018). (3) We developed a machine learning algorithm to predict lung cancer patient prognosis using commonly available pathology images (Luo et al., 2017), and validated the model in multiple independent patient cohorts (Luo et al., 2019).
  • Xiao G, Ma S, Minna J, Xie Y. Adaptive prediction model in prospective molecular-signature-based clinical studies. Clin Cancer Res. 2014 Feb 1;20(3):531-9. PMCID: PMC3946561.
  • Luo X, Zang X, Yang L, Huang J, Liang F, Rodriguez Canales J, Wistuba II, Gazdar A, Xie Y, Xiao G*. Comprehensive computational pathological image analysis predicts lung cancer prognosis. J Thorac Oncol. 2017;12(3):501-509. PMCID: PMC5462113.
  • Huang C, Zhang A, Xiao G*. Deep Integrative Analysis for Survival Prediction. Pac Symp Biocomput. 2018; 23:343-352. PMID: 29218895.
  • Wang S, Yang DM, Rong R, Zhan X, Xiao G*. Pathology image analysis using segmentation deep learning algorithms. American Journal of Pathology. 2019 Jun 11. PMCID: PMC6723214.

Developing computational algorithms and bioinformatics tools for complex biomedical data

We are actively developing new bioinformatics tools and computational algorithms for big data, such as genome-wide RNAi screening data and next-generation sequencing data. My group has extensive experience in software development. We have developed several comprehensive web portals, including Lung Cancer Explorer (LCE) and Genomic Regression Analysis of Coordinated Expression (GRACE) (Cai et al., Nature Communications, 2017), deep-learning-based software (ConvPath), online clinical outcome prediction calculators, Galaxy-based software tools (such as PIPE-CLIP), and R packages. Some of this software can be accessed on my lab website.


  • Wang T, Xie Y, Xiao G*. dCLIP: a computational approach for comparative CLIP-seq analyses. Genome Biology. 2014 Jan 7;15(1):R11. PMCID: PMC4054096.
  • Cai L, Li Q, Du Y, Yun J, Xie Y, DeBerardinis R, Xiao G*. Genomic Regression Analysis of Coordinated Expression. Nature Communications. 2017 Dec 19;8(1):2187. PMCID: PMC5736603.
  • Zhang M, Li Q, Yu D, Yao B, Guo W, Xie Y, Xiao G*. GeNeCK: a web server for gene network construction and visualization. BMC Bioinformatics. 2019 Jan 7;20(1):12. PMID: 30616521.
  • Zhang M, Sheffield T, Zhan X, Li Q, Yang DM, Wang Y, Wang S, Xie Y, Wang T, Xiao G*. Spatial molecular profiling: platforms, applications and analysis tools. Briefings in Bioinformatics. 2020; In Press.

Developing spatial models for biological data

We have developed computational methodologies for Bayesian analysis, spatial modeling, and integrative analysis of different biological datasets, especially pathological imaging data.
  • Zhong R, Kim M, White M, Yang X, Xiao G*. Spatial background noise correction for high-throughput RNAi screening. Bioinformatics. 2013;29(17):2218-20. PMCID: PMC3740628.
  • Yu D, Won S, Lim J, Xiao G*. Statistical completion of a partially identified graph with applications for the estimation of gene regulatory networks. Biostatistics. 2015. doi: 10.1093/biostatistics/kxv013. PMCID: PMC4570579.
  • Li Q, Wang X, Liang F, Yi F, Xie Y, Gazdar A, Xiao G*. A Bayesian hidden Potts mixture model for analyzing lung cancer pathology images. Biostatistics. 2018 May 18. doi: 10.1093/biostatistics/kxy019. PMCID: PMC6797059.
  • Li Q, Wang X, Liang F, Xiao G*. A Bayesian mark interaction model for analysis of tumor pathology images. The Annals of Applied Statistics. 2019;13(3):1708-1732.

Biomedical imaging analysis

Our team has a longstanding interest and experience in biomedical image analysis. We have developed new analysis methods and conducted performance evaluations for brain images, such as fMRI and DTI images.
  • Lu H, Yezhuvath US, Xiao G. Improving fMRI sensitivity by normalization of basal physiologic state. Hum Brain Mapp. 2010;31(1):80-7. PubMed PMID: 19585589; PMCID: PMC2797559.
  • Aslan S, Huang H, Uh J, Mishra V, Xiao G, van Osch MJ, Lu H. White matter cerebral blood flow is inversely correlated with structural and functional connectivity in the human brain. Neuroimage. 2011;56(3):1145-53. PubMed PMID: 21385618; PMCID: PMC3085605.
  • Tung KC, Uh J, Mao D, Xu F, Xiao G, Lu H. Alterations in resting functional connectivity due to recent motor task. Neuroimage. 2013;78:316-24. PubMed PMID: 23583747; PMCID: PMC3672369.
  • Liu P, Dimitrov I, Andrews T, Crane DE, Dariotis JK, Desmond J, Dumas J, Gilbert G, Kumar A, MacIntosh BJ, Tucholka A, Yang S, Xiao G, Lu H. Multisite evaluations of a T2-relaxation-under-spin-tagging (TRUST) MRI technique to measure brain oxygenation. Magn Reson Med. 2016;75(2):680-7. PubMed PMID: 25845468; PMCID: PMC4592780.