Welcome to Xie Lab

Short Biography

Dr. Yang Xie holds the Raymond D. and Patsy R. Nasher Distinguished Chair in Cancer Research and is the Associate Dean for Data Sciences at UT Southwestern Medical Center. She is the founding director of the Quantitative Biomedical Research Center (QBRC), the Pediatric Cancer Data Commons (PCDC), and the Cancer Center Data Science Shared Resources (DSSR) at the Harold C. Simmons Comprehensive Cancer Center. Dr. Yang Xie received her training in biostatistics, medicine and epidemiology. Her research lab focuses on medical informatics, developing predictive and prognostic biomarkers, and precision medicine. She is currently the PI of an NIH Maximizing Investigators' Research Award (MIRA) grant, MPI of an NIAID U01 grant and PI of the Pediatric Cancer Data Core at CPRIT.

Education
  • Ph.D. in Biostatistics, 2006

    University of Minnesota-Twin Cities

  • M.S. in Biostatistics, 2002

    University of Minnesota-Twin Cities

  • M.S. in Epidemiology, 2000

    Peking Union Medical College

  • BMedSc, 1997

    Peking University Health Science Center


Research Summary

Biomarker Discovery and Clinical Outcome Prediction

We have developed computational models to predict patient outcomes, allowing clinicians to tailor treatment plans for individual patients ...

Methods for High-Dimensional Data and Integrative Analysis

We have developed bioinformatics tools, computational algorithms and statistical methodologies for the processing and analysis of high-dimensional ...

Statistical Learning and Prediction Model

My lab has extensive experience in developing predictive models in biomedical research. Our team has won several highly competitive international computational challenges ...

RNA Regulation

Over the past couple of decades, a surge of discoveries has revealed RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages. ...

Research Interests
  • Biomarker discovery and validation

  • Genomic data analysis and data integration

  • Medical informatics

  • Clinical trial design

  • Lung cancer

Selected Publications

A complete publication list can be found here.

Predicting the future for people with lung cancer

Nature Medicine. 2008 Aug;14(8):812-3. PubMed PMID: 18685594; PubMed Central PMCID: PMC2833359.
Xie Y, Minna JD.

A lung cancer molecular prognostic test ready for prime time

Lancet, 379, 785-787.
Xie, Y., and Minna, J. D.

A 12-Gene Set Predicts Survival Benefits from Adjuvant Chemotherapy in Non-Small-Cell Lung Cancer Patients

Clin Cancer Res; January 28, 2013; doi:10.1158/1078-0432.CCR-12-2321. (*Corresponding Author)
Tang H, Xiao G, Behrens C, Schiller J, Allen J, Chow CW, Suraokar M, Corvalan A, Mao JH, White M, Wistuba II, Minna J, Xie Y*.

Research

As director of the QBRC (Quantitative Biomedical Research Center), I am proud to support multiple labs conducting interdisciplinary research and to provide online tools and packages for biological research. Our team is interested in developing computational models to predict patient outcomes, which will allow clinicians to tailor treatment plans for individual patients. We have also developed gene expression signatures to predict patient prognosis and response to chemotherapy.

Research Projects


  • Cancer biomarkers

    We are interested in developing computational models to predict patient outcomes, which will allow clinicians to tailor treatment plans for individual patients. We have developed gene expression signatures to predict patient prognosis and response to chemotherapy. Using an innovative computational and systems biology approach, we identified a set of 12 genes that predicts which patients are most likely to benefit from post-surgery chemotherapy (Patent #UTSD2627). Working with investigators at both UT Southwestern Medical Center and MD Anderson Cancer Center, we are developing a Clinical Laboratory Improvement Amendments (CLIA)-certifiable medical device and designing a prospective clinical study to translate our 12-gene predictive signature into clinical use. Related publications are listed below:

    • Xie Y, Minna JD. Non-small-cell lung cancer mRNA expression signature predicting response to adjuvant chemotherapy. J Clin Oncol. 2010 Oct 10;28(29):4404-7. PMID: 20823415
    • Xie, Y., Xiao, G., Coombes, K. R., Behrens, C., Solis, L. M., Raso, G., Girard, L., Erickson, H. S., Roth, J., Heymach, J. V., Moran, C., Danenberg, K., Minna, J. D., and Wistuba, II. (2011) Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients, Clinical Cancer Research 17, 5705-5714.
    • Tang, H., Xiao, G., Behrens, C., Schiller, J., Allen, J., Chow, C. W., Suraokar, M., Corvalan, A., Mao, J., White, M. A., Wistuba, II, Minna, J. D., and Xie, Y*. (2013) A 12-gene set predicts survival benefits from adjuvant chemotherapy in non-small cell lung cancer patients, Clin Cancer Res 19, 1577-1586.
    • Tang, H., Sebti, S., Titone, R., Zhou, Y., Isidoro, C., Ross, T., Hibshoosh, H., Xiao, G., Packer, M., Xie, Y.*, and Levine, B. Decreased BECN1 mRNA Expression in Human Breast Cancer is Associated with Estrogen Receptor-Negative Subtypes and Poor Prognosis, EBioMedicine, 2015, 2(3), 255–263
    • Tang H, Wang S, Xiao G, Schiller J, Papadimitrakopoulou V., Minna J, Wistuba I.I., Xie Y. Comprehensive evaluation of published gene expression prognostic signatures for biomarker-based lung cancer clinical studies. Annals of Oncology. 2017 Apr 1;28(4):733-740.
  • Statistical methods for high-dimensional data and integrative analysis

    We are actively developing new bioinformatics tools and computational algorithms for big data, such as genome-wide RNAi screening data and next-generation sequencing data. We are also developing statistical methodologies for spatial modeling and integrative analysis of different molecular profiling datasets.

    • Xie Y*, Wang X, Story M. Statistical Methods of Background Correction for Illumina BeadArray. Bioinformatics, 2009, Feb 4. PMID: 19193732 PMCID: PMC2654805
    • Xie Y*, JK, Pan W, Xiao G, Khodursky A. A Bayesian Approach to Joint Modeling of Protein-DNA Binding, Gene Expression and Sequence Data. Stat Med. 2010 Feb 20;29(4):489-503. PMID: 20049751.
    • Zhong R, Kim M, White M, Xie Y, Xiao G*, Spatial Background Noise Correction for High-Throughput RNAi Screening, Bioinformatics, 2013 Sep 1;29(17):2218-20
    • Zhong R, Kim J, Kim H, Kim M, Lum L, Levine L, Xiao G, White M, Xie Y*. Computational Detection and Suppression of Sequence-specific Off-target Phenotypes from Whole Genome RNAi Screens. 2014, Nucleic Acids Research. Jul;42(13):8214-22.
  • Statistical learning and prediction models

    My lab has extensive experience in developing predictive models in biomedical research.

    Our team previously won the highly competitive 2012 NCI-DREAM Drug Sensitivity Prediction Challenge (Bansal et al., Nature Biotechnology, 2014) and the 2013 NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge (Eduati et al., Nature Biotechnology, 2015), and co-won the 2014 DREAM-Broad Institute Gene Essentiality Challenge.

    • Xiao G, Ma S, Minna J, Xie Y*, Adaptive prediction model in prospective molecular-signature-based clinical studies, Clinical Cancer Research, 2014 Feb 1;20(3):531-9. PMID:24323903.
    • Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser L, Realubit R, Mattioli M, Alvarez M, Shen Y, NCI-DREAM community, Gallahan D, Singer D, Saez-Rodriguez J, Xie Y*, Stolovitzky G*, Califano A*, Predicting activity of drug combinations through crowdsourcing, Nature Biotechnology, 2014 Dec;32(12):1213-22.
    • Eduati, F.#, Mangravite, L.#, Wang, T.#, Tang, H.#, Bare, C., Huang, R., Norman, T., Kellen, M., Menden, M., Yang, J., Zhan, X., Zhong, R., Xiao, G., Xia, M., the NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration, Friend, S., Dearry, A., Simeonov, A., Tice, R., Rusyn, I., Wright, F., Stolovitzky, G., Xie, Y.*, and Saez-Rodriguez, J.* Opportunities and limitations in the prediction of population responses to toxic compounds assessed through a collaborative competition, Nature Biotechnology 2015, 33, 933–940
  • RNA regulation

    My research interest in gene regulation started 10 years ago, when ChIP-chip technology was invented to study genome-wide transcription factor binding sites, and I developed several statistical models to analyze such datasets. Motivated by several joint projects in post-transcriptional RNA regulation, I developed a strong interest in understanding the role of RNA-binding proteins (RBPs) in RNA regulation. To better analyze genome-wide CLIP data, my lab has developed several bioinformatics tools and analysis algorithms for CLIP-seq data.

    • Han T, Kato M, Xie S, Wu L, Mirzaei H, Pei J, Chen M, Xie Y, Allen J, Xiao G, McKnight S. Cell-free Formation of RNA Granules: Bound RNAs Identify Features and Components of Cellular Assemblies. Cell 11 May 2012 (Vol. 149, Issue 4, pp. 768-779)
    • Kwon I, Xiang S, Kato M, Wu L, Theodoropoulos P, Wang T, Kim J, Yun J, Xie Y, McKnight SL, Poly-dipeptides encoded by the C9ORF72 repeats bind nucleoli, impede RNA biogenesis, and kill cells, Science 2014, Published online 31 July 2014 [DOI:10.1126/science.1254917]
    • Chu Y, Wang T, Dodd D, Xie Y, Janowski BA, Corey DR. Intramolecular circularization increases efficiency of RNA sequencing and enables CLIP-Seq of nuclear RNA from human cells. Nucleic Acids Res. 2015 Mar 26. pii: gkv213. [Epub ahead of print] PMID: 25813040
    • Sei E, Wang T, Hunter OV, Xie Y*, Conrad NK*. HITS-CLIP analysis uncovers a link between the Kaposi's sarcoma-associated herpesvirus ORF57 protein and host pre-mRNA metabolism. PLoS Pathog. 2015 Feb 24;11(2):e1004652. doi: 10.1371/journal.ppat.1004652. eCollection 2015 Feb
    • Chen B, Yun J, Kim MS, Mendell JT, Xie Y*. PIPE-CLIP: a comprehensive online tool for CLIP-seq data analysis. Genome Biology. 2014 Jan 22;15(1):R18. PMID: 24451213.
    • Wang T, Xie Y and Xiao G*, dCLIP: a computational approach for comparative CLIP-seq analyses, Genome Biology, 2014, 15:R11 doi:10.1186/gb-2014-15-1-r11
    • Wang T, Xiao G, Chu Y, Zhang MQ, Corey DR, Xie Y*. Design and bioinformatics analysis of genome-wide CLIP experiments. Nucleic Acids Res. 2015 May 9. pii: gkv439. [Epub ahead of print]
    • Wang T, Chen B, Kim M, Xie Y, Xiao G. A model-based approach to identify binding sites in CLIP-Seq data. PLoS ONE. 2014 Apr 8;9(4):e93248. doi: 10.1371/journal.pone.0093248. eCollection 2014. PMID: 24714572

Publications

Deep learning of cell spatial organizations identifies clinically relevant insights in tissue images

Shidan Wang, Ruichen Rong, Qin Zhou, Donghan M Yang, Xinyi Zhang, Xiaowei Zhan, Justin Bishop, Zhikai Chi, Clare J Wilhelm, Siyuan Zhang, Curtis R Pickering, Mark G Kris, John Minna, Yang Xie, Guanghua Xiao
Dec. 2023 Nature Communications 14 (1), 7872

Osteosarcoma Explorer: A Data Commons With Clinical, Genomic, Protein, and Tissue Imaging Data for Osteosarcoma Research

Donghan M Yang, Qinbo Zhou, Lauren Furman-Cline, Xian Cheng, Danni Luo, Hongyin Lai, Yueqi Li, Kevin W Jin, Bo Yao, Patrick J Leavey, Dinesh Rakheja, Tammy Lo, David Hall, Donald A Barkauskas, David S Shulman, Katherine Janeway, Chand Khanna, Richard Gorlick, Christopher Menzies, Xiaowei Zhan, Guanghua Xiao, Stephen X Skapek, Lin Xu, Laura J Klesse, Brian D Crompton, Yang Xie
Nov. 2023 JCO Clinical Cancer Informatics 7, e2300104

Unsupervised domain adaptation for nuclei segmentation: Adapting from hematoxylin & eosin stained slides to immunohistochemistry stained slides using a curriculum approach

Shidan Wang, Ruichen Rong, Zifan Gu, Junya Fujimoto, Xiaowei Zhan, Yang Xie, Guanghua Xiao
Nov. 2023 Computer Methods and Programs in Biomedicine 241, 107768

Mitochondrial-encoded complex I impairment induces a targetable dependency on aerobic fermentation in Hürthle cell carcinoma of the thyroid

Anderson R Frank, Vicky Li, Spencer D Shelton, Jiwoong Kim, Gordon M Stott, Leonard M Neckers, Yang Xie, Noelle S Williams, Prashant Mishra, David G McFadden
June 2023 Cancer Discovery, OF1-OF20

Systems and methods for characterizing a tumor microenvironment using pathological images

Guanghua Xiao, Yang Xie, Ruichen Rong, Shidan Wang
June 2023 US Patent App. 17/998,037

A comparative study of neuroendocrine heterogeneity in small cell lung cancer and neuroblastoma

Ling Cai, Ralph J DeBerardinis, Yang Xie, John D Minna, Guanghua Xiao
May 2023 Molecular Cancer Research, OF1-OF13

Dissecting molecular, pathological, and clinical features associated with tumor neural/neuroendocrine heterogeneity

Ling Cai, Ralph J DeBerardinis, Guanghua Xiao, John D Minna, Yang Xie
May 2023 iScience

Novel start codons introduce novel coding sequences in the human genomes

He Zhang, Yang Xie
May 2023 Scientific Reports 13 (1), 8141

A lung cancer cell line explorer

Ling Cai, Luc Girard, Ralph DeBerardinis, Guanghua Xiao, John Minna, Yang Xie
Apr. 2023 Cancer Research 83 (7_Supplement), 6572-6572

Enhanced Pathology Image Quality with Restore–Generative Adversarial Network

Ruichen Rong, Shidan Wang, Xinyi Zhang, Zhuoyu Wen, Xian Cheng, Liwei Jia, Donghan M Yang, Yang Xie, Xiaowei Zhan, Guanghua Xiao
Apr. 2023 The American Journal of Pathology 193 (4), 404-416

Deep learning in digital pathology for personalized treatment plans of cancer patients

Zhuoyu Wen, Shidan Wang, Donghan M Yang, Yang Xie, Mingyi Chen, Justin Bishop, Guanghua Xiao
Mar. 2023 Seminars in Diagnostic Pathology 40 (2), 109-119

Global chromatin landscapes identify candidate noncoding modifiers of cardiac rhythm

Samadrita Bhattacharyya, Rahul K Kollipara, Gabriela Orquera-Tornakian, Sean Goetsch, Minzhe Zhang, Cameron Perry, Boxun Li, John M Shelton, Minoti Bhakta, Jialei Duan, Yang Xie, Guanghua Xiao, Bret M Evers, Gary C Hon, Ralf Kittler, Nikhil V Munshi
Feb. 2023 The Journal of clinical investigation 133 (3)

Features of tumor-microenvironment images predict targeted therapy survival benefit in patients with EGFR-mutant lung cancer

Shidan Wang, Ruichen Rong, Donghan M Yang, Junya Fujimoto, Justin A Bishop, Shirley Yan, Ling Cai, Carmen Behrens, Lynne D Berry, Clare Wilhelm, Dara Aisner, Lynette Sholl, Bruce E Johnson, David J Kwiatkowski, Ignacio I Wistuba, Paul A Bunn, John Minna, Guanghua Xiao, Mark G Kris, Yang Xie
Jan. 2023 The Journal of clinical investigation 133 (2)

T-cell tolerant fraction as a predictor of immune-related adverse events

Jared Ostmeyer, Jason Y Park, Mitchell S Von Itzstein, David Hsiehchen, Farjana Fattah, Mary Gwin, Rodrigo Catalan, Shaheen Khan, Prithvi Raj, Edward K Wakeland, Yang Xie, David E Gerber
2023 Journal for Immunotherapy of Cancer 11 (8)

Association between immune-related adverse event timing and treatment outcomes

David Hsiehchen, Abdul Rafeh Naqash, Magdalena Espinoza, Mitchell S Von Itzstein, Alessio Cortellini, Biagio Ricciuti, Dwight H Owen, Mehak Laharwal, Yukihiro Toi, Michael Burke, Yang Xie, David E Gerber
Dec. 2022 Oncoimmunology 11 (1), 2017162

A Pan-Cancer Assessment of RB1/TP53 Co-Mutations

Ling Cai, Ralph J DeBerardinis, Guanghua Xiao, John D Minna, Yang Xie
Aug. 2022 Cancers 14 (17), 4199

TK216 targets microtubules in Ewing sarcoma cells

Juan Manuel Povedano, Vicky Li, Katherine E Lake, Xin Bai, Rameshu Rallabandi, Jiwoong Kim, Yang Xie, Jef K De Brabander, David G McFadden
Aug. 2022 Cell Chemical Biology 29 (8), 1325-1332. e4

Sprod for de-noising spatially resolved transcriptomics data based on position and image information

Yunguan Wang, Bing Song, Shidan Wang, Mingyi Chen, Yang Xie, Guanghua Xiao, Li Wang, Tao Wang
Aug. 2022 Nature methods 19 (8), 950-958

Deep learning of rhabdomyosarcoma pathology images for classification and survival outcome prediction

Xinyi Zhang, Shidan Wang, Erin R Rudzinski, Saloni Agarwal, Ruichen Rong, Donald A Barkauskas, Ovidiu Daescu, Lauren Furman Cline, Rajkumar Venkatramani, Yang Xie, Guanghua Xiao, Patrick Leavey
June 2022 The American Journal of Pathology 192 (6), 917-925

Transforming activity of an oncoprotein-encoding circular RNA from human papillomavirus (vol 10, 2300, 2019)

Jiawei Zhao, Eunice E Lee, Jiwoong Kim, Rong Yang, Bahir Chamseddin, Chunyang Ni, Elona Gusho, Yang Xie, Cheng-Ming Chiang, Michael Buszczak, Xiaowei Zhan, Laimonis Laimins, Richard C Wang
May 2022 Nature Communications 13 (1)

BepiTBR: TB reciprocity enhances B cell epitope prediction

James Zhu, Anagha Gouru, Fangjiang Wu, Jay A Berzofsky, Yang Xie, Tao Wang
Feb. 2022 iScience 25 (2)

Lung Cancer Computational Biology and Resources

Ling Cai, Guanghua Xiao, David Gerber, Y Xie
Feb. 2022 Cold Spring Harbor Perspectives in Medicine 12 (2), a038273-a038273

Author Correction: Transforming activity of an oncoprotein-encoding circular RNA from human papillomavirus

Jiawei Zhao, Eunice E Lee, Jiwoong Kim, Rong Yang, Bahir Chamseddin, Chunyang Ni, Elona Gusho, Yang Xie, Cheng-Ming Chiang, Michael Buszczak, Xiaowei Zhan, Laimonis Laimins, Richard C Wang
2022 Nature communications 13

MetaPrism: A Versatile Toolkit for Joint Taxa/Gene Analysis of Metagenomic Sequencing Data

Kim J, Jiang S, Wang Y, Xiao G, Xie Y, Liu DJ, Li Q, Koh A, Zhan X
Mar 2021 G3 (Bethesda). doi: 10.1093/g3journal/jkab046. Online ahead of print. PMID: 33713107

Abstract

In microbiome research, metagenomic sequencing generates enormous amounts of data. These data are typically classified into taxa for taxonomy analysis, or into genes for functional analysis. However, a joint analysis where the reads are classified into taxa-specific genes is often overlooked. To enable the analysis of this biologically meaningful feature, we developed a novel bioinformatic toolkit, MetaPrism, which can analyze sequence reads for a set of joint taxa/gene analyses: 1) classify sequence reads and estimate the abundances for taxa-specific genes; 2) tabularize and visualize taxa-specific gene abundances; 3) compare the abundances between groups, and 4) build prediction models for clinical outcome. We illustrated these functions using a published microbiome metagenomics dataset from patients treated with immune checkpoint inhibitor therapy and showed the joint features can serve as potential biomarkers to predict therapeutic responses. MetaPrism is a toolkit for joint taxa and gene analysis. It offers biological insights on the taxa-specific genes on top of the taxa-alone or gene-alone analysis.
MetaPrism is open-source software and freely available at https://github.com/jiwoongbio/MetaPrism. The example script to reproduce the manuscript is also provided in the above code repository.

Evaluating short-term forecasting of COVID-19 cases among different epidemiological models under a Bayesian framework

Li Q, Bedi T, Lehmann CU, Xiao G, Xie Y
Feb 2021 Gigascience. 10(2):giab009. doi: 10.1093/gigascience/giab009. PMID: 33604654

Abstract

Background

Forecasting of COVID-19 cases daily and weekly has been one of the challenges posed to governments and the health sector globally. To facilitate informed public health decisions, the concerned parties rely on short-term daily projections generated via predictive modeling. We calibrate stochastic variants of growth models and the standard susceptible-infectious-removed model into a single Bayesian framework to evaluate and compare their short-term forecasts.

Results

We implement rolling-origin cross-validation to compare the short-term forecasting performance of the stochastic epidemiological models and an autoregressive moving average model across 20 countries that had the most confirmed COVID-19 cases as of August 22, 2020.

Conclusion

None of the models proved to be a gold standard across all regions, while all outperformed the autoregressive moving average model in terms of the accuracy of forecast and interpretability.
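The rolling-origin evaluation mentioned in this abstract can be sketched in a few lines of Python. This is a generic illustration, not the paper's code: the naive last-value forecaster stands in for the Bayesian epidemiological and ARMA models, mean absolute error is an assumed accuracy metric, and the names `rolling_origin_splits`, `rolling_origin_mae`, and `naive_last_value` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A forecaster takes the observed history and a horizon and returns that many predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def rolling_origin_splits(n: int, initial: int, horizon: int, step: int = 1) -> List[Tuple[range, range]]:
    """(train, test) index ranges for each forecast origin.

    The training window always starts at 0 and grows; the origin advances by
    `step` until fewer than `horizon` observations remain after it.
    """
    splits = []
    origin = initial
    while origin + horizon <= n:
        splits.append((range(0, origin), range(origin, origin + horizon)))
        origin += step
    return splits


def rolling_origin_mae(series: Sequence[float], forecaster: Forecaster,
                       initial: int, horizon: int, step: int = 1) -> float:
    """Mean absolute error (assumed metric), averaged over all forecast origins."""
    fold_errors = []
    for train, test in rolling_origin_splits(len(series), initial, horizon, step):
        history = [series[i] for i in train]   # only data before the origin
        predictions = forecaster(history, horizon)
        actual = [series[i] for i in test]
        fold_errors.append(sum(abs(p - a) for p, a in zip(predictions, actual)) / horizon)
    if not fold_errors:
        raise ValueError("series too short for the requested initial window and horizon")
    return sum(fold_errors) / len(fold_errors)


def naive_last_value(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical placeholder forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


if __name__ == "__main__":
    cases = [1, 2, 3, 4, 5, 6, 7, 8]
    print(rolling_origin_mae(cases, naive_last_value, initial=5, horizon=2))  # 1.5
```

Each fold fits only on observations before its origin, so no future data leaks into a forecast; comparing models amounts to passing different functions with the same `(history, horizon)` signature.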

MB-GAN: Microbiome Simulation via Generative Adversarial Network

Rong R, Jiang S, Xu L, Xiao G, Xie Y, Liu DJ, Li Q, Zhan X
2020 Gigascience. 10(2):giab005. doi: 10.1093/gigascience/giab005. PMID: 33543271

Abstract

Background

Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods has accelerated the discovery of associations between the human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating real microbiome data is challenging because it is difficult to model their correlation structure using explicit statistical models.

Results

To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework termed MB-GAN, by using a generative adversarial network (GAN) and utilizing methodology advancements from the deep learning community. MB-GAN can automatically learn from given microbial abundances and compute simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions, and it only requires real datasets as inputs. Second, unlike the traditional GANs, MB-GAN is easily applicable and can converge efficiently.

Conclusions

By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages make MB-GAN suitable for further microbiome methodology development where high-fidelity microbiome data are needed.

A deep learning-based model for screening and staging pneumoconiosis

Zhang L, Rong R, Li Q, Yang DM, Yao B, Luo D, Zhang X, Zhu X, Luo J, Liu Y, Yang X, Ji X, Liu Z, Xie Y, Sha Y, Li Z, Xiao G
Jan 2021 Sci Rep. 11(1):2201. doi: 10.1038/s41598-020-77924-z. PMID: 33500426

Abstract

This study aims to develop an artificial intelligence (AI)-based model to assist radiologists in pneumoconiosis screening and staging using chest radiographs. The model, based on chest radiographs, was developed using a training cohort and validated using an independent test cohort. Every image in the training and test datasets was labeled by experienced radiologists in a double-blinded fashion. The computational model started by segmenting the lung field into six subregions. Then, a convolutional neural network classification model was used to predict the opacity level for each subregion. Finally, the diagnosis for each subject (normal, stage I, II, or III pneumoconiosis) was determined by summarizing the subregion-based prediction results. For the independent test cohort, pneumoconiosis screening accuracy was 0.973, with both sensitivity and specificity greater than 0.97. The accuracy for pneumoconiosis staging was 0.927, better than that achieved by two groups of radiologists (0.87 and 0.84, respectively). This study develops a deep learning-based model for screening and staging of pneumoconiosis using manually annotated chest radiographs. The model outperformed two groups of radiologists in the accuracy of pneumoconiosis staging. This pioneering work demonstrates the feasibility and efficiency of AI-assisted radiography screening and diagnosis in occupational lung diseases.

Tumor-suppressor function of Beclin 1 in breast cancer cells requires E-cadherin

Wijshake T, Zou Z, Chen B, Zhong L, Xiao G, Xie Y, Doench JG, Bennett L, Levine B
Feb 2021 Proc Natl Acad Sci U S A. 118(5):e2020478118. doi: 10.1073/pnas.2020478118. PMID: 33495338

Abstract

Beclin 1, an autophagy and haploinsufficient tumor-suppressor protein, is frequently monoallelically deleted in breast and ovarian cancers. However, the precise mechanisms by which Beclin 1 inhibits tumor growth remain largely unknown. To address this question, we performed a genome-wide CRISPR/Cas9 screen in MCF7 breast cancer cells to identify genes whose loss of function reverses Beclin 1-dependent inhibition of cellular proliferation. Small guide RNAs targeting CDH1 and CTNNA1, tumor-suppressor genes that encode cadherin/catenin complex members E-cadherin and alpha-catenin, respectively, were highly enriched in the screen. CRISPR/Cas9-mediated knockout of CDH1 or CTNNA1 reversed Beclin 1-dependent suppression of breast cancer cell proliferation and anchorage-independent growth. Moreover, deletion of CDH1 or CTNNA1 inhibited the tumor-suppressor effects of Beclin 1 in breast cancer xenografts. Enforced Beclin 1 expression in MCF7 cells and tumor xenografts increased cell surface localization of E-cadherin and decreased expression of mesenchymal markers and beta-catenin/Wnt target genes. Furthermore, CRISPR/Cas9-mediated knockout of BECN1 and the autophagy class III phosphatidylinositol kinase complex 2 (PI3KC3-C2) gene, UVRAG, but not PI3KC3-C1–specific ATG14 or other autophagy genes ATG13, ATG5, or ATG7, resulted in decreased plasma membrane and increased cytoplasmic localization of E-cadherin. Taken together, these data reveal previously unrecognized cooperation between Beclin 1 and E-cadherin–mediated tumor suppression in breast cancer cells.

Sorting nexin 5 mediates virus-induced autophagy and immunity

Dong X, Yang Y, Zou Z, Zhao Y, Ci B, Zhong L, Bhave M, Wang L, Kuo YC, Zang X, Zhong R, Aguilera ER, Richardson RB, Simonetti B, Schoggins JW, Pfeiffer JK, Yu L, Zhang X, Xie Y, Schmid SL, Xiao G, Gleeson PA, Ktistakis NT, Cullen PJ, Xavier RJ, Levine B
Jan 2021 Nature. 589(7842):456-461. doi: 10.1038/s41586-020-03056-z. Epub 2020 Dec 16. PMID: 33328639

Abstract

Autophagy, a process of degradation that occurs via the lysosomal pathway, has an essential role in multiple aspects of immunity, including immune system development, regulation of innate and adaptive immune and inflammatory responses, selective degradation of intracellular microorganisms, and host protection against infectious diseases. Autophagy is known to be induced by stimuli such as nutrient deprivation and suppression of mTOR, but little is known about how autophagosomal biogenesis is initiated in mammalian cells in response to viral infection. Here, using genome-wide short interfering RNA screens, we find that the endosomal protein sorting nexin 5 (SNX5) is essential for virus-induced, but not for basal, stress- or endosome-induced, autophagy. We show that SNX5 deletion increases cellular susceptibility to viral infection in vitro, and that Snx5 knockout in mice enhances lethality after infection with several human viruses. Mechanistically, SNX5 interacts with beclin 1 and ATG14-containing class III phosphatidylinositol-3-kinase (PI3KC3) complex 1 (PI3KC3-C1), increases the lipid kinase activity of purified PI3KC3-C1, and is required for endosomal generation of phosphatidylinositol-3-phosphate (PtdIns(3)P) and recruitment of the PtdIns(3)P-binding protein WIPI2 to virion-containing endosomes. These findings identify a context- and organelle-specific mechanism—SNX5-dependent PI3KC3-C1 activation at endosomes—for initiation of autophagy during viral infection.

SMIXnorm: Fast and Accurate RNA-seq Data Normalization for Formalin-Fixed Paraffin-Embedded Samples

Yin S, Zhan X, Yao B, Xiao G, Wang X, Xie Y*
Mar 2021 Frontiers in Genetics. 12, 395

Abstract

MIXnorm and SMIXnorm are normalization methods designed for Formalin-Fixed Paraffin-Embedded (FFPE) RNA-sequencing (RNA-seq) data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and models expressed genes by truncated normal distributions. SMIXnorm is a simplified version of MIXnorm, which uses a simplified mixture model and requires less computation. We recommend using SMIXnorm for FFPE RNA-seq data normalization for faster computation when the number of samples is larger than 25. Though designed specifically for FFPE RNA-seq data, MIXnorm and SMIXnorm are directly applicable to normalize fresh-frozen (FF) RNA-seq data as well. To obtain the maximum likelihood estimates, we developed a nested EM algorithm, in which closed-form updates are available in each iteration.
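The two component distributions named in this abstract can be written out as a minimal Python sketch. This is not the MIXnorm implementation: the parameter names `pi`, `lam`, `mu`, `sigma`, the truncation point `lower=0.0`, and the functions `zip_pmf` and `truncnorm_pdf` are assumptions for illustration, since the abstract does not state the scale or truncation bound MIXnorm uses, nor how the nested EM algorithm combines the components.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson probability of observing count k.

    pi  : extra probability mass placed on zero (0 <= pi <= 1)
    lam : Poisson rate, assumed > 0
    """
    # Poisson term computed in log space so large k does not overflow.
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def truncnorm_pdf(x: float, mu: float, sigma: float, lower: float = 0.0) -> float:
    """Normal density truncated below at `lower` (an assumed truncation point)."""
    if x < lower:
        return 0.0
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # P(X >= lower) for an untruncated N(mu, sigma^2), used to renormalize.
    upper_tail = 0.5 * math.erfc((lower - mu) / (sigma * math.sqrt(2.0)))
    return phi / (sigma * upper_tail)


if __name__ == "__main__":
    # Zero counts receive extra mass from the inflation term.
    print(round(zip_pmf(0, pi=0.5, lam=1.0), 4))            # 0.6839
    print(round(truncnorm_pdf(0.0, mu=0.0, sigma=1.0), 4))  # 0.7979
```

The zero-inflation term lets a "non-expressed" component explain the excess zeros typical of degraded FFPE libraries, while the truncated normal keeps the "expressed" component supported only above the truncation point.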

Development of a Data Model and Data Commons for Germ Cell Tumors

Ci B, Yang DM, Krailo M, Xia C, Yao B, Luo D, Zhou Q, Xiao G, Xu L, Skapek SX, Murray MM, Amatruda JF, Klosterkemper L, Shaikh F, Faure-Conter C, Fresneau B, Volchenboum SL, Stoneham S, Lopes LF, Nicholson J, Frazier AL, Xie Y*
Jun 2020 JCO Clin Cancer Inform. 4:555-566. PMID: 32568554.

Abstract

Germ cell tumors (GCTs) are considered a rare disease but are the most common solid tumors in adolescents and young adults, accounting for 15% of all malignancies in this age group. The rarity of GCTs in some groups, particularly children, has impeded progress in treatment and biologic understanding. The most effective GCT research will result from the interrogation of data sets from historical and prospective trials across institutions. However, inconsistent use of terminology among groups, different sample-labeling rules, and lack of data standards have hampered researchers’ efforts in data sharing and across-study validation. To overcome the low interoperability of data and facilitate future clinical trials, we worked with the Malignant Germ Cell International Consortium (MaGIC) and developed a GCT clinical data model as a uniform standard to curate and harmonize GCT data sets. This data model will also be the standard for prospective data collection in future trials. Using the GCT data model, we developed a GCT data commons with data sets from both MaGIC and public domains as an integrated research platform. The commons supports functions, such as data query, management, sharing, visualization, and analysis of the harmonized data, as well as patient cohort discovery. This GCT data commons will facilitate future collaborative research to advance the biologic understanding and treatment of GCTs. Moreover, the framework of the GCT data model and data commons will provide insights for other rare disease research communities into developing similar collaborative research platforms.

Experience, Perceptions, and Recommendations Concerning COVID-19-Related Clinical Research Adjustments

Gerber DE, Sheffield TY, Beg MS, Williams EL, Clark VL, Xie Y, Holbein MEB, Skinner CS, Lee SJC
2020 J Natl Compr Canc Netw. 2020:1-8. Epub 2020/10/08. doi: 10.6004/jnccn.2020.7643. PubMed PMID: 33027755.

Abstract

Background: During the COVID-19 public health emergency, the FDA and NIH altered clinical trial requirements to protect participants and manage study conduct. Given their detailed knowledge of research protocols and regular contact with patients, clinicians, and sponsors, clinical research professionals offer important perspectives on these changes. Methods: We developed and distributed an anonymous survey assessing COVID-19–related clinical trial adjustment experiences, perceptions, and recommendations to Clinical Research Office personnel at the Harold C. Simmons Comprehensive Cancer Center. Responses were compared using the Fisher exact test. Results: A total of 94 of 109 contacted research personnel (87%) responded. Among these individuals, 58% had >5 years’ professional experience in clinical research, and 56% had personal experience with a COVID-19–related change. Respondents perceived that these changes had a positive impact on patient safety; treatment efficacy; patient and staff experience; and communication with patients, investigators, and sponsors. More than 90% felt that positive changes should be continued after COVID-19. For remote consent, telehealth, therapy shipment, off-site diagnostics, and remote monitoring, individuals with personal experience with the specific change and individuals with >5 years’ professional experience were numerically more likely to recommend continuing the adjustment, and these differences were significant for telehealth (P=.04) and therapy shipment (P=.02). Conclusions: Clinical research professionals perceive that COVID-19–related clinical trial adjustments positively impact multiple aspects of study conduct. Those with greatest experience—both specific to COVID-19–related changes and more generally—are more likely to recommend that these adjustments continue in the future.

A Multipronged Approach Establishes Covalent Modification of β-Tubulin as the Mode of Action of Benzamide Anti-cancer Toxins

Povedano JM, Rallabandi R, Bai X, Ye X, Liou J, Chen H, Kim J, Xie Y, Posner B, Rice L, De Brabander JK, McFadden DG
2020 J Med Chem. 2020;63(22):14054-66. Epub 2020/11/13. doi: 10.1021/acs.jmedchem.0c01482. PubMed PMID: 33180487; PMCID: PMC7707623.

Abstract

A phenotypic high-throughput screen identified a benzamide small molecule with activity against small cell lung cancer cells. A “clickable” benzamide probe was designed that irreversibly bound a single 50 kDa cellular protein, identified by mass spectrometry as β-tubulin. Moreover, the anti-cancer potency of a series of benzamide analogs strongly correlated with probe competition, indicating that β-tubulin was the functional target. Additional evidence suggested that benzamides covalently modified Cys239 within the colchicine binding site. Consistent with this mechanism, benzamides impaired growth of microtubules formed with β-tubulin harboring Cys239, but not β3 tubulin encoding Ser239. We therefore designed an aldehyde-containing analog capable of trapping Ser239 in β3 tubulin, presumably as a hemiacetal. Using a forward genetics strategy, we identified benzamide-resistant cell lines harboring a Thr238Ala mutation in β-tubulin sufficient to induce compound resistance. The disclosed chemical probes are useful to identify other colchicine site binders, a frequent target of structurally diverse small molecules.

eIF5B drives integrated stress response-dependent translation of PD-L1 in lung cancer

Suresh S, Chen B, Zhu J, Golden RJ, Lu C, Evers BM, Novaresi N, Smith B, Zhan X, Schmid V, Jun S, Karacz CM, Peyton M, Zhong L, Wen Z, Sathe AA, Xing C, Behrens C, Wistuba II, Xiao G, Xie Y, Fu YX, Minna JD, Mendell JT, O'Donnell KA
2020 Nat Cancer. 2020;1(5):533-45. PubMed PMID: 32984844; PMCID: PMC7511089.

Abstract

Cancer cells express high levels of programmed death ligand 1 (PD-L1), a ligand of the programmed cell death protein 1 (PD-1) receptor on T cells, allowing tumors to suppress T cell activity. Clinical trials utilizing antibodies that disrupt the PD-1/PD-L1 checkpoint have yielded remarkable results, with anti-PD-1 immunotherapy approved as a first-line therapy for patients with lung cancer. We used CRISPR-based screening to identify regulators of PD-L1 in human lung cancer cells, revealing potent induction of PD-L1 upon disruption of heme biosynthesis. Impairment of heme production activates the integrated stress response, allowing bypass of inhibitory upstream open reading frames in the PD-L1 5′ untranslated region, resulting in enhanced PD-L1 translation and suppression of anti-tumor immunity. We demonstrate that integrated stress-response-dependent PD-L1 translation requires the translation initiation factor eIF5B. eIF5B overexpression, which is frequent in lung adenocarcinomas and associated with poor prognosis, is sufficient to induce PD-L1. These findings illuminate mechanisms of immune checkpoint activation and identify targets for therapeutic intervention.

Molecular differences across invasive lung adenocarcinoma morphological subgroups

Bo Ci, Donghan M. Yang, Ling Cai, Lin Yang, Luc Girard, Junya Fujimoto, Ignacio I. Wistuba, Yang Xie, John Minna, William Travis, Guanghua Xiao
2020 Transl Lung Cancer Res. 2020;9(4):1029-40. Epub 2020/09/22. doi: 10.21037/tlcr-19-321. PubMed PMID: 32953482; PMCID: PMC7481608.

Abstract

Background

Lung adenocarcinomas (ADCs) show heterogeneous morphological patterns that are classified into five subgroups: lepidic predominant, papillary predominant, acinar predominant, micropapillary predominant and solid predominant. The morphological classification of ADCs has been reported to be associated with patient prognosis and adjuvant chemotherapy response. However, the molecular mechanisms underlying the morphology differences among different subgroups remain largely unknown.

Methods

Using the molecular profiling data from The Cancer Genome Atlas (TCGA) lung ADC (LUAD) cohort, we studied the molecular differences across invasive ADC morphological subgroups.

Results

We showed that the expression of proteins and mRNAs, but not gene mutations or copy number alterations (CNAs), was significantly associated with lung ADC morphological subgroups. In addition, expression of the FOXM1 gene (which is negatively associated with patient survival) likely plays an important role in the morphological differences among subgroups. Moreover, we found that PD-L1 protein abundance was associated with the malignancy of the subgroups. These results were validated in an independent cohort.

Conclusions

This study provides insights into the molecular differences among different lung ADC morphological subgroups, which could lead to potential subgroup-specific therapies.

Spatial molecular profiling: platforms, applications and analysis tools

Minzhe Zhang, Thomas Sheffield, Xiaowei Zhan, Qiwei Li, Donghan M Yang, Yunguan Wang, Shidan Wang, Yang Xie, Tao Wang, and Guanghua Xiao
Aug 2020 Briefings in Bioinformatics, 2020. Epub 2020/08/10. doi: 10.1093/bib/bbaa145. PubMed PMID: 32770205.

Abstract

Molecular profiling technologies, such as genome sequencing and proteomics, have transformed biomedical research, but most such technologies require tissue dissociation, which leads to loss of tissue morphology and spatial information. Recent developments in spatial molecular profiling technologies have enabled the comprehensive molecular characterization of cells while keeping their spatial and morphological contexts intact. Molecular profiling data generate deep characterizations of the genetic, transcriptional and proteomic events of cells, while tissue images capture the spatial locations, organizations and interactions of the cells together with their morphology features. These data, together with cell and tissue imaging data, provide unprecedented opportunities to study tissue heterogeneity and cell spatial organization. This review aims to provide an overview of these recent developments in spatial molecular profiling technologies and the corresponding computational methods developed for analyzing such data.

Late-Onset Immunotherapy Toxicity and Delayed Autoantibody Changes: Checkpoint Inhibitor-Induced Raynaud's-Like Phenomenon.

Khan S, von Itzstein MS, Lu R, Bermas BL, Karp DR, Khan SA, Fattah FJ, Park JY, Saltarski JM, Gloria-McCutchen Y, Xie Y, Li QZ, Wakeland EK, Gerber DE.
May 2020 Oncologist. 25(5):e753-e757. doi: 10.1634/theoncologist.2019-0666. Epub 2020 Mar 13. PMID: 32167195

Abstract

Immune checkpoint inhibitor (ICI)-induced immune-related adverse events (irAEs) may affect almost any organ system and occur at any point during therapy. Autoantibody analysis may provide insight into the mechanism, nature, and timing of these events. We report a case of ICI-induced late-onset Raynaud's-like phenomenon in a patient receiving combination immunotherapy. A 53-year-old woman with advanced non-small cell lung cancer received combination anti-cytotoxic T-lymphocyte antigen 4 and anti-programmed death 1 ICI therapy. She developed early (hypophysitis at 4 months) and late (Raynaud's at >20 months) irAEs. Longitudinal assessment of 124 autoantibodies was correlated with toxicity. Although autoantibody levels were generally stable for the first 18 months of therapy, shortly before the development of Raynaud's, a marked increase in multiple autoantibodies was observed. This case highlights the potential for delayed autoimmune toxicities and provides potential biologic insights into the dynamic nature of these events. KEY POINTS: A patient treated with dual anti-PD1 and anti-CTLA4 therapy developed Raynaud's-like signs and symptoms more than 18 months after starting therapy. In this case, autoantibody changes became apparent shortly before onset of clinical toxicity. This case highlights the potential for late-onset immune-related adverse events from checkpoint inhibitors, requiring continuous clinical vigilance. The optimal duration of checkpoint inhibitor therapy in patients with profound and prolonged responses remains unclear.

Computational Staining of Pathology Images to Study the Tumor Microenvironment in Lung Cancer.

Wang S, Rong R, Yang DM, Fujimoto J, Yan S, Cai L, Yang L, Luo D, Behrens C, Parra ER, Yao B, Xu L, Wang T, Zhan X, Wistuba II, Minna J, Xie Y, Xiao G.
May 2020 Cancer Res. 80(10):2056-2066. doi: 10.1158/0008-5472.CAN-19-1629. Epub 2020 Jan 8. PMID: 31915129

Abstract

The spatial organization of different types of cells in tumor tissues reveals important information about the tumor microenvironment (TME). To facilitate the study of cellular spatial organization and interactions, we developed Histology-based Digital-Staining, a deep learning-based computational model, to segment the nuclei of tumor, stroma, lymphocyte, macrophage, karyorrhexis, and red blood cells from standard hematoxylin and eosin-stained pathology images in lung adenocarcinoma. Using this tool, we identified and classified cell nuclei and extracted 48 cell spatial organization-related features that characterize the TME. Using these features, we developed a prognostic model from the National Lung Screening Trial dataset and independently validated the model in The Cancer Genome Atlas lung adenocarcinoma dataset, in which the predicted high-risk group showed significantly worse survival than the low-risk group (P = 0.001), with an HR of 2.23 (1.37-3.65) after adjusting for clinical variables. Furthermore, the image-derived TME features significantly correlated with the gene expression of biological pathways. For example, transcriptional activation of both the T-cell receptor and programmed cell death protein 1 pathways positively correlated with the density of detected lymphocytes in tumor tissues, while expression of the extracellular matrix organization pathway positively correlated with the density of stromal cells. In summary, we demonstrate that the spatial organization of different cell types is predictive of patient survival and associated with the gene expression of biological pathways. SIGNIFICANCE: These findings present a deep learning-based analysis tool to study the TME in pathology images and demonstrate that the cell spatial organization is predictive of patient survival and is associated with gene expression. See related commentary by Rodriguez-Antolin, p. 1912.
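The abstract does not define the 48 spatial features. As a hypothetical sketch of the kind of feature involved, the code below starts from nucleus centroids labeled by cell type (the sort of output a nucleus segmentation model produces) and computes per-type density and the fraction of tumor nuclei with a lymphocyte within a chosen radius. The function names, radius, and area are assumptions, not the published feature set.

```python
from collections import Counter
from math import hypot


def cell_density(cells, area):
    """Nuclei per unit area for each cell type; cells are (x, y, type) tuples."""
    counts = Counter(cell_type for _, _, cell_type in cells)
    return {cell_type: n / area for cell_type, n in counts.items()}


def fraction_with_neighbor(cells, source, target, radius):
    """Fraction of `source` nuclei with at least one `target` nucleus within `radius`."""
    sources = [(x, y) for x, y, t in cells if t == source]
    targets = [(x, y) for x, y, t in cells if t == target]
    if not sources:
        return 0.0
    hits = sum(
        any(hypot(sx - tx, sy - ty) <= radius for tx, ty in targets)
        for sx, sy in sources
    )
    return hits / len(sources)


cells = [(0, 0, "tumor"), (10, 0, "tumor"), (3, 4, "lymphocyte"), (50, 50, "stroma")]
print(cell_density(cells, area=100))
print(fraction_with_neighbor(cells, "tumor", "lymphocyte", radius=5))
```

Here one of the two tumor nuclei lies exactly 5 units from the lymphocyte, so the neighbor fraction is 0.5. The pairwise search is O(n·m); for whole-slide images a spatial index such as a k-d tree would be the practical choice.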

Lack of Association Between Radiographic Tumor Burden and Efficacy of Immune Checkpoint Inhibitors in Advanced Lung Cancer.

Popat V, Lu R, Ahmed M, Park JY, Xie Y, Gerber DE.
Mar 2020 Oncologist. doi: 10.1634/theoncologist.2019-0814. Online ahead of print. PMID: 32233048

Abstract

Background

Historically, tumor burden has been considered an impediment to efficacy of immunotherapeutic agents, including vaccines, stem cell transplant, cytokine therapy, and intravesical bacillus Calmette-Guérin. This effect has been attributed to hypoxic zones in the tumor core contributing to poor T-cell infiltration, formation of immunosuppressive stromal cells, and development of therapy-resistant cell populations. However, the association between tumor burden and efficacy of immune checkpoint inhibitors is unknown. We sought to determine the association between radiographic tumor burden parameters and efficacy of immune checkpoint inhibitors in advanced lung cancer.

Materials and methods

We performed a retrospective analysis of patients with advanced lung cancer treated with immune checkpoint inhibitors. Demographic, disease, and treatment data were collected. Serial tumor dimensions were recorded according to RECIST version 1.1. Associations between radiographic tumor burden (baseline sum of longest diameters, longest single diameter) and clinical outcomes (radiographic response, progression-free survival, and overall survival) were determined using log-rank tests, Cox proportional-hazard regression, and logistic regression.
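Tumor burden here is the RECIST 1.1 baseline sum of longest diameters (BSLD), and radiographic response is graded from the same sums. The sketch below computes the sum and a simplified target-lesion response: complete response (CR) when all target lesions disappear, progressive disease (PD) for an increase of at least 20% over the nadir together with at least 5 mm absolute growth, partial response (PR) for a decrease of at least 30% from baseline, and stable disease (SD) otherwise. It is an illustration under stated simplifications (no lymph-node short-axis rule, non-target lesions, or new lesions), and the function names and measurements are hypothetical.

```python
def sum_longest_diameters(lesions_mm):
    """Sum of longest diameters (mm) over the target lesions at one assessment."""
    return sum(lesions_mm)


def recist_target_response(baseline_mm, prior_mm, current_mm):
    """Simplified RECIST 1.1 response for target lesions only.

    baseline_mm: target-lesion diameters at baseline (BSLD = their sum).
    prior_mm: earlier on-study assessments, each a list of diameters.
    current_mm: diameters at the assessment being classified.
    Ignores lymph-node short-axis rules, non-target lesions and new lesions.
    """
    baseline = sum_longest_diameters(baseline_mm)
    current = sum_longest_diameters(current_mm)
    # Nadir: smallest sum recorded so far, including baseline.
    nadir = min([baseline] + [sum_longest_diameters(a) for a in prior_mm])
    if current == 0:
        return "CR"  # all target lesions have disappeared
    # PD: >= 20% increase over nadir AND >= 5 mm absolute increase
    # (integer-safe form of current >= 1.2 * nadir).
    if 10 * current >= 12 * nadir and current - nadir >= 5:
        return "PD"
    # PR: >= 30% decrease from baseline (integer-safe form of current <= 0.7 * baseline).
    if 10 * current <= 7 * baseline:
        return "PR"
    return "SD"
```

Progression is checked before partial response because RECIST measures growth against the nadir, not the baseline: a sum of 60 mm after a baseline of 100 mm and a nadir of 40 mm counts as PD even though it is 40% below baseline.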

Results

Among 105 patients, the median baseline sum of longest diameters (BSLD) was 6.4 cm; the median longest single diameter was 3.6 cm. BSLD was not associated with best radiographic response, progression-free survival, or overall survival. In univariate and multivariate analyses, no significant associations were observed between the other radiographic parameters and outcomes when considered as categorical or continuous variables.

Conclusion

Although tumor burden has been considered a mediator of efficacy of earlier immunotherapies, in advanced lung cancer it does not appear to affect outcomes from immune checkpoint inhibitors.

Implications for practice

Historically, tumor burden has been considered an impediment to the efficacy of various immunotherapies, including vaccines, cytokines, allogeneic stem cell transplant, and intravesical bacillus Calmette-Guérin. However, in the present study, no association was found between tumor burden and efficacy (response rate, progression-free survival, overall survival) of immune checkpoint inhibitors in advanced lung cancer. These findings suggest that immune checkpoint inhibitors may provide benefit across a range of disease burden, including bulky tumors considered resistant to other categories of immunotherapy.

Keywords

Burden; Imaging; Immunotherapy; Lung cancer; Outcomes; RECIST.

Closing the gap: Contribution of surgical best practices to outcome differences between high- and low-volume centers for lung cancer resection.

von Itzstein MS, Lu R, Kernstine KH, Halm EA, Wang S, Xie Y, Gerber DE.
Apr 2020 Cancer Med. 2020 Apr 21. doi: 10.1002/cam4.3055. Online ahead of print. PMID: 32319225

Abstract

Background

Clinical outcomes for resected early-stage non-small cell lung cancer (NSCLC) are superior at high-volume facilities, but reasons for these differences remain unclear. Understanding these differences and optimizing outcomes across institutions are critical to the management of the increasing incidence of these cases. We evaluated the extent to which surgical best practices account for resected early-stage NSCLC outcome differences between facilities according to case volume.

Results

A total of 150,179 patients were included in the cohort (89% white, 53% female, median age 68 years). In a multivariate model, superior overall survival (OS) was observed at highest volume centers compared to lowest volume centers (hazard ratio (HR) = 0.89; 95% CI, 0.82-0.96; P = .002). After matching for surgical best practices, there was no significant OS difference (HR = 0.95; 95% CI, 0.87-1.05; P = .32). Propensity score-adjusted HR estimates indicated that surgical best practices accounted for 54% of the numerical OS difference between low-volume and high-volume centers. Each surgical best practice was independently associated with improved OS (all P ≤ .001).

Conclusion

Quantifiable and potentially modifiable surgical best practices largely account for resected early-stage NSCLC outcome differences observed between low- and high-volume centers. Adherence to these guidelines may reduce and potentially eliminate these differences.

Keywords

National Cancer Database (NCDB); guidelines; lobectomy; thoracic surgery; volume-outcome relationship.

MIXnorm: normalizing RNA-seq data from formalin-fixed paraffin-embedded samples.

Yin S, Wang X, Jia G, Xie Y.
Jun 2020 Bioinformatics. 36(11):3401-3408. doi: 10.1093/bioinformatics/btaa153. PMID: 32134470

Abstract

Motivation

Recent studies have shown that RNA-sequencing (RNA-seq) can be used to measure mRNA of sufficient quality extracted from formalin-fixed paraffin-embedded (FFPE) tissues to provide whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data, a key step that adjusts for unwanted biological and technical effects that can bias the signal of interest. Existing methods, developed for fresh-frozen or similar sample types, may perform suboptimally on FFPE data.

Results

We proposed a new normalization method, named MIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we developed a nested EM algorithm in which closed-form updates are available in each iteration. By eliminating the need for numerical optimization in the M-step, the algorithm is easy to implement and computationally efficient. We evaluated MIXnorm through simulations and cancer studies. MIXnorm offers a significant improvement over commonly used methods for RNA-seq expression data.
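The full MIXnorm model (zero-inflated Poisson plus truncated normal components, fit by nested EM) is available in the R repository listed below and is not reproduced here. To show why closed-form updates remove the need for a numerical optimizer in the M-step, the sketch below fits only a standalone zero-inflated Poisson (ZIP) by EM; it is an illustration of that one ingredient, not the MIXnorm implementation, and the counts are hypothetical.

```python
from math import exp


def fit_zip_em(counts, pi=0.5, tol=1e-10, max_iter=10_000):
    """Fit a zero-inflated Poisson (ZIP) model by EM.

    Model (assumed for illustration): each count is a structural zero with
    probability pi; otherwise it is drawn from Poisson(lam).
    Returns the estimates (pi, lam).
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one positive count")
    lam = total / n  # start from the plain Poisson mean
    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is structural.
        z = pi / (pi + (1 - pi) * exp(-lam))
        # M-step: both updates are closed-form, no numerical optimizer required.
        new_pi = n_zero * z / n
        new_lam = total / (n - n_zero * z)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


counts = [0] * 40 + [1] * 10 + [2] * 15 + [3] * 20 + [4] * 10 + [5] * 5
pi_hat, lam_hat = fit_zip_em(counts)
print(pi_hat, lam_hat)
```

At convergence the estimates satisfy the ZIP likelihood equations: the fitted mean (1 − pi)·lam equals the sample mean (1.65 here), and pi + (1 − pi)·e^(−lam) equals the observed fraction of zeros (0.40).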

Availability and implementation

R code available at https://github.com/S-YIN/MIXnorm

Statin Intolerance, Anti-HMGCR Antibodies, and Immune Checkpoint Inhibitor-Associated Myositis: A "Two-Hit" Autoimmune Toxicity or Clinical Predisposition?

von Itzstein MS, Khan S, Popat V, Lu R, Khan SA, Fattah FJ, Park JY, Bermas BL, Karp DR, Ahmed M, Saltarski JM, Gloria-McCutchen Y, Xie Y, Li QZ, Wakeland EK, Gerber DE.
May 2020 Oncologist. doi: 10.1634/theoncologist.2019-0911. Online ahead of print. PMID: 32400023

Abstract

Immune-related adverse events induced by immune checkpoint inhibitor (ICI) therapy may affect diverse organ systems, including skeletal and cardiac muscle. ICI-associated myositis may result in substantial morbidity and occasional mortality. We present a case of a patient with advanced non-small cell lung cancer who developed grade 4 myositis with concurrent myocarditis early after initiation of anti-programmed death ligand 1 therapy (durvalumab). Autoantibody analysis revealed marked increases in anti-3-hydroxy-3-methylglutaryl-coenzyme A reductase antibody levels that preceded clinical toxicity, and further increased during toxicity. Notably, the patient had a history of intolerable statin myopathy, which had resolved clinically after statin discontinuation and prior to ICI initiation. This case demonstrates a potential association between statin exposure, autoantibodies, and ICI-associated myositis.

VAMPr: VAriant Mapping and Prediction of antibiotic resistance via explainable features and machine learning

Kim J, Greenberg DE, Pifer R, Jiang S, Xiao G, Shelburne SA, Koh A, Xie Y, Zhan X.
Jan 2020 PLoS Comput Biol. 16(1):e1007511. doi: 10.1371/journal.pcbi.1007511. eCollection 2020 Jan. PMID: 31929521

Abstract

Antimicrobial resistance (AMR) is an increasing threat to public health. Current methods of determining AMR rely on inefficient phenotypic approaches, and there remains incomplete understanding of AMR mechanisms for many pathogen-antimicrobial combinations. Given the rapid, ongoing increase in availability of high-density genomic data for a diverse array of bacteria, algorithms that use genomic information to predict phenotype could be useful clinically and could help discover heretofore unrecognized AMR pathways. To facilitate understanding of the connections between DNA variation and phenotypic AMR, we developed a new bioinformatics tool, variant mapping and prediction of antibiotic resistance (VAMPr), to (1) derive gene ortholog-based sequence features for protein variants; (2) interrogate these explainable gene-level variants for their known or novel associations with AMR; and (3) build accurate models to predict AMR based on whole genome sequencing data. We curated the publicly available sequencing data for 3,393 bacterial isolates from 9 species that contained AMR phenotypes for 29 antibiotics. We detected 14,615 variant genotypes and built 93 association and prediction models. The association models confirmed known genetic antibiotic resistance mechanisms, such as blaKPC and carbapenem resistance, consistent with the accuracy of our approach. The prediction models achieved high accuracies (mean accuracy of 91.1% for all antibiotic-pathogen combinations) internally through nested cross validation and were also validated using external clinical datasets. The VAMPr variant detection method, association and prediction models will be valuable tools for AMR research for basic scientists with potential for clinical applicability.

Large-Scale Profiling of RBP-circRNA Interactions from Public CLIP-Seq Datasets

Zhang M, Wang T, Xiao G, Xie Y.
Jan 2020 Genes (Basel). 11(1):54. doi: 10.3390/genes11010054. PMID: 31947823

Abstract

Circular RNAs (circRNAs) are a special type of RNA whose formation and function have recently attracted considerable research interest. RNA binding proteins (RBPs) that bind circRNAs are important in these processes but have been relatively little studied. CLIP-Seq technology has been developed and applied to profile RBP-RNA interactions on a genome-wide scale. While mRNAs are usually the focus of CLIP-Seq experiments, RBP-circRNA interactions can also be identified through specialized analysis of CLIP-Seq datasets. However, this process involves many technical difficulties, such as the typically short length of CLIP-Seq reads. In this study, we created a pipeline called Clirc, specialized for profiling circRNAs in CLIP-Seq data and analyzing the characteristics of RBP-circRNA interactions. To our knowledge, this is one of the first studies to investigate circRNAs and their binding partners by repurposing CLIP-Seq datasets, and we hope our work will become a valuable resource for future studies of the biogenesis and function of circRNAs.
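Clirc's internals are not described in this abstract. A signature that circRNA detectors commonly look for is a back-splice junction: a read split into two aligned segments whose order on the genome is reversed relative to their order in the read. The sketch below checks that condition for a split alignment on the + strand; the Segment type and function are hypothetical, and it deliberately ignores the minus strand, splice-site motifs, and the short-read handling the abstract mentions.

```python
from typing import NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One aligned piece of a split read (0-based, half-open genome coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str


def backsplice_junction(first: Segment, second: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, circ_start, circ_end) if two read segments imply a back-splice.

    `first` is the 5' part of the read and `second` the 3' part. For a linear
    splice on the + strand the 3' part maps downstream; for a back-splice it
    maps upstream, because the read crosses the point where the end of the
    circle (first.end) is joined back to its start (second.start).
    """
    if first.chrom != second.chrom or first.strand != second.strand:
        return None
    if first.strand != "+":
        return None  # minus-strand handling omitted in this sketch
    if second.start < first.start and second.end <= first.end:
        return (first.chrom, second.start, first.end)
    return None


print(backsplice_junction(Segment("chr1", 1950, 2000, "+"), Segment("chr1", 1000, 1030, "+")))
```

The example read ends its 5' segment at position 2000 and continues at position 1000, so it supports a circle spanning chr1:1000-2000; the same two segments in the opposite read order would be an ordinary linear splice.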

Correction: LCE: an open web portal to explore gene expression and clinical associations in lung cancer

Cai L, Lin S, Girard L, Zhou Y, Yang L, Ci B, Zhou Q, Luo D, Yao B, Tang H, Allen J, Huffman K, Gazdar A, Heymach J, Wistuba I, Xiao G, Minna J, Xie Y.
Jan 2020 Oncogene. 39(3):718-719. doi: 10.1038/s41388-019-1000-6. PMID: 31501522

Abstract

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Identification of scavenger receptor B1 as the airway microfold cell receptor for Mycobacterium tuberculosis

Khan HS, Nair VR, Ruhl CR, Alvarez-Arguedas S, Galvan Rendiz JL, Franco LH, Huang L, Shaul PW, Kim J, Xie Y, Mitchell RB, Shiloh MU.
Mar 2020 Elife. 9:e52551. doi: 10.7554/eLife.52551. PMID: 32134383

Abstract

Mycobacterium tuberculosis (Mtb) can enter the body through multiple routes, including via specialized transcytotic cells called microfold cells (M cell). However, the mechanistic basis for M cell entry remains undefined. Here, we show that M cell transcytosis depends on the Mtb Type VII secretion machine and its major virulence factor EsxA. We identify scavenger receptor B1 (SR-B1) as an EsxA receptor on airway M cells. SR-B1 is required for Mtb binding to and translocation across M cells in mouse and human tissue. Together, our data demonstrate a previously undescribed role for Mtb EsxA in mucosal invasion and identify SR-B1 as the airway M cell receptor for Mtb.

Biomarkers for RBM39 degradation in acute myeloid leukemia

Hsiehchen D, Goralski M, Kim J, Xie Y, Nijhawan D.
Feb 2020 Leukemia. doi: 10.1038/s41375-020-0729-9. Online ahead of print. PMID: 32042080


Transforming activity of an oncoprotein-encoding circular RNA from human papillomavirus

Zhao J, Lee EE, Kim J, Yang R, Chamseddin B, Ni C, Gusho E, Xie Y, Chiang CM, Buszczak M, Zhan X, Laimins L, Wang RC.
May 2019 Nat Commun. 10(1):2300. doi: 10.1038/s41467-019-10246-5. PMID: 31127091

Abstract

Single-stranded circular RNAs (circRNAs), generated through 'backsplicing', occur more extensively than initially anticipated. The possible functions of the vast majority of circRNAs remain unknown. Virus-derived circRNAs have recently been described in gamma-herpesviruses. We report that oncogenic human papillomaviruses (HPVs) generate circRNAs, some of which encompass the E7 oncogene (circE7). HPV16 circE7 is detectable by both inverse RT-PCR and northern blotting of HPV16-transformed cells. CircE7 is N6-methyladenosine (m6A) modified, preferentially localized to the cytoplasm, associated with polysomes, and translated to produce E7 oncoprotein. Specific disruption of circE7 in CaSki cervical carcinoma cells reduces E7 protein levels and inhibits cancer cell growth both in vitro and in tumor xenografts. CircE7 is present in TCGA RNA-Seq data from HPV-positive cancers and in cell lines with only episomal HPVs. These results provide evidence that virus-derived, protein-encoding circular RNAs are biologically functional and linked to the transforming properties of some HPVs.

Artificial Intelligence in Lung Cancer Pathology Image Analysis

Wang S, Yang DM, Rong R, Zhan X, Fujimoto J, Liu H, Minna J, Wistuba II, Xie Y, Xiao G.
Oct 2019 Cancers (Basel). 11(11):1673. doi: 10.3390/cancers11111673. PMID: 31661863

Abstract

Objective

Accurate diagnosis and prognosis are essential in lung cancer treatment selection and planning. With the rapid advance of medical imaging technology, whole slide imaging (WSI) in pathology is becoming a routine clinical procedure. An interplay of needs and challenges exists for computer-aided diagnosis based on accurate and efficient analysis of pathology images. Recently, artificial intelligence, especially deep learning, has shown great potential in pathology image analysis tasks such as tumor region identification, prognosis prediction, tumor microenvironment characterization, and metastasis detection.

Materials and methods

In this review, we aim to provide an overview of current and potential applications for AI methods in pathology image analysis, with an emphasis on lung cancer.

Results

We outlined the current challenges and opportunities in lung cancer pathology image analysis, discussed the recent deep learning developments that could potentially impact digital pathology in lung cancer, and summarized the existing applications of deep learning algorithms in lung cancer diagnosis and prognosis.

Discussion and conclusion

With the advance of technology, digital pathology could have a major impact on lung cancer patient care. We point out some promising future directions for lung cancer pathology image analysis, including multi-task learning, transfer learning, and model interpretation.

Keywords

computer-aided diagnosis; deep learning; digital pathology; lung cancer; pathology image; whole-slide imaging.

PUMILIO hyperactivity drives premature aging of Norad-deficient mice

Kopp F, Elguindy MM, Yalvac ME, Zhang H, Chen B, Gillett FA, Lee S, Sivakumar S, Yu H, Xie Y, Mishra P, Sahenk Z, Mendell JT.
Feb 2019 Elife. 8:e42650. doi: 10.7554/eLife.42650. PMID: 30735131

Abstract

Objective

Although numerous long noncoding RNAs (lncRNAs) have been identified, our understanding of their roles in mammalian physiology remains limited. Here, we investigated the physiologic function of the conserved lncRNA Norad in vivo. Deletion of Norad in mice results in genomic instability and mitochondrial dysfunction, leading to a dramatic multi-system degenerative phenotype resembling premature aging. Loss of tissue homeostasis in Norad-deficient animals is attributable to augmented activity of PUMILIO proteins, which act as post-transcriptional repressors of target mRNAs to which they bind. Norad is the preferred RNA target of PUMILIO2 (PUM2) in mouse tissues and, upon loss of Norad, PUM2 hyperactively represses key genes required for mitosis and mitochondrial function. Accordingly, enforced Pum2 expression fully phenocopies Norad deletion, resulting in rapid-onset aging-associated phenotypes. These findings provide new insights and open new lines of investigation into the roles of noncoding RNAs and RNA binding proteins in normal physiology and aging.

Aryl Sulfonamides Degrade RBM39 and RBM23 by Recruitment to CRL4-DCAF15

Ting TC, Goralski M, Klein K, Wang B, Kim J, Xie Y, Nijhawan D.
Nov 2019 Cell Rep. 29(6):1499-1510.e6. doi: 10.1016/j.celrep.2019.09.079. PMID: 31693891

Abstract

Indisulam and related sulfonamides recruit the splicing factor RBM39 to the CRL4-DCAF15 E3 ubiquitin ligase, resulting in RBM39 ubiquitination and degradation. Here, we used a combination of domain mapping and random mutagenesis to identify domains or residues that are necessary for indisulam-dependent RBM39 ubiquitination. DCAF15 mutations at Q232 or D475 prevent RBM39 recruitment by indisulam. RBM39 is recruited to DCAF15 by its RRM2 (RNA recognition motif 2) and is ubiquitinated on its N terminus. RBM23, which is an RBM39 paralog, is also recruited to the CRL4-DCAF15 ligase through its RRM2 domain and undergoes sulfonamide-dependent degradation. Indisulam alters the expression of more than 3,000 genes and causes widespread intron retention and exon skipping. All of these changes can be attributed to RBM39, and none are the consequence of RBM23 degradation. Our findings demonstrate that indisulam selectively degrades RBM23 and RBM39, the latter of which is critically important for splicing and gene expression.

Unique mutation patterns in anaplastic thyroid cancer identified by comprehensive genomic profiling

Khan SA, Ci B, Xie Y, Gerber DE, Beg MS, Sherman SI, Cabanillas ME, Busaidy NL, Burtness BA, Heilmann AM, Bailey M, Ross JS, Sher DJ, Ali SM.
Jun 2019 Head Neck. 41(6):1928-1934. doi: 10.1002/hed.25634. Epub 2019 Feb 13. PMID: 30758123

Abstract

Introduction

Anaplastic thyroid cancer (ATC) is a highly aggressive thyroid cancer. Those ATC with genomic alterations (GAs) in TSC2, ALK, and BRAF may respond to targeted therapies.

Methods

Comprehensive genomic profiling on 90 ATC specimens identified base substitutions, short insertions and deletions, amplifications, copy number alterations, and genomic rearrangements in up to 315 cancer-related genes and 28 genes commonly rearranged in cancer.

Results

Median patient age was 65 years (range, 33-86); 50 patients were male. There was a mean of 4.2 GAs per case (range, 1-11). The most common GAs were in TP53 (66%), BRAF (34%), TERT (32%), CDKN2A (32%), and NRAS (26%). BRAF V600E and NRAS/HRAS/KRAS alterations were mutually exclusive. BRAF, CDKN2A, PIK3CA, and JAK2 alterations were more frequent in patients >70 years of age, while MYC, PTEN, and NRAS alterations were more common in those ≤50 years.

Conclusion

ATC harbors many GAs with potential therapeutic significance, suggesting that different molecular pathways can lead to ATC.

Keywords

anaplastic; neoplasms; thyroid.

Systematic Analysis of Gene Expression in Lung Adenocarcinoma and Squamous Cell Carcinoma with a Case Study of FAM83A and FAM83B

Cai L, Luo D, Yao B, Yang DM, Lin S, Girard L, DeBerardinis RJ, Minna JD, Xie Y, Xiao G.
Jun 2019 Cancers (Basel). 11(6):886. doi: 10.3390/cancers11060886. PMID: 31242643

Abstract

Introduction

In our previous study, we constructed a Lung Cancer Explorer (LCE) database housing lung cancer-specific expression data and clinical data from over 6700 patients in 56 studies.

Methods

Using this dataset, the largest collection of lung cancer gene expression data, along with our meta-analysis method, we systematically interrogated the association between gene expression and overall survival, as well as the expression difference between tumor and normal (adjacent non-malignant tissue) samples, in lung adenocarcinoma (ADC) and lung squamous cell carcinoma (SQCC). A case study of FAM83A and FAM83B was performed to demonstrate hypothesis testing with our database.
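The meta-analysis model used by LCE is not specified in this abstract. One standard way to combine per-study survival associations is DerSimonian-Laird random-effects pooling of log hazard ratios; the sketch below implements that textbook method as an assumption, not as LCE's code, and the study estimates in the usage line are hypothetical.

```python
from math import sqrt


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooled estimate via the DerSimonian-Laird method.

    estimates: per-study effects (e.g., log hazard ratios for one gene).
    std_errors: their standard errors.
    Returns (pooled_estimate, pooled_se, tau_squared).
    """
    k = len(estimates)
    w = [1 / se ** 2 for se in std_errors]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, sqrt(1 / sum(w_re)), tau2


print(dersimonian_laird([0.0, 2.0], [1.0, 1.0]))
```

When studies agree, tau² is 0 and the result reduces to the fixed-effect inverse-variance average; heterogeneity between studies inflates tau², which widens the pooled standard error and pulls the weights toward equality.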

Results

We showed that the reproducibility of results across studies varied by histological subtype and analysis type. Genes and pathways unique or common to the two histological subtypes were identified and the results were integrated into LCE to facilitate user exploration. In our case study, we verified the findings from a previous study on FAM83A and FAM83B in non-small cell lung cancer.

Conclusion

This study used gene expression data from a large cohort of patients to explore the molecular differences between lung ADC and SQCC.

Keywords

FAM83; gene expression difference between tumor and normal; lung cancer; meta-analysis; survival association analysis; systematic analysis.

Variation in the Assessment of Immune-Related Adverse Event Occurrence, Grade, and Timing in Patients Receiving Immune Checkpoint Inhibitors

Hsiehchen D, Watters MK, Lu R, Xie Y, Gerber DE.
Sep 2019 JAMA Netw Open. 2(9):e1911519. doi: 10.1001/jamanetworkopen.2019.11519. PMID: 31532516

Abstract

Importance

Toxic effects of conventional chemotherapy and molecularly targeted cancer therapies are generally well defined and occur at predictable points. By contrast, owing to their heterogeneous manifestations, unpredictable timing, and clinical overlap with other conditions, immune-related adverse events (irAE) may be more difficult to diagnose and characterize.

Objective

To determine concordance of algorithm-driven medical record review by medical oncologists for the characterization of 8 irAE in patients treated with immune checkpoint inhibitors.

Design, setting, and participants

Cross-sectional study of patients treated with immune checkpoint inhibitors at a National Cancer Institute-designated comprehensive cancer center from November 30, 2015, to March 7, 2018. A sample size of 52 patients provided 80% power to distinguish substantial agreement (κ = 0.85) from poor agreement (κ = 0.5) based on the Cohen κ.

Main outcomes and measures

Interrater agreement of 2 observers in the occurrence and grade of irAE.

Results

Of 52 patients (32 [61.5%] male; mean [SD] age, 69 [9] years) analyzed, 42 (80.8%) had non-small cell lung cancer and all received anti-programmed cell death 1 or anti-programmed cell death ligand 1 antibodies, with 3 patients (5.8%) receiving combinations with anti-cytotoxic T-lymphocyte antigen 4 antibodies. A median (interquartile range) of 82 (47-180) documents were reviewed per case. There was limited or poor interrater agreement on irAE occurrence (Cohen κ, 0.37-0.64), with the exception of hypothyroidism (κ = 0.8). Weighted κ similarly showed limited or poor agreement for irAE grade (κ = 0.31-0.75). Differences in assessed time of onset ranged from 5 to 188 days. As a control for data availability and access, observers had a high degree of agreement for the exact start date (98%) and end date (96%) of immunotherapy administration, suggesting that information interpretation rather than identification largely accounted for assessment differences. In multivariable analysis, therapy duration (adjusted odds ratio, 4.80; 95% CI, 1.34-17.17; P = .02) and Charlson Comorbidity Index (adjusted odds ratio, 4.09; 95% CI, 1.10-15.18; P = .03) were significantly associated with discordant irAE assessment.
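
Cohen's κ compares the observed agreement between two raters (p_o) with the agreement expected by chance from each rater's marginal label frequencies (p_e): κ = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch of the unweighted statistic for two observers, assuming each observer's irAE calls are coded as one label per patient. The function name and toy ratings are hypothetical, this is not the study's analysis code, and the weighted κ used for irAE grade is not shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every case; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy data: 1 = irAE present, 0 = absent, for 10 hypothetical patients.
    observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    observer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print(round(cohens_kappa(observer_1, observer_2), 3))  # 0.6
```

On the toy ratings, the observers agree on 8 of 10 patients (p_o = 0.8), but chance agreement alone would give 0.5, so κ is only 0.6. This is why raw percent agreement can overstate concordance.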

Conclusions and relevance

These findings underscore critical challenges in assessing the occurrence, type, timing, and severity of irAE. Apart from hypothyroidism (a condition that has a discrete diagnostic laboratory test and few other likely etiologies during immunotherapy treatment), interobserver agreement was poor. Given the importance of accurate and timely assessment of toxic effects for clinical trials and real-world disease management, efforts to improve irAE diagnosis and characterization are needed.

Immune dysregulation in cancer patients developing immune-related adverse events

Khan S, Khan SA, Luo X, Fattah FJ, Saltarski J, Gloria-McCutchen Y, Lu R, Xie Y, Li Q, Wakeland E, Gerber DE.
Jan 2019 Br J Cancer. 120(1):63-68. doi: 10.1038/s41416-018-0155-1. Epub 2018 Oct 31. PMID: 30377338

Abstract

Background

Up to 40% of cancer patients on immune checkpoint inhibitors develop clinically significant immune-related adverse events (irAEs). The role of host immune status and function in predisposing patients to the development of irAEs remains unknown.

Methods

Sera from 65 patients receiving immune checkpoint inhibitors and 13 healthy controls were evaluated for 40 cytokines before treatment, after 2-3 weeks, and after 6 weeks, and were analysed for correlation with the development of irAEs.

Results

Of the 65 cancer patients enrolled, 55% were women; the mean age was 65 years and 98% received anti-PD1/PDL1 therapy. irAEs occurred in 35% of cases. Among healthy controls, cytokine levels were stable over time and lower than those in cancer patients at baseline. Significant increases in CXCL9, CXCL10, CXCL11 and CXCL13 occurred 2 weeks post treatment, and in CXCL9, CXCL10, CXCL11, CXCL13, IL-10 and CCL26 at 6 weeks post treatment. Patients who developed irAEs had lower levels of CXCL9, CXCL10, CXCL11 and CXCL19 at baseline and exhibited greater increases in CXCL9 and CXCL10 levels after treatment compared to patients without irAEs.

Conclusions

Patients who developed irAEs had lower baseline levels and greater post-treatment increases in multiple cytokine levels, suggesting that underlying immune dysregulation may be associated with heightened risk for irAEs.

ConvPath: A software tool for lung adenocarcinoma digital pathological image analysis aided by a convolutional neural network

Wang S, Wang T, Yang L, Yang DM, Fujimoto J, Yi F, Luo X, Yang Y, Yao B, Lin S, Moran C, Kalhor N, Weissferdt A, Minna J, Xie Y, Wistuba II, Mao Y, Xiao G.
Dec 2019 EBioMedicine. 50:103-110. doi: 10.1016/j.ebiom.2019.10.033. Epub 2019 Nov 22. PMID: 31767541

Abstract

Background

The spatial distributions of different types of cells could reveal a cancer cell's growth pattern, its relationships with the tumor microenvironment and the immune response of the body, all of which represent key "hallmarks of cancer". However, the process by which pathologists manually recognize and localize all the cells in pathology slides is extremely labor intensive and error prone.

Methods

In this study, we developed an automated cell type classification pipeline, ConvPath, which includes nuclei segmentation, convolutional neural network-based tumor cell, stromal cell, and lymphocyte classification, and extraction of tumor microenvironment-related features for lung cancer pathology images. To facilitate users in leveraging this pipeline for their research, all source scripts for ConvPath software are available at https://qbrc.swmed.edu/projects/cnn/.

Findings

The overall classification accuracy was 92.9% and 90.1% in the training and independent testing datasets, respectively. By identifying cells and classifying cell types, this pipeline can convert a pathology image into a "spatial map" of tumor, stromal and lymphocyte cells. From this spatial map, we can extract features that characterize the tumor microenvironment. Based on these features, we developed an image feature-based prognostic model and validated the model in two independent cohorts. The predicted risk group serves as an independent prognostic factor after adjusting for clinical variables that include age, gender, smoking status, and stage.

Interpretation

The analysis pipeline developed in this study could convert the pathology image into a "spatial map" of tumor cells, stromal cells and lymphocytes. This could greatly facilitate and empower comprehensive analysis of the spatial organization of cells, as well as their roles in tumor progression and metastasis.

Keywords

Cell distribution and interaction; Convolutional neural network; Deep learning; Lung adenocarcinoma; Pathology image; Prognosis.

Engineering Forward Genetics into Cultured Cancer Cells for Chemical Target Identification

Povedano JM, Liou J, Wei D, Srivatsav A, Kim J, Xie Y, Nijhawan D, McFadden DG.
Sep 2019 Cell Chem Biol. 26(9):1315-1321.e3. doi: 10.1016/j.chembiol.2019.06.006. Epub 2019 Jul 11. PMID: 31303577

Abstract

Target identification for biologically active small molecules remains a major barrier for drug discovery. Cancer cells exhibiting defective DNA mismatch repair (dMMR) have been used as a forward genetics system to uncover compound targets. However, this approach has been limited by the dearth of cancer cell lines that harbor naturally arising dMMR. Here, we establish a platform for forward genetic screening using CRISPR/Cas9 to engineer dMMR into mammalian cells. We demonstrate the utility of this approach to identify mechanisms of drug action in mouse and human cancer cell lines using in vitro selections against three cellular toxins. In each screen, compound-resistant alleles emerged in drug-resistant clones, supporting the notion that engineered dMMR enables forward genetic screening in mammalian cells.

A Bayesian Hidden Potts Mixture Model for Analyzing Lung Cancer Pathology Images

Li Q, Wang X, Liang F, Yi F, Xie Y, Gazdar A, Xiao G.
Oct 2019 Biostatistics. 20(4):565-581. doi: 10.1093/biostatistics/kxy019. PMID: 29788035

Abstract

Digital pathology imaging of tumor tissues, which captures histological details in high resolution, is fast becoming a routine clinical procedure. Recent developments in deep-learning methods have enabled the identification, characterization, and classification of individual cells from pathology image analysis at a large scale. This creates new opportunities to study the spatial patterns of and interactions among different types of cells. Reliable statistical approaches to modeling such spatial patterns and interactions can provide insight into tumor progression and shed light on the biological mechanisms of cancer. In this article, we consider the problem of modeling a pathology image with irregular locations of three different types of cells: lymphocyte, stromal, and tumor cells. We propose a novel Bayesian hierarchical model, which incorporates a hidden Potts model to project the irregularly distributed cells to a square lattice and a Markov random field prior model to identify regions in a heterogeneous pathology image. The model allows us to quantify the interactions between different types of cells, some of which are clinically meaningful. We use Markov chain Monte Carlo sampling techniques, combined with a double Metropolis-Hastings algorithm, to simulate samples approximately from a distribution with an intractable normalizing constant. The proposed model was applied to the pathology images of 205 lung cancer patients from the National Lung Screening Trial, and the results show that the interaction strength between tumor and stromal cells predicts patient prognosis (P = 0.005). This statistical methodology provides a new perspective for understanding the role of cell-cell interactions in cancer progression.

Development and Validation of a Pathology Image Analysis-based Predictive Model for Lung Adenocarcinoma Prognosis - A Multi-cohort Study

Luo X, Yin S, Yang L, Fujimoto J, Yang Y, Moran C, Kalhor N, Weissferdt A, Xie Y, Gazdar A, Minna J, Wistuba II, Mao Y, Xiao G.
May 2019 Sci Rep. 9(1):6886. doi: 10.1038/s41598-019-42845-z. PMID: 31053738

Abstract

Prediction of disease prognosis is essential for improving cancer patient care. Previously, we demonstrated the feasibility of using quantitative morphological features of tumor pathology images to predict the prognosis of lung cancer patients in a single cohort. In this study, we developed and validated a pathology image-based predictive model for the prognosis of lung adenocarcinoma (ADC) patients across multiple independent cohorts. Using quantitative pathology image analysis, we extracted morphological features from H&E-stained sections of formalin-fixed, paraffin-embedded (FFPE) tumor tissues. A prediction model for patient prognosis was developed using tumor tissue pathology images from a cohort of 91 stage I lung ADC patients from the Chinese Academy of Medical Sciences (CAMS), and validated in ADC patients from the National Lung Screening Trial (NLST) and the UT Special Program of Research Excellence (SPORE) cohort. The morphological features that are associated with patient survival in the training dataset from the CAMS cohort were used to develop a prognostic model, which was independently validated in both the NLST (n = 185) and the SPORE (n = 111) cohorts. After adjusting for key clinical variables, the association between predicted risk and overall survival was significant in both the NLST (hazard ratio [HR] = 2.20, pv = 0.01) and SPORE (HR = 2.15, pv = 0.044) cohorts. Furthermore, the model also predicted the prognosis of patients with stage I ADC in both the NLST (n = 123, pv = 0.0089) and SPORE (n = 68, pv = 0.032) cohorts. The results indicate that the pathology image-based model predicts the prognosis of ADC patients across independent cohorts.

Validation of the 12-gene Predictive Signature for Adjuvant Chemotherapy Response in Lung Cancer

Xie Y, Lu W, Wang S, Tang X, Tang H, Zhou Y, Moran C, Behrens C, Roth JA, Zhou Q, Johnson DH, Swisher SG, Heymach JV, Papadimitrakopoulou VA, Xiao G, Minna JD, Wistuba II.
Jan 2019 Clin Cancer Res. 25(1):150-157. doi: 10.1158/1078-0432.CCR-17-2543. Epub 2018 Oct 4. PMID: 30287547

Abstract

Purpose

Response to adjuvant chemotherapy after tumor resection varies widely among patients with non-small cell lung cancer (NSCLC); therefore, it is of clinical importance to prospectively predict who will benefit from adjuvant chemotherapy before starting the treatment. The goal of this study is to validate a 12-gene adjuvant chemotherapy predictive signature developed from a previous study using a clinical-grade assay.

Experimental design

We developed a clinical-grade assay for formalin-fixed, paraffin-embedded (FFPE) samples using the NanoString nCounter platform to measure the mRNA expression of the previously published 12-gene set. The predictive performance was validated in a cohort of 207 patients with early-stage resected NSCLC, matched on the propensity score for receiving adjuvant chemotherapy.

Results

The effects of adjuvant chemotherapy were significantly different in patients from the predicted adjuvant chemotherapy benefit group and those in the predicted adjuvant chemotherapy nonbenefit group (P = 0.0056 for interaction between predicted risk group and adjuvant chemotherapy). Specifically, in the predicted adjuvant chemotherapy benefit group, the patients receiving adjuvant chemotherapy had significant recurrence-free survival (RFS) benefit (HR = 0.34; P = 0.016; adjuvant chemotherapy vs. nonadjuvant chemotherapy), while in the predicted adjuvant chemotherapy nonbenefit group, the patients receiving adjuvant chemotherapy actually had worse RFS (HR = 1.86; P = 0.14; adjuvant chemotherapy vs. nonadjuvant chemotherapy) than those who did not receive adjuvant chemotherapy.

Conclusions

This study validated that the 12-gene signature and the FFPE-based clinical assay predict that patients whose resected lung adenocarcinomas exhibit an adjuvant chemotherapy benefit gene expression pattern and who then receive adjuvant chemotherapy have significant survival advantage compared with patients whose tumors exhibit the benefit pattern but do not receive adjuvant chemotherapy.

DIGREM: an integrated web-based platform for detecting effective multi-drug combinations

Zhang M, Lee S, Yao B, Xiao G, Xu L, Xie Y.
May 2019 Bioinformatics. 35(10):1792-1794. doi: 10.1093/bioinformatics/bty860. PMID: 30295728

Abstract

Motivation

Synergistic drug combinations are a promising approach to achieve a desirable therapeutic effect in complex diseases through the multi-target mechanism. However, in vivo screening of all possible multi-drug combinations remains cost-prohibitive. An effective and robust computational model to predict drug synergy in silico will greatly facilitate this process.

Results

We developed DIGREM (Drug-Induced Genomic Response models for identification of Effective Multi-drug combinations), an online tool kit that can effectively predict drug synergy. DIGREM integrates DIGRE, IUPUI_CCBB, gene set-based and correlation-based models for users to predict synergistic drug combinations with dose-response information and drug-treated gene expression profiles.

Availability and implementation

http://lce.biohpc.swmed.edu/drugcombination

Supplementary information

Supplementary data are available at Bioinformatics online.

GeNeCK: a web server for gene network construction and visualization

Zhang M, Li Q, Yu D, Yao B, Guo W, Xie Y, Xiao G.
Jan 2019 BMC Bioinformatics. 20(1):12. doi: 10.1186/s12859-018-2560-0. PMID: 30616521

Abstract

Background

Reverse engineering approaches that infer gene regulatory networks using computational methods are of great importance for annotating gene functionality and identifying hub genes. Although various statistical algorithms have been proposed, the development of computational tools that integrate results from different methods, and of user-friendly online tools, still lags behind.

Results

We developed a web server that efficiently constructs gene networks from expression data. It allows the user to use ten different network construction methods (such as partial correlation-, likelihood-, Bayesian- and mutual information-based methods) and integrates the resulting networks from multiple methods. Hub gene information, if available, can be incorporated to enhance performance.

Conclusions

GeNeCK is an efficient and easy-to-use web application for gene regulatory network construction. It can be accessed at http://lce.biohpc.swmed.edu/geneck.

Keywords

Bayesian; Correlation; Ensemble; Gene network; Hub gene; Likelihood; Mutual information; Statistical method; Visualization; Web server.

Type and case volume of health care facility influences survival and surgery selection in cases with early-stage non-small cell lung cancer

Wang S, Lai S, von Itzstein MS, Yang L, Yang DM, Zhan X, Xiao G, Halm EA, Gerber DE, Xie Y.
Dec 2019 Cancer. 125(23):4252-4259. doi: 10.1002/cncr.32377. Epub 2019 Sep 10. PMID: 31503336

Abstract

Background

With the expansion of non-small cell lung cancer (NSCLC) screening methods, the percentage of cases with early-stage NSCLC is anticipated to increase. Yet it remains unclear how the type and case volume of the health care facility at which treatment occurs may affect surgery selection and overall survival for cases with early-stage NSCLC.

Methods

A total of 332,175 cases with American Joint Committee on Cancer (AJCC) TNM stage I and stage II NSCLC who were reported to the National Cancer Data Base (NCDB) by 1302 facilities were studied. Facility type was characterized in the NCDB as community cancer program (CCP), comprehensive community cancer program (CCCP), academic/research program (ARP), or integrated network cancer program (INCP). Each facility type was further dichotomized into high-volume or low-volume groups based on case volume. Multivariate Cox proportional hazards models, logistic regression, and propensity score matching were used to evaluate differences in survival and surgery selection among facilities according to type and volume.

Results

Cases from ARPs were found to have the longest survival (median, 16.4 months) and highest surgery rate (74.8%), whereas those from CCPs had the shortest survival (median, 9.7 months) and the lowest surgery rate (60.8%). The difference persisted when adjusted by potential confounders. For cases treated at CCPs, CCCPs, and ARPs, high-volume facilities had better survival outcomes than low-volume facilities. In facilities with better survival outcomes, surgery was performed for a greater percentage of cases compared with facilities with worse outcomes.

Conclusions

For cases with early-stage NSCLC, both facility type and case volume influence surgery selection and clinical outcome. Higher surgery rates are observed in facilities with better survival outcomes.

Development and Validation of a Nomogram Prognostic Model for Patients With Advanced Non-Small-Cell Lung Cancer

Wang T, Lu R, Lai S, Schiller JH, Zhou FL, Ci B, Wang S, Gao X, Yao B, Gerber DE, Johnson DH, Xiao G, Xie Y.
Apr 2019 Cancer Inform. 18:1176935119837547. PMID: 31057324

Abstract

Importance

Nomogram prognostic models can facilitate cancer patient treatment plans and patient enrollment in clinical trials.

Objective

The primary objective is to provide an updated and accurate prognostic model for predicting the survival of advanced non-small-cell lung cancer (NSCLC) patients, and the secondary objective is to validate a published nomogram prognostic model for NSCLC using an independent patient cohort.

Designs

A total of 1817 patients with advanced NSCLC from the control arms of 4 phase III randomized clinical trials were included in this study. Data from 524 NSCLC patients from one of these trials were used to validate a previously published nomogram and then to develop an updated nomogram. Patients from the other 3 trials served as independent validation cohorts for the new nomogram. Prognostic performance was comprehensively evaluated using hazard ratios, integrated area under the curve (AUC), concordance index, and calibration plots.

Settings

General community.

Main Outcome

A nomogram model was developed to predict overall survival in NSCLC patients.

Results

We demonstrated the prognostic power of the previously published model in an independent cohort. The updated prognostic model contains the following variables: sex, histology, performance status, liver metastasis, hemoglobin level, white blood cell count, peritoneal metastasis, skin metastasis, and lymphocyte percentage. This model was validated using various evaluation criteria on the 3 independent cohorts with heterogeneous NSCLC populations. In the SUN1087 patient cohort, the continuous risk score output by the nomogram achieved an integrated area under the receiver operating characteristic (ROC) curve of 0.83, a log-rank P-value of 3.87e−11, and a concordance index of 0.717. In the SAVEONCO patient cohort, the integrated area under the ROC curve was 0.755, the log-rank P-value was 4.94e−6, and the concordance index was 0.678. In the VITAL patient cohort, the integrated area under the ROC curve was 0.723, the log-rank P-value was 1.36e−11, and the concordance index was 0.654. We implemented the proposed nomogram and several previously published prognostic models on an online Web server for easy user access.

Conclusions

This nomogram model based on basic clinical features and routine lab testing predicts individual survival probabilities for advanced NSCLC and exhibits cross-study robustness.

DEFOR: Depth- and Frequency-Based Somatic Copy Number Alteration Detector

Zhang H, Zhan X, Brugarolas J, Xie Y.
Mar 2019 Bioinformatics. pii: btz170. doi: 10.1093/bioinformatics/btz170. PMID: 30860569

Abstract

Motivation

Detection of somatic copy number alterations (SCNAs) using high-throughput sequencing has become popular because of rapid developments in sequencing technology. However, existing methods do not perform well in calling SCNAs in unstable tumor genomes.

Results

We developed a new method, DEFOR, to detect SCNAs in tumor samples from exome-sequencing data. The evaluation showed that DEFOR has higher accuracy for SCNA detection from exome sequencing than five existing tools. This advantage is especially apparent in unstable tumor genomes with a large proportion of SCNAs.

Identifying genes with tri-modal association with survival and tumor grade in cancer patients

Zhang M, Wang T, Sirianni R, Shaul PW, Xie Y.
Jan 2019 BMC Bioinformatics. 20(1):13. doi: 10.1186/s12859-018-2582-7. PMID: 30621577

Abstract

Background

Previous cancer genomics studies focused on searching for novel oncogenes and tumor suppressor genes whose abundance is positively or negatively correlated with end-point observations, such as survival or tumor grade. This approach may miss some truly functional genes if both their low and high modes are associated with end-point observations. Such genes would act as both oncogenes and tumor suppressor genes, a scenario that is unlikely but theoretically possible.

Results

We developed an Expectation-Maximization (EM) algorithm to divide patients into low-, middle- and high-expressing groups according to the expression level of a given gene in both tumor and normal samples. We found one gene, ORMDL3, whose low and high modes were both associated with worse survival and higher tumor grade in breast cancer patients across multiple patient cohorts. We speculate that its tumor suppressor role may be real, while the correlation of its high expression with worse end-point outcomes is probably due to a passenger event accompanying amplification of the nearby gene ERBB2.
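
The grouping step above can be illustrated with a generic three-component Gaussian mixture fitted by EM to one gene's expression values, with each patient then assigned to the low-, middle- or high-expressing component. This is a minimal sketch under stated assumptions: the function names and the quantile-based initialisation are hypothetical, it fits a single set of values, and it does not reproduce the published algorithm, which also draws on expression in normal samples.

```python
import math


def _normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_three_groups(x, n_iter=200, tol=1e-8):
    """Fit a 3-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, sds, labels) with components ordered low < middle < high.
    """
    if len(x) < 3:
        raise ValueError("need at least three observations")
    xs = sorted(x)
    n = len(xs)
    # Hypothetical initialisation: means near the 1/6, 1/2 and 5/6 quantiles.
    mu = [xs[n // 6], xs[n // 2], xs[(5 * n) // 6]]
    sd = [((xs[-1] - xs[0]) or 1.0) / 6] * 3
    w = [1 / 3] * 3
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp, ll = [], 0.0
        for xi in x:
            dens = [w[k] * _normal_pdf(xi, mu[k], sd[k]) for k in range(3)]
            total = sum(dens) or 1e-300
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and standard deviations.
        for k in range(3):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            sd[k] = math.sqrt(max(var, 1e-6))  # variance floor prevents collapse
        if ll - prev_ll < tol:  # stop once the log-likelihood stops improving
            break
        prev_ll = ll
    # Assign each observation to its most probable component.
    order = sorted(range(3), key=lambda k: mu[k])
    rank = {k: i for i, k in enumerate(order)}
    names = ("low", "middle", "high")
    labels = []
    for xi in x:
        post = [w[k] * _normal_pdf(xi, mu[k], sd[k]) for k in range(3)]
        labels.append(names[rank[max(range(3), key=lambda k: post[k])]])
    return [w[k] for k in order], [mu[k] for k in order], [sd[k] for k in order], labels


if __name__ == "__main__":
    # Toy expression values forming three well-separated clusters.
    values = [c + 0.1 * i for c in (0.0, 5.0, 10.0) for i in range(-5, 5)]
    weights, means, sds, groups = em_three_groups(values)
    print(groups.count("low"), groups.count("middle"), groups.count("high"))
```

A gene would show the tri-modal pattern described above if patients in both the "low" and "high" groups had worse outcomes than those in the "middle" group. That comparison is a separate survival analysis and is not shown here.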

Conclusions

The proposed EM algorithm can effectively detect genes with tri-modally distributed expression in patient groups compared to normal genes, thus offering a new perspective on dissecting the association between genomic features and end-point observations. Our analysis of breast cancer datasets suggests that the gene ORMDL3 may have an unexploited tumor suppressive function.

LCE: an open web portal to explore gene expression and clinical associations in lung cancer

Cai L, Lin S, Girard L, Zhou Y, Yang L, Ci B, Zhou Q, Luo D, Yao B, Tang H, Allen J, Huffman K, Gazdar A, Heymach J, Wistuba I, Xiao G, Minna J, Xie Y.
Apr 2019 Oncogene. 38(14):2551-2564. doi: 10.1038/s41388-018-0588-2. Epub 2018 Dec 7. PMID: 30532070

Abstract

We constructed a lung cancer-specific database housing expression data and clinical data from over 6700 patients in 56 studies. Expression data from 23 genome-wide platforms were carefully processed and quality controlled, whereas clinical data were standardized and rigorously curated. Empowered by this lung cancer database, we created an open access web resource—the Lung Cancer Explorer (LCE), which enables researchers and clinicians to explore these data and perform analyses. Users can perform meta-analyses on LCE to gain a quick overview of the results on tumor vs non-malignant tissue (normal) differential gene expression and expression-survival association. Individual dataset-based survival analysis, comparative analysis, and correlation analysis are also provided with flexible options to allow for customized analyses from the user.
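
Meta-analyses of this kind pool per-study results, such as a log hazard ratio for expression-survival association or a tumor-vs-normal difference, into one summary estimate. A common way to do this while allowing for between-study heterogeneity is the DerSimonian-Laird random-effects model. The following Python sketch is a generic illustration of that model, assuming each study supplies an effect estimate and its sampling variance. The function name and inputs are hypothetical, and this is not claimed to be the method implemented in LCE.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled_effect, standard_error, tau_squared). Hypothetical helper.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q quantifies heterogeneity across studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


if __name__ == "__main__":
    # Two hypothetical studies with clearly different effects.
    print(dersimonian_laird([0.0, 2.0], [0.5, 0.5]))  # (1.0, 1.0, 1.5)
```

When studies disagree more than their sampling variances explain, τ² becomes positive and widens the pooled standard error. Under this model, a gene whose association is consistent across studies therefore earns a more confident summary than one with conflicting results.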

RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples

Jia G, Wang X, Li Q, Lu W, Tang X, Wistuba I, Xie Y.
2019 Annals of Applied Statistics. [In press]

Abstract

Formalin-fixed, paraffin-embedded (FFPE) samples have great potential for cancer biomarker discovery and retrospective studies of other diseases. However, their application is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on the damaged RNA extracted from these samples. The NanoString nCounter platform is a medium-throughput technology that measures gene expression with high sensitivity. This platform is compatible with FFPE samples, which may turn large collections of FFPE samples into valuable resources for academic research and clinical applications. However, statistical methods for normalizing NanoString nCounter data generated from FFPE samples lag far behind those for traditional technologies such as microarrays. In this paper, we construct an integrated system of random-coefficient hierarchical regression models, called RCRnorm, to capture the main patterns and characteristics observed in real NanoString nCounter data from FFPE samples, and we develop a Bayesian approach to estimate model parameters and normalize gene expression across samples. The performance of RCRnorm is validated on simulated and real datasets.

Main bronchus location is a predictor for metastasis and prognosis in lung adenocarcinoma: A large cohort analysis

Yang L, Wang S, Gerber DE, Zhou Y, Xu F, Liu J, Liang H, Xiao G, Zhou Q, Gazdar A, Xie Y.
Jun 2018 Lung Cancer. 120:22-26.

Abstract

Objectives

In the literature, inconsistent associations between the primary location of lung adenocarcinomas (ADCs) and patient prognosis have been reported, owing to varying definitions of central and peripheral locations. In this study, we investigated the clinical characteristics and prognoses of ADCs located in the main bronchus.

Methods

A total of 397,189 lung ADCs registered from 2004 to 2013 in the National Cancer Database (NCDB) were extracted and divided into main bronchus-located ADCs (2.5%, N = 10,111) and non-main bronchus ADCs (97.5%, N = 387,078). The ADCs located in the main bronchus and those not in the main bronchus were compared in terms of patient prognosis, lymph node involvement, distant metastases and other clinical features, including rate of curative-intent resection, histologic grade, and stage.

Results

ADCs located in the main bronchus had significantly worse patient survival than those in the non-main bronchus, both for all patients (HR = 1.82, 95% CI 1.78-1.86) and for those undergoing curative-intent resection (HR = 2.49, 95% CI 2.23-2.78). Furthermore, ADCs located in the main bronchus had a significantly higher rate of lymph node involvement and distant metastasis than those not in the main bronchus, when stratified by tumor size (trend test, p < e-16). Multivariate analysis of overall survival showed that main bronchus location is a prognostic factor (HR = 1.15, 95% CI 1.08-1.23) independent of other clinical factors.

Conclusions

Main bronchus location is an independent predictor for metastasis and worse outcomes irrespective of stage and treatment. Tumor primary location might be considered in prognostication and treatment planning.

Outcomes of neoadjuvant and adjuvant chemotherapy in stage 2 and 3 non-small cell lung cancer: an analysis of the National Cancer Database

MacLean M, Luo X, Wang S, Kernstine K, Gerber DE, Xie Y.
May 2018 Oncotarget. 9(36):24470-24479.

Abstract

Introduction

The current recommendation for the treatment of stage II and III NSCLC is surgery with chemotherapy. While the convention is to administer chemotherapy postoperatively (adjuvant chemotherapy), inconsistent results have been reported regarding the administration of chemotherapy preoperatively (neoadjuvant chemotherapy). Therefore, a comprehensive analysis of neoadjuvant chemotherapy use in NSCLC is needed.

Methods

The National Cancer Database (NCDB) was queried for all cases of stage II and III NSCLC from 2006 to 2012. These patients were stratified by stage, and the factors affecting use of neoadjuvant chemotherapy and the effects of neoadjuvant versus adjuvant chemotherapy on overall survival (OS) were investigated.

Results

Of the 35,134 NSCLC patients identified, 18,684 received surgery alone, 1,154 received surgery with neoadjuvant chemotherapy, and 15,296 received surgery with adjuvant chemotherapy. Race, Charlson-Deyo score, facility type, insurance type and stage of disease were associated with the use of neoadjuvant chemotherapy. In stage II disease, adjuvant chemotherapy showed improved survival (median OS = 80.8 months) over neoadjuvant chemotherapy (OS = 67.0 months) and surgery alone (OS = 51.0 months). In stage III disease, adjuvant chemotherapy showed the longest survival (OS = 49.0 months), followed by neoadjuvant chemotherapy (OS = 42.0 months) and surgery alone (OS = 24.3 months). After propensity score matching, adjuvant chemotherapy was found to provide a survival advantage over neoadjuvant chemotherapy in both stage II (HR = 0.70; p = 5.8e-3) and stage III (HR = 0.77; p = 0.011) NSCLC.

Conclusions

Our analysis finds a survival advantage for neoadjuvant chemotherapy when compared to surgery alone, but no advantage compared to adjuvant chemotherapy in the treatment of resectable stage II and III NSCLC.

Development and Validation of a Nomogram Prognostic Model for Small-Cell Lung Cancer Patients

Wang S, Yang L, Ci B, Maclean M, Gerber D, Xiao G, Xie Y.
Jun 2018 Journal of Thoracic Oncology. PMID: 29902534

Abstract

Introduction

Small cell lung cancer (SCLC) accounts for almost 15% of lung cancer cases in the United States. Nomogram prognostic models could greatly facilitate risk stratification and treatment planning, as well as more refined enrollment criteria for clinical trials. We developed and validated a new nomogram prognostic model for SCLC patients using a large SCLC patient cohort from the National Cancer Database (NCDB).

Methods

Clinical data for 24,680 SCLC patients diagnosed from 2004 to 2011 were used to develop the nomogram prognostic model. The model was then validated using an independent cohort of 9,700 SCLC patients diagnosed from 2012 to 2013. Prognostic performance was evaluated using p values, the concordance index, and the integrated area under the time-dependent receiver operating characteristic curve (AUC).

Results

The following variables were contained in the final prognostic model: age, sex, race, ethnicity, Charlson/Deyo score, TNM stage (assigned according to the American Joint Committee on Cancer [AJCC] eighth edition), treatment type (combination of surgery, radiation therapy, and chemotherapy), and laterality. The model was validated in an independent testing group with a concordance index of 0.722 ± 0.004 and an integrated area under the curve of 0.79. The nomogram model has a significantly higher prognostic accuracy than previously developed models, including the AJCC eighth edition TNM-staging system. We implemented the proposed nomogram and four previously published nomograms in an online webserver.

Conclusions

We developed a nomogram prognostic model for SCLC patients, and validated the model using an independent patient cohort. The nomogram performs better than earlier models, including models using AJCC staging.
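To make the concordance index used above concrete, here is a minimal Python sketch of Harrell's C for right-censored survival data. It is a generic textbook version, not the code behind the nomogram or its webserver; the function name and example values are hypothetical, and for simplicity pairs with tied follow-up times are skipped while ties in predicted risk count as half-concordant.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter (or equal) observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event
        # and the times differ (tied times are skipped in this simple sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was assigned the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to a random ordering of patients and 1.0 to a perfect ranking by predicted risk.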

Usefulness of a Simple Algorithm to Identify Hypertensive Patients Who Benefit from Intensive Blood Pressure Lowering.

Wang S, Khera R, Das SR, Vigen R, Wang T, Luo X, Lu R, Zhan X, Xiao G, Vongpatanasin W, Xie Y
Jul 2018 Am J Cardiol. 122(2):248-254. doi: 10.1016/j.amjcard.2018.03.361. Epub 2018 Apr 11. PMID: 29880288

Abstract

Large randomized trials have provided inconsistent evidence regarding the benefit of intensive blood pressure (BP) lowering in hypertensive patients. Identifying which patients derive a higher net benefit is essential in informing clinical decision-making. We used patient-level data from 2 trials that tested intensive versus standard BP lowering, the Systolic Blood Pressure Intervention Trial (SPRINT) and Action to Control Cardiovascular Risk in Diabetes (ACCORD), to assess whether stratification by cardiovascular disease (CVD) risk would identify patients with a more favorable risk-benefit profile for intensive BP lowering. Within SPRINT, we selected a subset of patients at the extremes of major adverse cardiovascular event rates to develop a decision tree using recursive partitioning modeling. We then validated its predictive effects in the remaining 'intermediate' SPRINT subset (n = 8,357) and externally in ACCORD (n = 2,258). Recursive partitioning produced a 3-variable decision tree model consisting of age ≥74 years, urinary albumin-creatinine ratio ≥34, and history of clinical CVD. It classified 48.6% of SPRINT and 55.3% of ACCORD patients as "high-risk." Compared with standard treatment, intensive BP lowering was associated with lower rates of major adverse cardiovascular events in this high-risk population in both SPRINT cross-validation data (hazard ratio [HR] 0.66, 95% confidence interval [CI] 0.52 to 0.85) and ACCORD (HR 0.67, 95% CI 0.50 to 0.90), but not in the remaining low-risk patients (SPRINT: HR 0.83, 95% CI 0.56 to 1.25; ACCORD: HR 1.09, 95% CI 0.64 to 1.83). Additionally, intensive BP lowering did not confer an excess risk of serious adverse events in the high-risk group. In conclusion, this simple risk prediction model consisting of age, urinary albumin-creatinine ratio, and clinical CVD history successfully identified a subset of hypertensive patients who deriv

Validation of the 12-gene predictive signature for Adjuvant Chemotherapy Response in Lung Cancer.

Xie Y, Lu W, Wang S, Tang X, Tang H, Zhou Y, Moran C, Behrens C, Roth JA, Zhou Q, Johnson DH, Swisher SG, Heymach JV, Papadimitrakopoulou VA, Xiao G, Minna JD, Wistuba II.
Oct 2018 Clin Cancer Res. pii: clincanres.2543.2017. doi: 10.1158/1078-0432.CCR-17-2543. PMID: 30287547

DIGREM: an integrated web-based platform for detecting effective multi-drug combinations.

Zhang M, Lee S, Yao B, Xiao G, Xu L, Xie Y
Oct 2018 Bioinformatics. PMID: 30295728

Abstract

Motivation

Synergistic drug combinations are a promising approach to achieve a desirable therapeutic effect in complex diseases through the multi-target mechanism. However, in vivo screening of all possible multi-drug combinations remains cost-prohibitive. An effective and robust computational model to predict drug synergy in silico will greatly facilitate this process.

Results

We developed DIGREM (Drug-Induced Genomic Response models for identification of Effective Multi-drug combinations), an online tool kit that can effectively predict drug synergy. DIGREM integrates DIGRE, IUPUI_CCBB, gene set-based and correlation-based models for users to predict synergistic drug combinations with dose response information and drug-treated gene expression profiles.

Comprehensive Computational Pathological Image Analysis Predicts Lung Cancer Prognosis

Luo, X., Zang, X., Yang, L., Huang, J., Liang, F., Rodriguez, C.J., Wistuba, I.I., Gazdar, A., Xie, Y., Xiao, G.
March 2017 Journal of Thoracic Oncology, 12:3, 501-509

Abstract

Introduction

Pathological examination of histopathological slides is a routine clinical procedure for lung cancer diagnosis and prognosis. Although the classification of lung cancer has been updated to become more specific, only a small subset of the total morphological features are taken into consideration. The vast majority of the detailed morphological features of tumor tissues, particularly tumor cells’ surrounding microenvironment, are not fully analyzed. The heterogeneity of tumor cells and close interactions between tumor cells and their microenvironments are closely related to tumor development and progression. The goal of this study is to develop morphological feature–based prediction models for the prognosis of patients with lung cancer.

Method

We developed objective and quantitative computational approaches to analyze the morphological features of pathological images for patients with NSCLC. Tissue pathological images were analyzed for 523 patients with adenocarcinoma (ADC) and 511 patients with squamous cell carcinoma (SCC) from The Cancer Genome Atlas lung cancer cohorts. The features extracted from the pathological images were used to develop statistical models that predict patients’ survival outcomes in ADC and SCC, respectively.

Results

We extracted 943 morphological features from pathological images of hematoxylin and eosin–stained tissue and identified morphological features that are significantly associated with prognosis in ADC and SCC, respectively. Statistical models based on these extracted features stratified NSCLC patients into high-risk and low-risk groups. The models were developed from training sets and validated in independent testing sets: a predicted high-risk group versus a predicted low-risk group (for patients with ADC: hazard ratio = 2.34, 95% confidence interval: 1.12–4.91, p = 0.024; for patients with SCC: hazard ratio = 2.22, 95% confidence interval: 1.15–4.27, p = 0.017) after adjustment for age, sex, smoking status, and pathologic tumor stage.

Conclusions

The results suggest that the quantitative morphological features of tumor pathological images predict prognosis in patients with lung cancer.

Finding RNA-Protein Interaction Sites Using HMMs

Wang, T., Yun, J., Xie, Y., Xiao, G.
February 2017 Methods Mol Biol 1552:177-184. PMID: 28224499

Abstract

RNA-binding proteins play important roles in the various stages of RNA maturation through binding to their target RNAs. Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has made it possible to identify the targeting sites of RNA-binding proteins in various cell culture systems and tissue types on a genome-wide scale. Several hidden Markov model (HMM)-based approaches have been suggested to identify protein–RNA binding sites from CLIP-Seq datasets. In this chapter, we describe how HMMs can be applied to analyze CLIP-Seq datasets, including the bioinformatics preprocessing steps that extract count information from the sequencing data before HMM analysis and the downstream analysis steps following peak calling.
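As a generic illustration of HMM-based peak calling on CLIP-Seq counts, the Python sketch below runs the Viterbi algorithm on a two-state hidden Markov model (state 0 = background, state 1 = binding site) with Poisson-distributed read counts per genomic bin. The Poisson emission model, the state means, and the transition probabilities are assumptions chosen for illustration; this is not the specific model described in the chapter, which may use different emission distributions and estimate parameters from the data (for example, with Baum-Welch).

```python
import math


def poisson_logpmf(k, lam):
    # log P(K = k) for a Poisson(lam) read count
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(counts, rates, log_trans, log_init):
    """Most likely hidden-state path for a Poisson-emission HMM.

    counts: read counts per genomic bin
    rates: assumed Poisson mean for each hidden state
    log_trans[s][t]: log P(state t at bin i+1 | state s at bin i)
    log_init[s]: log P(state s at the first bin)
    """
    n_states = len(rates)
    # score[s] = best log-probability of any state path ending in state s
    score = [log_init[s] + poisson_logpmf(counts[0], rates[s]) for s in range(n_states)]
    backptr = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for t in range(n_states):
            # Best predecessor state for landing in state t at this bin.
            best_s = max(range(n_states), key=lambda s: score[s] + log_trans[s][t])
            new_score.append(score[best_s] + log_trans[best_s][t] + poisson_logpmf(k, rates[t]))
            ptr.append(best_s)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With hypothetical counts such as `[1, 0, 2, 1, 25, 30, 28, 2, 1, 0]`, means `[1.0, 25.0]`, and a 0.9 probability of staying in the same state, the decoded path flags the run of high-count bins as a binding site. Contiguous runs of state 1 would then be merged into candidate peaks for downstream analysis.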

Automatic extraction of cell nuclei from H&E-stained histopathological images

Yi, F., Huang, J., Yang, L., Xie, Y., Xiao, G.
Apr 2017 J Med Imaging 4(2):027502. PMID: 28653017

Abstract

Extraction of cell nuclei from hematoxylin and eosin (H&E)-stained histopathological images is an essential preprocessing step in computerized image analysis for disease detection, diagnosis, and prognosis. We present an automated cell nuclei segmentation approach that works with H&E-stained images. A color deconvolution algorithm was first applied to the image to obtain the hematoxylin channel. Using a morphological operation and thresholding technique on the hematoxylin channel image, candidate target nuclei and background regions were detected, which were then used as markers for a marker-controlled watershed transform segmentation algorithm. Moreover, postprocessing was conducted to split touching nuclei. For each segmented region from the previous steps, the regional maximum value positions were identified as potential nuclei centers. These maximum values were further grouped into k clusters, and the locations within each cluster were connected with the minimum spanning tree technique. These connected positions were then used as new markers for a watershed segmentation approach. The final number of nuclei in each region was determined by minimizing an objective function iterated over all possible values of k. The proposed method was applied to pathological images of tumor tissues from The Cancer Genome Atlas study. Experimental results show that the proposed method can lead to promising results in terms of segmentation accuracy and separation of touching nuclei.

An Argonaute phosphorylation cycle promotes microRNA-mediated silencing

Golden, R.J., Chen, B., Li, T., Braun, J., Manjunath, H., Chen, X., Wu, J., Schmid, V., Chang, T.C., Kopp, F., Ramirez-Martinez, A., Tagliabracci, V.S., Chen, Z.J., Xie, Y., Mendell, J.T.
February 2017 Nature 542, 197–202

Abstract

MicroRNAs (miRNAs) perform critical functions in normal physiology and disease by associating with Argonaute proteins and downregulating partially complementary messenger RNAs (mRNAs). Here we use clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein 9 (Cas9) genome-wide loss-of-function screening coupled with a fluorescent reporter of miRNA activity in human cells to identify new regulators of the miRNA pathway. By using iterative rounds of screening, we reveal a novel mechanism whereby target engagement by Argonaute 2 (AGO2) triggers its hierarchical, multi-site phosphorylation by CSNK1A1 on a set of highly conserved residues (S824–S834), followed by rapid dephosphorylation by the ANKRD52–PPP6C phosphatase complex. Although genetic and biochemical studies demonstrate that AGO2 phosphorylation on these residues inhibits target mRNA binding, inactivation of this phosphorylation cycle globally impairs miRNA-mediated silencing. Analysis of the transcriptome-wide binding profile of non-phosphorylatable AGO2 reveals a pronounced expansion of the target repertoire bound at steady-state, effectively reducing the active pool of AGO2 on a per-target basis. These findings support a model in which an AGO2 phosphorylation cycle stimulated by target engagement regulates miRNA:target interactions to maintain the global efficiency of miRNA-mediated silencing.

Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data

Guinney, J., Wang, T., Laajala, T.D., Winner, K.K., Bare, J.C., Neto, E.C., et al., Xie, Y., Aittokallio, T., Zhou, F.L., Costello, J.C.
January 2017 The Lancet Oncology 18(1):132-42.

Abstract

Background

Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease.

Methods

Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance. Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone.

Findings

Fifty independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0·791; Bayes factor >5) and surpassed the reference model (iAUC 0·743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3·32, 95% CI 2·39-4·62, p<0·0001; reference model: 2·56, 1·85-3·53, p<0·0001). The new model was validated further on the ENTHUSE M1 cohort with similarly high performance (iAUC 0·768). Meta-analysis across all methods confirmed previously identified predictive clinical variables and revealed aspartate aminotransferase as an important, albeit previously under-reported, prognostic biomarker.

Interpretation

Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer.

Funding

Sanofi US Services, Project Data Sphere.

Comprehensive evaluation of published gene expression prognostic signatures for biomarker-based lung cancer clinical studies

Tang, H., Wang, S., Xiao, G., Schiller, J., Papadimitrakopoulou, V., Minna, J., Wistuba, I.I., Xie, Y.
April 2017 Annals of Oncology, Volume 28, Issue 4

Abstract

Background

A more accurate prognosis for non-small-cell lung cancer (NSCLC) patients could aid in the identification of patients at high risk for recurrence. Many NSCLC mRNA expression signatures claiming to be prognostic have been reported in the literature. The goal of this study was to identify the most promising mRNA prognostic signatures in NSCLC for further prospective clinical validation.

Experimental design

We carried out a systematic review and meta-analysis of published mRNA prognostic signatures for resected NSCLC. The prognostic performance of each signature was evaluated via a meta-analysis of 1927 early stage NSCLC patients collected from 15 studies using three evaluation metrics (hazard ratios, concordance scores, and time-dependent receiver-operating characteristic curves). The performance of each signature was then evaluated against 100 random signatures. The prognostic power independent of clinical risk factors was assessed by multivariate Cox models.

Results

Through a literature search, we identified 42 lung cancer prognostic signatures derived from genome-wide expression profiling analysis. Based on the meta-analysis, 25 signatures were prognostic for survival after adjusting for clinical risk factors, and 18 signatures performed significantly better than random signatures. When histology types were analyzed separately, 17 signatures and 8 signatures were prognostic for adenocarcinoma and squamous cell lung cancer, respectively. Despite little overlap among published gene signatures, the top-performing signatures are highly concordant in predicted patient outcomes.

Conclusions

Based on this large-scale meta-analysis, we identified a set of mRNA expression prognostic signatures appropriate for further validation in prospective clinical studies.
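The abstract does not state which pooling model underlies the hazard-ratio meta-analysis. As an assumption for illustration only, the Python sketch below shows one common choice, fixed-effect inverse-variance pooling of log hazard ratios, with each study's standard error recovered from its reported 95% confidence interval. The function name and example values are hypothetical; a random-effects model (for example, DerSimonian-Laird) is a frequent alternative when studies are heterogeneous.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def pooled_hazard_ratio(studies):
    """Fixed-effect, inverse-variance pooling of hazard ratios.

    studies: list of (hr, ci_lower, ci_upper) tuples, each with a 95% CI
    Returns (pooled_hr, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lo, hi in studies:
        log_hr = math.log(hr)
        # Standard error of log(HR) recovered from the width of the 95% CI.
        se = (math.log(hi) - math.log(lo)) / (2 * Z_95)
        w = 1.0 / se ** 2  # inverse-variance weight
        weighted_sum += w * log_hr
        total_weight += w
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se))
```

Pooling two hypothetical studies with identical estimates returns the same hazard ratio with a narrower confidence interval, reflecting the combined sample size.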

Evaluation of the 7th and 8th editions of the AJCC/UICC TNM staging systems for lung cancer in a large North American cohort.

Yang L, Wang S, Zhou Y, Lai S, Xiao G, Gazdar A, Xie Y
May 24 2017 Oncotarget. 8(40):66784-66795. PMID: 28977996

Abstract

Purpose

The new 8th edition American Joint Committee on Cancer (AJCC)/Union for International Cancer Control (UICC) lung cancer staging system was developed and internally validated using the International Association for the Study of Lung Cancer (IASLC) database, but external validation is needed. The goal of this study is to validate the discriminatory ability and prognostic performance of this new staging system in a larger, independent non-small cell lung cancer (NSCLC) cohort with greater emphasis on North American patients.

Methods

A total of 858,909 NSCLC cases with one malignant primary tumor collected from 2004 to 2013 in the National Cancer Database (NCDB) were analyzed. The primary coding guidelines of the Collaborative Staging Manual and Coding Instructions for the new 8th edition AJCC/UICC lung cancer staging system were used to define the new T, M and TNM stages for all patients in the database. Kaplan-Meier curves, Cox regression models and time-dependent receiver operating characteristic curves were used to compare the discriminatory ability and prognostic performance of the 7th and the revised 8th T, M categories and overall stages.

Results

We demonstrated that the 8th staging system provides better discriminatory ability than the 7th staging system and predicts prognosis for NSCLC patients using the NCDB. There were significant survival differences between adjacent groups defined by both clinical staging and pathologic staging systems. These staging parameters were significantly associated with survival after adjusting for other factors.

Conclusions

The updated T, M, and overall TNM stage of the 8th staging system show improvement compared to the 7th edition in discriminatory ability between adjacent subgroups and are independent predictors for prognosis.

Targeting renal cell carcinoma with a HIF-2 antagonist.

Chen W, Hill H, Christie A, Kim MS, Holloman E, Pavia-Jimenez A, Homayoun F, Ma Y, Patel N, Yell P, Hao G, Yousuf Q, Joyce A, Pedrosa I, Geiger H, Zhang H, Chang J, Gardner KH, Bruick RK, Reeves C, Hwang TH, Courtney K, Frenkel E, Sun X, Zojwalla N, Wong T, Rizzi JP, Wallace EM, Josey JA, Xie Y, Xie XJ, Kapur P, McKay RM, Brugarolas J.
November 2016 Nature, 539, 112–117

Abstract

Clear cell renal cell carcinoma (ccRCC) is characterized by inactivation of the von Hippel-Lindau tumour suppressor gene (VHL) [1,2]. Because no other gene is mutated as frequently in ccRCC and VHL mutations are truncal [3], VHL inactivation is regarded as the governing event [4]. VHL loss activates the HIF-2 transcription factor, and constitutive HIF-2 activity restores tumorigenesis in VHL-reconstituted ccRCC cells [5]. HIF-2 has been implicated in angiogenesis and multiple other processes [6–9], but angiogenesis is the main target of drugs such as the tyrosine kinase inhibitor sunitinib [10]. HIF-2 has been regarded as undruggable [11]. Here we use a tumourgraft/patient-derived xenograft platform [12,13] to evaluate PT2399, a selective HIF-2 antagonist that was identified using a structure-based design approach. PT2399 dissociated HIF-2 (an obligatory heterodimer of HIF-2α–HIF-1β) [14] in human ccRCC cells and suppressed tumorigenesis in 56% (10 out of 18) of such lines. PT2399 had greater activity than sunitinib, was active in sunitinib-progressing tumours, and was better tolerated. Unexpectedly, some VHL-mutant ccRCCs were resistant to PT2399. Resistance occurred despite HIF-2 dissociation in tumours and evidence of Hif-2 inhibition in the mouse, as determined by suppression of circulating erythropoietin, a HIF-2 target [15] and possible pharmacodynamic marker. We identified a HIF-2-dependent gene signature in sensitive tumours. Gene expression was largely unaffected by PT2399 in resistant tumours, illustrating the specificity of the drug. Sensitive tumours exhibited a distinguishing gene expression signature and generally higher levels of HIF-2α. Prolonged PT2399 treatment led to resistance. We identified binding site and second site suppressor mutations in HIF-2α and HIF-1β, respectively. Both mutations preserved HIF-2 dimers despite treatment with PT2399.
Finally, an extensively pretreated patient whose tumour had given rise to a sensitive tumourgraft showed disease control for more than 11 months when treated with a close analogue of PT2399, PT2385. We validate HIF-2 as a target in ccRCC, show that some ccRCCs are HIF-2 independent, and set the stage for biomarker-driven clinical trials.

FMAP: Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies.

Kim J, Kim MS, Koh AY, Xie Y, Zhan X.
October 10, 2016 BMC Bioinformatics 17(1):420.

Abstract

Background

Given the lack of a complete and comprehensive library of microbial reference genomes, determining the functional profile of diverse microbial communities is challenging. The available functional analysis pipelines lack several key features: (i) an integrated alignment tool, (ii) operon-level analysis, and (iii) the ability to process large datasets.

Results

Here we introduce our open-sourced, stand-alone functional analysis pipeline for analyzing whole metagenomic and metatranscriptomic sequencing data, FMAP (Functional Mapping and Analysis Pipeline). FMAP performs alignment, gene family abundance calculations, and statistical analysis (three levels of analyses are provided: differentially-abundant genes, operons and pathways). The resulting output can be easily visualized with heatmaps and functional pathway diagrams. FMAP functional predictions are consistent with currently available functional analysis pipelines.

Conclusions

FMAP is a comprehensive tool for providing functional analysis of metagenomic/metatranscriptomic sequencing data. With the added features of integrated alignment, operon-level analysis, and the ability to process large datasets, FMAP will be a valuable addition to the currently available functional analysis toolbox. We believe that this software will be of great value to the wider biology and bioinformatics communities.

Increase in Cancer Center Staff Effort Related to Electronic Patient Portal Use.

Laccetti AL, Chen B, Cai J, Gates S, Xie Y, Lee SJ, Gerber DE.
December 2016 Journal of Oncology Practice 12(12):e981-e990.

Abstract

Purpose

Electronic portals provide patients with real-time access to personal health records. Use of this technology by individuals with cancer is particularly intensive. We therefore examined patterns of use of electronic portals by clinic staff at a National Cancer Institute-designated comprehensive cancer center.

Method

We identified and characterized cancer center providers and clinic staff who performed electronic activities related to MyChart, the institution's personal health records portal, from 2009 to 2014. Total MyChart actions and messages received were quantified and characterized according to type, timing, and staff category.

Results

Two hundred eighty-nine employees were included in our analysis: 85 nurses (29%), 79 ancillary staff (27%), 49 clerical/managerial staff (17%), 47 physicians (16%), and 29 advanced practice providers (10%). These individuals performed 740,613 MyChart actions and received 117,799 messages. Seventy-seven percent of actions were performed by nurses, 11% by ancillary staff, 6% by advanced practice providers, 5% by physicians, and 1% by clerical/managerial staff. From 2011 to 2014, staff MyChart activity increased approximately 10-fold. On average, 6.3 staff MyChart actions were performed per patient-initiated message. In 2014, nurses performed an average of 3,838 MyChart actions and received an average of 589 messages, compared with 591 actions and 87 messages in 2011 (P < .001). Sixteen percent of all actions occurred outside clinic hours.

Conclusions

Cancer center employee effort related to an electronic patient portal has increased markedly over time, particularly among nursing staff. Because further uptake of this technology is expected, it is critical to consider potential effects on clinical resources, employee and patient satisfaction, and patient safety.

Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis.

Sieberts SK, Zhu F, García-García J, Stahl E, Pratap A, Pandey G, Pappas D, Aguilar D, Anton B, Bonet J, Eksi R, Fornés O, Guney E, Li H, Marín MA, Panwar B, Planas-Iglesias J, Poglayen D, Cui J, Falcao AO, Suver C, Hoff B, Balagurusamy VS, Dillenberger D, Neto EC, Norman T, Aittokallio T, Ammad-Ud-Din M, Azencott CA, Bellón V, Boeva V, Bunte K, Chheda H, Cheng L, Corander J, Dumontier M, Goldenberg A, Gopalacharyulu P, Hajiloo M, Hidru D, Jaiswal A, Kaski S, Khalfaoui B, Khan SA, Kramer ER, Marttinen P, Mezlini AM, Molparia B, Pirinen M, Saarela J, Samwald M, Stoven V, Tang H, Tang J, Torkamani A, Vert JP, Wang B, Wang T, Wennerberg K, Wineinger NE, Xiao G, Xie Y, Yeung R, Zhan X, Zhao C; Members of the Rheumatoid Arthritis Challenge Consortium, Greenberg J, Kremer J, Michaud K, Barton A, Coenen M, Mariette X, Miceli C, Shadick N, Weinblatt M, de Vries N, Tak PP, Gerlag D, Huizinga TW, Kurreeman F, Allaart CF, Louis Bridges S Jr, Criswell L, Moreland L, Klareskog L, Saevarsdottir S, Padyukov L, Gregersen PK, Friend S, Plenge R, Stolovitzky G, Oliva B, Guan Y, Mangravite LM.
August 2016 Nature Communications 7, Article number: 12460

Abstract

Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in approximately one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate of the treatment non-response trait (h² = 0.18, P = 0.02), no significant genetic contribution to prediction accuracy was observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data.

Severe Gut Microbiota Dysbiosis Is Associated With Poor Growth in Patients With Short Bowel Syndrome.

Piper HG, Fan D, Coughlin LA, Ho EX, McDaniel MM, Channabasappa N, Kim J, Kim M, Zhan X, Xie Y, Koh AY.
September 2016 JPEN J Parenter Enteral Nutr. 41(7):1202-1212.

Abstract

Background

Children with short bowel syndrome (SBS) can vary significantly in their growth trajectory. Recent data have shown that children with SBS possess a unique gut microbiota signature compared with healthy controls. We hypothesized that children with SBS and poor growth would exhibit more severe gut microbiota dysbiosis compared with those with SBS who are growing adequately, despite similar intestinal anatomy.

Materials and Methods

Stool samples were collected from children with SBS (n = 8) and healthy controls (n = 3) over 3 months. Gut microbiota populations (16S ribosomal RNA sequencing and metagenomic shotgun sequencing) were compared, including a more in-depth analysis of SBS children exhibiting poor and good growth. Statistical analysis was performed using Mann-Whitney, Kruskal-Wallis, and χ² tests as appropriate.

Results

Children with SBS had a significant deficiency of the commensal Firmicutes order Clostridiales (P = .025, Kruskal-Wallis) compared with healthy children. Furthermore, children with SBS and poor growth were deficient in beneficial bacteria known to produce short-chain fatty acids and had expansion of proinflammatory Enterobacteriaceae (P = .038, Kruskal-Wallis) compared with children with SBS who were growing adequately. Metabolic function analyses showed that SBS/poor-growth microbiomes were deficient in genes needed for gluconeogenesis but enriched in branched-chain and aromatic amino acid synthesis and citrate cycle pathway genes.

Conclusion

Patients with SBS, particularly those with suboptimal growth, have a marked gut dysbiosis characterized by a paucity of beneficial commensal anaerobes, resulting in a deficiency of key metabolic enzymes found in the gut microbiomes of healthy children.

An Expression Signature as an Aid to the Histologic Classification of Non-Small Cell Lung Cancer.

Girard L, Rodriguez-Canales J, Behrens C, Thompson DM, Botros IW, Tang H, Xie Y, Rekhtman N, Travis WD, Wistuba II, Minna JD, Gazdar AF.
October 2016 Clinical Cancer Research, 22(19):4880–4889. doi:10.1158/1078-0432.CCR-15-2900

Abstract

Purpose

Most non-small cell lung cancers (NSCLC) are now diagnosed from small specimens, and classification using standard pathology methods can be difficult. This is of clinical relevance as many therapy regimens and clinical trials are histology dependent. The purpose of this study was to develop an mRNA expression signature as an adjunct test for routine histopathologic classification of NSCLCs.

Experimental Design

A microarray dataset of resected adenocarcinomas (ADC) and squamous cell carcinomas (SCC) was used as the learning set for an ADC-SCC signature. The Cancer Genome Atlas (TCGA) lung RNAseq dataset was used for validation. Another microarray dataset of ADCs and matched nonmalignant lung was used as the learning set for a tumor versus nonmalignant signature. The classifier genes were selected as the most differentially expressed genes, and sample classification was determined by a nearest distance approach.

Results

We developed a 62-gene expression signature that contained many genes used in immunostains for NSCLC typing. It includes 42 genes that distinguish ADC from SCC and 20 genes differentiating nonmalignant lung from lung cancer. Testing of the TCGA and other public datasets resulted in high prediction accuracies (93%–95%). In addition, a prediction score was derived that correlates with both histologic grade and prognosis. We developed a practical version of the classifier using the HTG EdgeSeq nuclease protection–based technology in combination with next-generation sequencing that can be applied to formalin-fixed paraffin-embedded (FFPE) tissues and small biopsies.

Conclusion

Our RNA classifier provides an objective, quantitative method to aid in the pathologic diagnosis of lung cancer.
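The Experimental Design above selects the most differentially expressed genes and then assigns each sample by a nearest distance approach. The sketch below illustrates that general idea as a nearest-centroid classifier; it is not the published 62-gene classifier. The gene-ranking rule (absolute difference of class means), the Euclidean distance, and the helper names `top_genes`, `centroids`, `subset` and `classify` are assumptions chosen for illustration, and the toy expression values are made up.

```python
import math
from statistics import mean


def centroids(samples, labels):
    """Mean expression of each gene within each class (hypothetical helper)."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}


def top_genes(samples, labels, k, pos, neg):
    """Indices of the k genes with the largest absolute class-mean difference."""
    cents = centroids(samples, labels)
    diffs = [abs(p - n) for p, n in zip(cents[pos], cents[neg])]
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)[:k]


def subset(x, idx):
    """Keep only the selected gene positions of one expression profile."""
    return [x[j] for j in idx]


def classify(sample, cents):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda y: math.dist(sample, cents[y]))


# Toy data: 4 samples x 4 genes (made-up values, not study data).
train = [[5, 1, 0, 2], [6, 1, 0, 3], [1, 5, 0, 2], [0, 6, 0, 3]]
labels = ["ADC", "ADC", "SCC", "SCC"]

genes = top_genes(train, labels, k=2, pos="ADC", neg="SCC")
cents = centroids([subset(x, genes) for x in train], labels)
prediction = classify(subset([5, 2, 9, 9], genes), cents)
print(genes, prediction)  # [0, 1] ADC
```

In a real analysis the expression values would be normalized before distances are compared, and the classifier would be checked on an independent validation set, as the abstract does with TCGA.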

The antitumor toxin CD437 is a direct inhibitor of DNA polymerase α.

Han T, Goralski M, Capota E, Padrick SB, Kim J, Xie Y, Nijhawan D.
July 2016 Nature Chemical Biology, 12(7):511-5

Abstract

CD437 is a retinoid-like small molecule that selectively induces apoptosis in cancer cells, but not in normal cells, through an unknown mechanism. We used a forward-genetic strategy to discover mutations in POLA1 that coincide with CD437 resistance (POLA1R). Introduction of one of these mutations into cancer cells by CRISPR-Cas9 genome editing conferred CD437 resistance, demonstrating causality. POLA1 encodes DNA polymerase α, the enzyme responsible for initiating DNA synthesis during the S phase of the cell cycle. CD437 inhibits DNA replication in cells and recombinant POLA1 activity in vitro. Both effects are abrogated by the identified POLA1 mutations, supporting POLA1 as the direct antitumor target of CD437. In addition, we detected an increase in the total fluorescence intensity and anisotropy of CD437 in the presence of increasing concentrations of POLA1 that is consistent with a direct binding interaction. The discovery of POLA1 as the direct anticancer target for CD437 has the potential to catalyze the development of CD437 into an anticancer therapeutic.

Crowdsourced estimation of cognitive decline and resilience in Alzheimer's disease.

Allen GI, Amoroso N, Anghel C, Balagurusamy V, Bare CJ, Beaton D, Bellotti R, Bennett DA, Boehme KL, Boutros PC, Caberlotto L, Caloian C, Campbell F, Chaibub Neto E, Chang YC, Chen B, Chen CY, Chien TY, Clark T, Das S, Davatzikos C, Deng J, Dillenberger D, Dobson RJ, Dong Q, Doshi J, Duma D, Errico R, Erus G, Everett E, Fardo DW, Friend SH, Fröhlich H, Gan J, St George-Hyslop P, Ghosh SS, Glaab E, Green RC, Guan Y, Hong MY, Huang C, Hwang J, Ibrahim J, Inglese P, Iyappan A, Jiang Q, Katsumata Y, Kauwe JS, Klein A, Kong D, Krause R, Lalonde E, Lauria M, Lee E, Lin X, Liu Z, Livingstone J, Logsdon BA, Lovestone S, Ma TW, Malhotra A, Mangravite LM, Maxwell TJ, Merrill E, Nagorski J, Namasivayam A, Narayan M, Naz M, Newhouse SJ, Norman TC, Nurtdinov RN, Oyang YJ, Pawitan Y, Peng S, Peters MA, Piccolo SR, Praveen P, Priami C, Sabelnykova VY, Senger P, Shen X, Simmons A, Sotiras A, Stolovitzky G, Tangaro S, Tateo A, Tung YA, Tustison NJ, Varol E, Vradenburg G, Weiner MW, Xiao G, Xie L, Xie Y, Xu J, Yang H, Zhan X, Zhou Y, Zhu F, Zhu H, Zhu S; Alzheimer's Disease Neuroimaging Initiative.
June 2016 Alzheimer's & Dementia, Volume 12, Issue 6.

Abstract

Identifying accurate biomarkers of cognitive decline is essential for advancing early diagnosis and prevention therapies in Alzheimer's disease. The Alzheimer's disease DREAM Challenge was designed as a computational crowdsourced project to benchmark the current state-of-the-art in predicting cognitive outcomes in Alzheimer's disease based on high dimensional, publicly available genetic and structural imaging data. This meta-analysis failed to identify a meaningful predictor developed from either data modality, suggesting that alternate approaches should be considered for prediction of cognitive performance.

High-dimensional genomic data bias correction and data integration using MANCIE.

Zang C, Wang T, Deng K, Li B, Hu S, Qin Q, Xiao T, Zhang S, Meyer CA, He HH, Brown M, Liu JS, Xie Y, Liu XS.
April 2016 Nature Communications 7, Article number: 11305

Abstract

High-dimensional genomic data analysis is challenging due to noise and biases in high-throughput experiments. We present a computational method, matrix analysis and normalization by concordant information enhancement (MANCIE), for bias correction and data integration of distinct genomic profiles on the same samples. MANCIE uses a Bayesian-supported principal component analysis-based approach to adjust the data so as to achieve better consistency between sample-wise distances in the different profiles. MANCIE can improve tissue-specific clustering in ENCODE data, prognostic prediction in Molecular Taxonomy of Breast Cancer International Consortium and The Cancer Genome Atlas data, and copy number and expression agreement in Cancer Cell Line Encyclopedia data, and has broad applications in cross-platform, high-dimensional data integration.

The Kub5-Hera/RPRD1B interactome: a novel role in preserving genetic stability by regulating DNA mismatch repair.

Patidar PL, Motea EA, Fattah FJ, Zhou Y, Morales JC, Xie Y, Garner HR, Boothman DA.
February 2016 Nucleic Acids Research, Volume 44, Issue 4

Abstract

Ku70-binding protein 5 (Kub5)-Hera (K-H)/RPRD1B maintains genetic integrity by concomitantly minimizing persistent R-loops and promoting repair of DNA double strand breaks (DSBs). We used tandem affinity purification-mass spectrometry, co-immunoprecipitation and gel-filtration chromatography to define higher-order protein complexes containing the K-H scaffolding protein to gain insight into its cellular functions. We confirmed known protein partners (Ku70, RNA Pol II, p15RS) and discovered several novel associated proteins that function in RNA metabolism (Topoisomerase 1 and RNA helicases), DNA repair/replication processes (PARP1, MSH2, Ku, DNA-PKcs, MCM proteins, PCNA and DNA Pol δ) and protein metabolic processes, including translation. Notably, this approach directed us to investigate an unpredicted involvement of K-H in DNA mismatch repair (MMR), in which K-H depletion led to concomitant MMR deficiency and compromised global microsatellite stability. Mechanistically, MMR deficiency in K-H-depleted cells was a consequence of reduced stability of the core MMR proteins (MLH1 and PMS2) caused by elevated basal caspase-dependent proteolysis. Pan-caspase inhibitor treatment reversed MMR protein loss. These findings represent a novel mechanism to acquire MMR deficiency/microsatellite alterations. A significant proportion of colon, endometrial and ovarian cancers exhibit k-h expression/copy number loss and may have severe mutator phenotypes with enhanced malignancies that are currently overlooked based on sporadic MSI+ screening.

Noncoding RNA NORAD Regulates Genomic Stability by Sequestering PUMILIO Proteins.

Lee S, Kopp F, Chang TC, Sataluri A, Chen B, Sivakumar S, Yu H, Xie Y, Mendell JT.
January 2016 Cell, Volume 164, Issues 1–2, Pages 69-80

Abstract

Long noncoding RNAs (lncRNAs) have emerged as regulators of diverse biological processes. Here, we describe the initial functional analysis of a poorly characterized human lncRNA (LINC00657) that is induced after DNA damage, which we termed “noncoding RNA activated by DNA damage”, or NORAD. NORAD is highly conserved and abundant, with expression levels of approximately 500–1,000 copies per cell. Remarkably, inactivation of NORAD triggers dramatic aneuploidy in previously karyotypically stable cell lines. NORAD maintains genomic stability by sequestering PUMILIO proteins, which repress the stability and translation of mRNAs to which they bind. In the absence of NORAD, PUMILIO proteins drive chromosomal instability by hyperactively repressing mitotic, DNA repair, and DNA replication factors. These findings introduce a mechanism that regulates the activity of a deeply conserved and highly dosage-sensitive family of RNA binding proteins and reveal unanticipated roles for a lncRNA and PUMILIO proteins in the maintenance of genomic stability.

Transcription Factor Hepatocyte Nuclear Factor-1β Regulates Renal Cholesterol Metabolism.

Aboudehen K, Kim MS, Mitsche M, Garland K, Anderson N, Noureddine L, Pontoglio M, Patel V, Xie Y, DeBose-Boyd R, Igarashi P.
January 2016 Journal of the American Society of Nephrology

Abstract

HNF-1β is a tissue-specific transcription factor that is expressed in the kidney and other epithelial organs. Humans with mutations in HNF-1β develop kidney cysts, and HNF-1β regulates the transcription of several cystic disease genes. However, the complete spectrum of HNF-1β-regulated genes and pathways is not known. Here, using chromatin immunoprecipitation/next generation sequencing and gene expression profiling, we identified 1545 protein-coding genes that are directly regulated by HNF-1β in murine kidney epithelial cells. Pathway analysis predicted that HNF-1β regulates cholesterol metabolism. Expression of dominant negative mutant HNF-1β or kidney-specific inactivation of HNF-1β decreased the expression of genes that are essential for cholesterol synthesis, including sterol regulatory element binding factor 2 (Srebf2) and 3-hydroxy-3-methylglutaryl-CoA reductase (Hmgcr). HNF-1β mutant cells also expressed lower levels of cholesterol biosynthetic intermediates and had a lower rate of cholesterol synthesis than control cells. Additionally, depletion of cholesterol in the culture medium mitigated the inhibitory effects of mutant HNF-1β on the proteins encoded by Srebf2 and Hmgcr, and HNF-1β directly controlled the renal epithelial expression of proprotein convertase subtilisin-like kexin type 9, a key regulator of cholesterol uptake. These findings reveal a novel role of HNF-1β in a transcriptional network that regulates intrarenal cholesterol metabolism.

A Phase I Dose-Escalation Trial of Single-Fraction Stereotactic Radiation Therapy for Liver Metastases.

Meyer JJ, Foster RD, Lev-Cohain N, Yokoo T, Dong Y, Schwarz RE, Rule W, Tian J, Xie Y, Hannan R, Nedzi L, Solberg T, Timmerman R.
January 2016 Annals of Surgical Oncology, Volume 23, Issue 1, pp 218–224

Abstract

Background

There is significant interest in the use of stereotactic ablative radiotherapy (SABR) as a treatment modality for liver metastases. A variety of SABR fractionation schemes are in clinical use. We conducted a phase I dose-escalation study to determine the maximum tolerated dose of single-fraction liver SABR.

Methods

Patients with liver metastases from solid tumors, for whom a critical volume dose constraint could be met, were treated with single-fraction SABR. Seven patients were enrolled to the first group, with a prescription dose of 35 Gy. Dose was then escalated to 40 Gy in a single fraction, and seven more patients were treated at this dose level. Patients were followed for toxicity and underwent serial imaging to assess lesion response and local control.

Results

Fourteen patients with 17 liver metastases were treated. There were no dose-limiting toxicities observed at either dose level. Nine of the 13 lesions assessable for treatment response showed a complete radiographic response to treatment; the remainder showed partial response. Local control of irradiated lesions was 100% at a median imaging follow-up of 2.5 years. Two-year overall survival for all patients was 78%.

Conclusions

For selected patients with liver metastases, single-fraction SABR at doses of 35 and 40 Gy is tolerable and shows promising signs of efficacy at intermediate follow-up.

Comprehensive functional characterization of cancer-testis antigens defines obligate participation in multiple hallmarks of cancer.

Maxfield KE, Taus PJ, Corcoran K, Wooten J, Macion J, Zhou Y, Borromeo M, Kollipara RK, Yan J, Xie Y, Xie XJ, Whitehurst AW.
November 2015 Nature Communications 6, Article number: 8840 (2015)

Abstract

Tumours frequently activate genes whose expression is otherwise biased to the testis, collectively known as cancer-testis antigens (CTAs). The extent to which CTA expression represents epiphenomena or confers tumorigenic traits is unknown. In this study, to address this, we implemented a multidimensional functional genomics approach that incorporates 7 different phenotypic assays in 11 distinct disease settings. We identify 26 CTAs that are essential for tumor cell viability and/or are pathological drivers of HIF, WNT or TGFβ signalling. In particular, we discover that Foetal and Adult Testis Expressed 1 (FATE1) is a key survival factor in multiple oncogenic backgrounds. FATE1 prevents the accumulation of the stress-sensing BH3-only protein, BCL-2-Interacting Killer (BIK), thereby permitting viability in the presence of toxic stimuli. Furthermore, ZNF165 promotes TGFβ signalling by directly suppressing the expression of negative feedback regulatory pathways. This action is essential for the survival of triple negative breast cancer cells in vitro and in vivo. Thus, CTAs make significant direct contributions to tumour biology.

NRF2 regulates serine biosynthesis in non-small cell lung cancer.

DeNicola GM, Chen PH, Mullarky E, Sudderth JA, Hu Z, Wu D, Tang H, Xie Y, Asara JM, Huffman KE, Wistuba II, Minna JD, DeBerardinis RJ, Cantley LC.
December 2015 Nature Genetics 47, 1475–1481

Abstract

Tumors have high energetic and anabolic needs for rapid cell growth and proliferation, and the serine biosynthetic pathway was recently identified as an important source of metabolic intermediates for these processes. We integrated metabolic tracing and transcriptional profiling of a large panel of non-small cell lung cancer (NSCLC) cell lines to characterize the activity and regulation of the serine/glycine biosynthetic pathway in NSCLC. Here we show that the activity of this pathway is highly heterogeneous and is regulated by NRF2, a transcription factor frequently deregulated in NSCLC. We found that NRF2 controls the expression of the key serine/glycine biosynthesis enzyme genes PHGDH, PSAT1 and SHMT2 via ATF4 to support glutathione and nucleotide production. Moreover, we show that expression of these genes confers poor prognosis in human NSCLC. Thus, a substantial fraction of human NSCLCs activates an NRF2-dependent transcriptional program that regulates serine and glycine metabolism and is linked to clinical aggressiveness.

Phase 1 study of romidepsin plus erlotinib in advanced non-small cell lung cancer.

Gerber DE, Boothman DA, Fattah FJ, Dong Y, Zhu H, Skelton RA, Priddy LL, Vo P, Dowell JE, Sarode V, Leff R, Meek C, Xie Y, Schiller JH.
December 2015 Lung Cancer Volume 90, Issue 3, Pages 534-541

Abstract

Purpose

Preclinical studies demonstrated anti-tumor efficacy of the combination of the histone deacetylase (HDAC) inhibitor romidepsin plus erlotinib in non-small cell lung cancer (NSCLC) models that were insensitive to erlotinib monotherapy. We therefore studied this combination in a phase 1 clinical trial in previously treated advanced NSCLC.

Methods

Romidepsin (8 or 10 mg/m²) was administered intravenously on days 1, 8, and 15 every 28 days in combination with erlotinib (150 mg orally daily), with a romidepsin monotherapy lead-in during Cycle 1. Correlative studies included peripheral blood mononuclear cell HDAC activity and histone acetylation status, and EGFR pathway activation status in skin biopsies.

Results

A total of 17 patients were enrolled. The median number of prior lines of therapy was 3 (range 1-5). No cases had a sensitizing EGFR mutation. The most common related adverse events were nausea, vomiting, and fatigue (each 82%), diarrhea (65%), anorexia (53%), and rash (41%). Dose-limiting nausea and vomiting occurred at the romidepsin 10 mg/m² level despite aggressive antiemetic prophylaxis and treatment. Among 10 evaluable patients, the best responses were stable disease (n=7) and progressive disease (n=3). Median progression-free survival (PFS) was 3.3 months (range 1.4-16.5 months). Prolonged PFS (>6 months) was noted in a KRAS-mutant adenocarcinoma and in a squamous cell cancer that had previously progressed on erlotinib monotherapy. Romidepsin monotherapy inhibited HDAC activity, increased histone acetylation status, and inhibited EGFR phosphorylation.

Conclusions

Romidepsin 8 mg/m² plus erlotinib appears well tolerated, has evidence of disease control, and exhibits effects on relevant molecular targets in an unselected advanced NSCLC population.

Targeting glutamine metabolism sensitizes pancreatic cancer to PARP-driven metabolic catastrophe induced by β-lapachone.

Chakrabarti G, Moore ZR, Luo X, Ilcheva M, Ali A, Padanad M, Zhou Y, Xie Y, Burma S, Scaglioni PP, Cantley LC, DeBerardinis RJ, Kimmelman AC, Lyssiotis CA, Boothman DA.
October 2015 Cancer & Metabolism

Abstract

Background

Pancreatic ductal adenocarcinomas (PDA) activate a glutamine-dependent pathway of cytosolic nicotinamide adenine dinucleotide phosphate (NADPH) production to maintain redox homeostasis and support proliferation. Enzymes involved in this pathway (GLS1 (mitochondrial glutaminase 1), GOT1 (cytoplasmic glutamate oxaloacetate transaminase 1), and GOT2 (mitochondrial glutamate oxaloacetate transaminase 2)) are highly upregulated in PDA, and among these, inhibitors of GLS1 were recently deployed in clinical trials to target anabolic glutamine metabolism. However, single-agent inhibition of this pathway is cytostatic and unlikely to provide durable benefit in controlling advanced disease.

Results

Here, we report that reducing NADPH pools by genetically or pharmacologically (bis-2-(5-phenylacetamido-1,2,4-thiadiazol-2-yl)ethyl sulfide (BPTES) or CB-839) inhibiting glutamine metabolism in mutant Kirsten rat sarcoma viral oncogene homolog (KRAS) PDA sensitizes cell lines and tumors to β-lapachone (β-lap, clinical form ARQ761). β-Lap is an NADPH:quinone oxidoreductase (NQO1)-bioactivatable drug that leads to NADPH depletion through high levels of reactive oxygen species (ROS) from the futile redox cycling of the drug and, subsequently, to nicotinamide adenine dinucleotide (NAD)+ depletion through poly(ADP ribose) polymerase (PARP) hyperactivation. NQO1 expression is highly activated by mutant KRAS signaling. As such, β-lap treatment concurrent with inhibition of glutamine metabolism in mutant KRAS, NQO1-overexpressing PDA leads to massive redox imbalance, extensive DNA damage, rapid PARP-mediated NAD+ consumption, and PDA cell death, features not observed in NQO1-low, wild-type KRAS-expressing cells.

Conclusions

This treatment strategy illustrates proof of principle that simultaneously decreasing glutamine metabolism-dependent tumor anti-oxidant defenses and inducing supra-physiological ROS formation are tumoricidal and that this rationally designed combination strategy lowers the required doses of both agents in vitro and in vivo. The non-overlapping specificities of GLS1 inhibitors and β-lap for PDA tumors afford high tumor selectivity, while sparing normal tissue.

A systematic analysis reveals heterogeneous changes in the endocytic activities of cancer cells.

Elkin SR, Bendris N, Reis CR, Zhou Y, Xie Y, Huffman KE, Minna JD, Schmid SL.
November 2015 Cancer Research, Volume 75, Issue 21

Abstract

Metastasis is a multistep process requiring cancer cell signaling, invasion, migration, survival, and proliferation. These processes require dynamic modulation of cell surface proteins by endocytosis. Given this functional connection, it has been suggested that endocytosis is dysregulated in cancer. To test this, we developed In-Cell ELISA assays to measure three different endocytic pathways: clathrin-mediated endocytosis, caveolae-mediated endocytosis, and clathrin-independent endocytosis and compared these activities using two different syngeneic models for normal and oncogene-transformed human lung epithelial cells. We found that all endocytic activities were reduced in the transformed versus normal counterparts. However, when we screened 29 independently isolated non-small cell lung cancer (NSCLC) cell lines to determine whether these changes were systematic, we observed significant heterogeneity. Nonetheless, using hierarchical clustering based on their combined endocytic properties, we identified two phenotypically distinct clusters of NSCLCs. One co-clustered with mutations in KRAS, a mesenchymal phenotype, increased invasion through collagen and decreased growth in soft agar, whereas the second was enriched in cells with an epithelial phenotype. Interestingly, the two clusters also differed significantly in clathrin-independent internalization and surface expression of CD44 and CD59. Taken together, our results suggest that endocytotic alterations in cancer cells that affect cell surface expression of critical molecules have a significant influence on cancer-relevant phenotypes, with potential implications for interventions to control cancer by modulating endocytic dynamics.

Deciphering the associations between gene expression and copy number alteration using a sparse double Laplacian shrinkage approach.

Shi X, Zhao Q, Huang J, Xie Y, Ma S.
December 2015 Bioinformatics, Volume 31, Issue 24, Pages 3977–3983

Abstract

Motivation

Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging because the expression of one gene may be regulated by multiple CNAs, and one CNA may regulate the expression of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation.

Results

A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method.

Availability and Implementation

R code is available at http://works.bepress.com/shuangge/49/.
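The Results above combine a sparsity penalty with Laplacian shrinkage over network adjacency. As a rough illustration only, the sketch below evaluates a generic penalty of the form λ1·Σ|βj| + λ2·βᵀLβ, where L = D − A is the graph Laplacian of a symmetric adjacency matrix A, and checks the standard identity βᵀLβ = Σ(j<k) a_jk(βj − βk)². That identity is why the Laplacian term pulls the coefficients of strongly connected features toward each other. This is a simplified single-network, unsigned form assumed for illustration: the published method imposes two such penalties (one for the GE network and one for the CNA network) inside a joint multi-response model, and its exact form may differ. All function names and the toy adjacency weights are hypothetical.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(beta, mat):
    """Compute beta^T M beta."""
    n = len(beta)
    return sum(beta[i] * mat[i][j] * beta[j] for i in range(n) for j in range(n))


def pairwise_shrinkage(beta, adj):
    """Sum over pairs j < k of a_jk * (beta_j - beta_k)^2."""
    n = len(beta)
    return sum(adj[j][k] * (beta[j] - beta[k]) ** 2
               for j in range(n) for k in range(j + 1, n))


def penalty(beta, adj, lam1, lam2):
    """Hypothetical sparse Laplacian penalty: lasso term plus Laplacian term."""
    lasso = sum(abs(b) for b in beta)
    return lam1 * lasso + lam2 * quadratic_form(beta, laplacian(adj))


# Three features; feature 1 is linked to features 0 and 2 (made-up weights).
adj = [[0, 1, 0],
       [1, 0, 2],
       [0, 2, 0]]
beta = [1.0, 3.0, 0.0]

print(quadratic_form(beta, laplacian(adj)), pairwise_shrinkage(beta, adj))  # 22.0 22.0
print(penalty(beta, adj, lam1=0.5, lam2=0.1))
```

The lasso term drives unimportant coefficients to exactly zero, while the Laplacian term only penalizes differences between linked coefficients, so the two serve complementary purposes.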

Prediction of human population responses to toxic compounds by a collaborative competition.

Eduati F, Mangravite LM, Wang T, Tang H, Bare JC, Huang R, Norman T, Kellen M, Menden MP, Yang J, Zhan X, Zhong R, Xiao G, Xia M, Abdo N, Kosyk O; NIEHS-NCATS-UNC DREAM Toxicogenetics Collaboration, Friend S, Dearry A, Simeonov A, Tice RR, Rusyn I, Wright FA, Stolovitzky G, Xie Y, Saez-Rodriguez J.
September 2015 Nature Biotechnology 33, 933–940 (2015)

Abstract

The ability to computationally predict the effects of toxic compounds on humans could help address the deficiencies of current chemical safety testing. Here, we report the results from a community-based DREAM challenge to predict toxicities of environmental compounds with potential adverse health effects for human populations. We measured the cytotoxicity of 156 compounds in 884 lymphoblastoid cell lines for which genotype and transcriptional data are available as part of the Tox21 1000 Genomes Project. The challenge participants developed algorithms to predict interindividual variability of toxic response from genomic profiles and population-level cytotoxicity data from structural attributes of the compounds. 179 submitted predictions were evaluated against an experimental data set to which participants were blinded. Individual cytotoxicity predictions were better than random, with modest correlations (Pearson's r < 0.28), consistent with complex trait genomic prediction. In contrast, predictions of population-level response to different compounds were higher (r < 0.66). The results highlight the possibility of predicting health risks associated with unknown compounds, although risk estimation accuracy remains suboptimal.

Activation of HIF-1α and LL-37 by commensal bacteria inhibits Candida albicans colonization.

Fan D, Coughlin LA, Neubauer MM, Kim J, Kim MS, Zhan X, Simms-Waldrip TR, Xie Y, Hooper LV, Koh AY.
July 2015 Nature Medicine 21, 808–814 (2015)

Abstract

Candida albicans colonization is required for invasive disease. Unlike humans, adult mice with mature intact gut microbiota are resistant to C. albicans gastrointestinal (GI) colonization, but the factors that promote C. albicans colonization resistance are unknown. Here we demonstrate that commensal anaerobic bacteria, specifically clostridial Firmicutes (clusters IV and XIVa) and Bacteroidetes, are critical for maintaining C. albicans colonization resistance in mice. Using Bacteroides thetaiotaomicron as a model organism, we find that hypoxia-inducible factor-1α (HIF-1α), a transcription factor important for activating innate immune effectors, and the antimicrobial peptide LL-37 (CRAMP in mice) are key determinants of C. albicans colonization resistance. Although antibiotic treatment enables C. albicans colonization, pharmacologic activation of colonic Hif1a induces CRAMP expression and results in a significant reduction of C. albicans GI colonization and a 50% decrease in mortality from invasive disease. In the setting of antibiotics, Hif1a and Camp (which encodes CRAMP) are required for B. thetaiotaomicron-induced protection against C. albicans colonization of the gut. Thus, modulating C. albicans GI colonization by activation of gut mucosal immune effectors may represent a novel therapeutic approach for preventing invasive fungal disease in humans.

Identifying CDKN3 Gene Expression as a Prognostic Biomarker in Lung Adenocarcinoma via Meta-analysis.

Zang X, Chen M, Zhou Y, Xiao G, Xie Y, Wang X.
May 2015 Cancer Informatics, 14(Suppl 2):183-91

Abstract

Lung cancer is among the major causes of cancer deaths, and the survival rate of lung cancer patients is extremely low. Recent studies have demonstrated that the gene CDKN3 is related to neoplasia, but in the literature severe controversy exists over whether it is involved in cancer progression or, conversely, tumor inhibition. In this study, we investigated the expression of CDKN3 and its association with prognosis in lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC) using datasets in Lung Cancer Explorer (LCE; http://qbrc.swmed.edu/lce/). We found that CDKN3 was up-regulated in ADC and SCC compared to normal tissues. We also found that CDKN3 was expressed at a higher level in SCC than in ADC, which was further validated through meta-analysis (coefficient = 2.09, 95% CI = 1.50-2.67, P < 0.0001). In addition, based on meta-analysis for the prognostic value of CDKN3, we found that higher CDKN3 expression was associated with poorer survival outcomes in ADC (HR = 1.65, 95% CI = 1.39-1.96, P < 0.0001), but not in SCC (HR = 1.10, 95% CI = 0.84-1.44, P = 0.494). Our findings indicate that CDKN3 may be a prognostic marker in ADC, though the detailed mechanism is yet to be revealed.
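The abstract reports pooled coefficients and hazard ratios (HR) with 95% confidence intervals from a meta-analysis across datasets. One common way to pool hazard ratios is inverse-variance weighting on the log scale, recovering each dataset's standard error from its reported confidence interval. The sketch below shows a fixed-effect version of that calculation. The abstract does not state which pooling model was used (random-effects models are also common), so treat this as an assumed illustration; the function names and the per-dataset values are hypothetical, not data from Lung Cancer Explorer.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_hr_and_se(hr, lo, hi):
    """Log hazard ratio and its standard error, recovered from a 95% CI."""
    return math.log(hr), (math.log(hi) - math.log(lo)) / (2 * Z95)


def pool_fixed(studies):
    """Inverse-variance fixed-effect pooling of (HR, CI low, CI high) tuples."""
    weights, estimates = [], []
    for hr, lo, hi in studies:
        est, se = log_hr_and_se(hr, lo, hi)
        weights.append(1 / se ** 2)
        estimates.append(est)
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1 / total)
    return (math.exp(pooled),
            math.exp(pooled - Z95 * se_pooled),
            math.exp(pooled + Z95 * se_pooled))


# Hypothetical per-dataset results (HR, lower 95% CI, upper 95% CI).
studies = [(1.5, 1.2, 1.875), (1.8, 1.2, 2.7), (1.3, 0.9, 1.9)]
hr, lo, hi = pool_fixed(studies)
print(f"pooled HR = {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A fixed-effect model assumes every dataset estimates the same underlying hazard ratio; when heterogeneity across datasets is substantial, a random-effects model widens the pooled interval accordingly.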

Elucidation of changes in molecular signalling leading to increased cellular transformation in oncogenically progressed human bronchial epithelial cells exposed to radiations of increasing LET.

Ding LH, Park S, Xie Y, Girard L, Minna JD, Story MD.
September 2015 Mutagenesis, Volume 30, Issue 5, Pages 685–694

Abstract

The early transcriptional response and subsequent induction of anchorage-independent growth after exposure to particles of high Z and energy (HZE) as well as γ-rays were examined in human bronchial epithelial cells (HBEC3KT) immortalised without viral oncogenes and in an isogenic variant cell line in which p53 expression was suppressed and an active mutant K-RAS(V12) was expressed (HBEC3KT-P53KRAS). Cell survival following irradiation showed that HBEC3KT-P53KRAS cells were more radioresistant than HBEC3KT cells irrespective of the radiation species. In addition, radiation enhanced the ability of the surviving HBEC3KT-P53KRAS cells, but not the surviving HBEC3KT cells, to grow in an anchorage-independent fashion (soft agar colony formation). HZE particle irradiation was far more efficient than γ-rays at rendering HBEC3KT-P53KRAS cells permissive for soft agar growth. Gene expression profiles after radiation showed that the molecular response to radiation in HBEC3KT-P53KRAS cells, as in HBEC3KT cells, varies with radiation quality. Several pathways associated with anchorage-independent growth, including the HIF-1α, mTOR, IGF-1, RhoA and ERK/MAPK pathways, were over-represented in the irradiated HBEC3KT-P53KRAS cells compared to parental HBEC3KT cells. These results suggest that oncogenically progressed human lung epithelial cells are at greater risk of cellular transformation and carcinogenesis after ionising radiation, particularly after HZE radiation.

These results have implications for (i) terrestrial radiation, suggesting the possibility of enhanced carcinogenic risk from diagnostic CT screens used for early lung cancer detection; (ii) enhanced carcinogenic risk from heavy particles used in radiotherapy; and (iii) space radiation, raising the possibility that astronauts harbouring epithelial regions of dysplasia or hyperplasia within the lung that contain oncogenic changes may have a greater risk of lung cancer from exposure to the heavy particles present in the deep space environment.

Design and bioinformatics analysis of genome-wide CLIP experiments.

Wang T, Xiao G, Chu Y, Zhang MQ, Corey DR, Xie Y.
June 2015 Nucleic Acids Research, Volume 43, Issue 11, Pages 5263–5274

Abstract

The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP-RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses.

Decreased BECN1 mRNA Expression in Human Breast Cancer is Associated with Estrogen Receptor-Negative Subtypes and Poor Prognosis.

Tang H, Sebti S, Titone R, Zhou Y, Isidoro C, Ross TS, Hibshoosh H, Xiao G, Packer M, Xie Y, Levine B.
March 2015 EBioMedicine, 2(3):255-263.

Abstract

Both BRCA1 and Beclin 1 (BECN1) are tumor suppressor genes, which are in close proximity on the human chromosome 17q21 breast cancer tumor susceptibility locus and are often concurrently deleted. However, their importance in sporadic human breast cancer is not known. To interrogate the effects of BECN1 and BRCA1 in breast cancer, we studied their mRNA expression patterns in breast cancer patients from two large datasets: The Cancer Genome Atlas (TCGA) (n=1067) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (n=1992). In both datasets, low expression of BECN1 was more common in HER2-enriched and basal-like (mostly triple-negative) breast cancers compared to luminal A/B intrinsic tumor subtypes, and was also strongly associated with TP53 mutations and advanced tumor grade. In contrast, there was no significant association between low BRCA1 expression and HER2-enriched or basal-like subtypes, TP53 mutations or tumor grade. In addition, low expression of BECN1 (but not low BRCA1) was associated with poor prognosis, and BECN1 (but not BRCA1) expression was an independent predictor of survival. These findings suggest that decreased mRNA expression of the autophagy gene BECN1 may contribute to the pathogenesis and progression of HER2-enriched, basal-like, and TP53 mutant breast cancers.

Intramolecular circularization increases efficiency of RNA sequencing and enables CLIP-Seq of nuclear RNA from human cells.

Chu Y, Wang T, Dodd D, Xie Y, Janowski BA, Corey DR.
March 2015 Nucleic Acids Research, Volume 43, Issue 11, e75.

Abstract

RNA sequencing (RNA-Seq) is a powerful tool for analyzing the identity of cellular RNAs but is often limited by the amount of material available for analysis. In spite of extensive efforts employing existing protocols, we observed that it was not possible to obtain useful sequencing libraries from nuclear RNA derived from cultured human cells after crosslinking and immunoprecipitation (CLIP). Here, we report a method for obtaining strand-specific small RNA libraries for RNA sequencing that requires picograms of RNA. We employ an intramolecular circularization step that increases the efficiency of library preparation and avoids the need for intermolecular ligations of adaptor sequences. Other key features include random priming for full-length cDNA synthesis and gel-free library purification. Using our method, we generated CLIP-Seq libraries from nuclear RNA that had been UV-crosslinked and immunoprecipitated with anti-Argonaute 2 (Ago2) antibody. Computational protocols were developed to enable analysis of raw sequencing data and we observe substantial differences between recognition by Ago2 of RNA species in the nucleus relative to the cytoplasm. This RNA self-circularization approach to RNA sequencing (RC-Seq) allows data to be obtained using small amounts of input RNA that cannot be sequenced by standard methods.

The nuclear receptor DAF-12 regulates nutrient metabolism and reproductive growth in nematodes.

Wang Z, Stoltzfus J, You YJ, Ranjit N, Tang H, Xie Y, Lok JB, Mangelsdorf DJ, Kliewer SA.
March 2015 PLoS Genet. 11(3):e1005027.

Abstract

Appropriate nutrient response is essential for growth and reproduction. Under favorable nutrient conditions, the C. elegans nuclear receptor DAF-12 is activated by dafachronic acids, hormones that commit larvae to reproductive growth. Here, we report that in addition to its well-studied role in controlling developmental gene expression, the DAF-12 endocrine system governs expression of a gene network that stimulates the aerobic catabolism of fatty acids. Thus, activation of the DAF-12 transcriptome coordinately mobilizes energy stores to permit reproductive growth. DAF-12 regulation of this metabolic gene network is conserved in the human parasite Strongyloides stercoralis, and inhibition of specific steps in this network blocks reproductive growth in both nematodes. Our study provides a molecular understanding of the metabolic adaptation of nematodes to their environment and suggests a new therapeutic strategy for treating parasitic diseases.

HITS-CLIP analysis uncovers a link between the Kaposi's sarcoma-associated herpesvirus ORF57 protein and host pre-mRNA metabolism.

Sei E, Wang T, Hunter OV, Xie Y, Conrad NK.
February 2015 PLoS Pathog. 11(2):e1004652. doi: 10.1371/journal.ppat.1004652.

Abstract

The Kaposi's sarcoma-associated herpesvirus (KSHV) is an oncogenic virus that causes Kaposi's sarcoma, primary effusion lymphoma (PEL), and some forms of multicentric Castleman's disease. The KSHV ORF57 protein is a conserved posttranscriptional regulator of gene expression that is essential for virus replication. ORF57 is multifunctional, but most of its activities are directly linked to its ability to bind RNA. We globally identified virus and host RNAs bound by ORF57 during lytic reactivation in PEL cells using high-throughput sequencing of RNA isolated by cross-linking immunoprecipitation (HITS-CLIP). As expected, ORF57-bound RNA fragments mapped throughout the KSHV genome, including the known ORF57 ligand PAN RNA. In agreement with previously published ChIP results, we observed that ORF57 bound RNAs near the oriLyt regions of the genome. Examination of the host RNA fragments revealed that a subset of the ORF57-bound RNAs was derived from transcript 5' ends. The position of these 5'-bound fragments correlated closely with the 5'-most exon-intron junction of the pre-mRNA. We selected four candidates (BTG1, EGR1, ZFP36, and TNFSF9) and analyzed their pre-mRNA and mRNA levels during lytic phase. Analysis of both steady-state and newly made RNAs revealed that these candidate ORF57-bound pre-mRNAs persisted for longer periods of time throughout infection than control RNAs, consistent with a role for ORF57 in pre-mRNA metabolism. In addition, exogenous expression of ORF57 was sufficient to increase the pre-mRNA levels and, in one case, the mRNA levels of the putative ORF57 targets. These results demonstrate that ORF57 interacts with specific host pre-mRNAs during lytic reactivation and alters their processing, likely by stabilizing pre-mRNAs. These data suggest that ORF57 is involved in modulating host gene expression in addition to KSHV gene expression during lytic reactivation.

Real-time resolution of point mutations that cause phenovariance in mice.

Wang T, Zhan X, Bu CH, Lyon S, Pratt D, Hildebrand S, Choi JH, Zhang Z, Zeng M, Wang KW, Turer E, Chen Z, Zhang D, Yue T, Wang Y, Shi H, Wang J, Sun L, SoRelle J, McAlpine W, Hutchins N, Zhan X, Fina M, Gobert R, Quan J, Kreutzer M, Arnett S, Hawkins K, Leach A, Tate C, Daniel C, Reyna C, Prince L, Davis S, Purrington J, Bearden R, Weatherly J, White D, Russell J, Sun Q, Tang M, Li X, Scott L, Moresco EM, McInerney GM, Karlsson Hedestam GB, Xie Y, Beutler B.
February 2015 Proc Natl Acad Sci U S A. 112(5):E440-9. doi: 10.1073/pnas.1423216112.

Abstract

With the wide availability of massively parallel sequencing technologies, genetic mapping has become the rate-limiting step in mammalian forward genetics. Here we introduce a method for real-time identification of N-ethyl-N-nitrosourea-induced mutations that cause phenotypes in mice. All mutations are identified by whole exome G1 progenitor sequencing and their zygosity is established in G2/G3 mice before phenotypic assessment. Quantitative and qualitative traits, including lethal effects, in single or multiple combined pedigrees are then analyzed with Linkage Analyzer, a software program that detects significant linkage between individual mutations and aberrant phenotypic scores and presents processed data as Manhattan plots. As multiple alleles of genes are acquired through mutagenesis, pooled "superpedigrees" are created to analyze the effects. Our method is distinguished from conventional forward genetic methods because it permits (1) unbiased declaration of mappable phenotypes, including those that are incompletely penetrant, (2) automated identification of causative mutations concurrent with phenotypic screening, without the need to outcross mutant mice to another strain and backcross them, and (3) exclusion of genes not involved in phenotypes of interest. We validated our approach and Linkage Analyzer for the identification of 47 mutations in 45 previously known genes causative for adaptive immune phenotypes; our analysis also implicated 474 genes not previously associated with immune function. The method described here permits forward genetic analysis in mice, limited only by the rates of mutant production and screening.

iScreen: Image-Based High-Content RNAi Screening Analysis Tools.

Zhong R, Dong X, Levine B, Xie Y, Xiao G.
September 2015 J Biomol Screen. 20(8):998-1002. doi: 10.1177/1087057114564348.

Abstract

High-throughput RNA interference (RNAi) screening has opened up a path to investigating functional genomics on a genome-wide scale. However, such studies are often restricted to assays that have a single readout format. Recently, advanced imaging technologies have been coupled with high-throughput RNAi screening to develop high-content screening, in which one or more cell images, rather than a single readout, are generated from each well. This image-based high-content screening technology has led to genome-wide functional annotation in a wider spectrum of biological research studies, as well as in drug and target discovery, so that complex cellular phenotypes can be measured in a multiparametric format. Despite these advances, data analysis and visualization tools are still largely lacking for these types of experiments. Therefore, we developed iScreen (image-Based High-content RNAi Screening Analysis Tool), an R package for the statistical modeling and visualization of image-based high-content RNAi screening. Two case studies were used to demonstrate the capability and efficiency of the iScreen package. iScreen is available for download on CRAN (http://cran.cnr.berkeley.edu/web/packages/iScreen/index.html). The user manual is also available as a supplementary document.

A community computational challenge to predict the activity of pairs of compounds.

Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser LM, Realubit R, Mattioli M, Alvarez MJ, Shen Y; NCI-DREAM Community, Gallahan D, Singer D, Saez-Rodriguez J, Xie Y, Stolovitzky G, Califano A; NCI-DREAM Community.
December 2014 Nature Biotechnology 32, 1213–1222

Abstract

Recent therapeutic successes have renewed interest in drug combinations, but experimental screening approaches are costly and often identify only small numbers of synergistic combinations. The DREAM consortium launched an open challenge to foster the development of in silico methods to computationally rank 91 compound pairs, from the most synergistic to the most antagonistic, based on gene-expression profiles of human B cells treated with individual compounds at multiple time points and concentrations. Using scoring metrics based on experimental dose-response curves, we assessed 32 methods (31 community-generated approaches and SynGen), four of which performed significantly better than random guessing. We highlight similarities between the methods. Although the accuracy of predictions was not optimal, we find that computational prediction of compound-pair activity is possible, and that community challenges can be useful to advance the field of in silico compound-synergy prediction.

Ensemble-based network aggregation improves the accuracy of gene network reconstruction.

Zhong R, Allen JD, Xiao G, Xie Y.
November 2014 PLoS One. 9(11):e106319. doi: 10.1371/journal.pone.0106319.

Abstract

Reverse-engineering approaches to constructing gene regulatory networks (GRNs) based on genome-wide mRNA expression data have led to significant biological findings, such as the discovery of novel drug targets. However, the reliability of the reconstructed GRNs needs to be improved. Here, we propose an ensemble-based network aggregation approach to improving the accuracy of network topologies constructed from mRNA expression data. To evaluate the performances of different approaches, we created dozens of simulated networks from combinations of gene-set sizes and sample sizes and also tested our methods on three Escherichia coli datasets. We demonstrate that the ensemble-based network aggregation approach can be used to effectively integrate GRNs constructed from different studies, producing more accurate networks. We also apply this approach to building a network from epithelial mesenchymal transition (EMT) signature microarray data and identify hub genes that might be potential drug targets. The R code used to perform all of the analyses is available in an R package entitled "ENA", accessible on CRAN (http://cran.r-project.org/web/packages/ENA/).
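The abstract does not say how the aggregation step combines networks. As an illustration only, the Python sketch below assumes one common ensemble rule: rank each network's edges by confidence score and average the ranks across networks. The names `rank_edges` and `aggregate_networks` are hypothetical and are not the API of the ENA package.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def _key(edge: Edge) -> Edge:
    # Treat edges as undirected: ("g2", "g1") and ("g1", "g2") are the same edge.
    a, b = edge
    return (a, b) if a <= b else (b, a)


def rank_edges(scores: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rank one network's edges by score; the strongest edge gets rank 1."""
    ordered = sorted(scores, key=lambda e: scores[e], reverse=True)
    return {_key(edge): float(i + 1) for i, edge in enumerate(ordered)}


def aggregate_networks(networks: List[Dict[Edge, float]]) -> List[Tuple[Edge, float]]:
    """Hypothetical rank-average aggregation; a lower mean rank means a more confident edge."""
    ranked = [rank_edges(net) for net in networks]
    all_edges = set().union(*ranked)
    mean_rank = {}
    for edge in all_edges:
        # An edge absent from a network is given that network's worst rank + 1.
        ranks = [r.get(edge, len(r) + 1.0) for r in ranked]
        mean_rank[edge] = sum(ranks) / len(ranks)
    # Sort by mean rank, breaking ties by edge name so the output is deterministic.
    return sorted(mean_rank.items(), key=lambda kv: (kv[1], kv[0]))


# Two toy networks (e.g. built by different methods) scoring the same three genes.
net_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.5, ("g2", "g3"): 0.1}
net_b = {("g2", "g1"): 0.7, ("g1", "g3"): 0.8, ("g2", "g3"): 0.2}
print(aggregate_networks([net_a, net_b]))
# → [(('g1', 'g2'), 1.5), (('g1', 'g3'), 1.5), (('g2', 'g3'), 3.0)]
```

Averaging ranks rather than raw scores is a design choice that sidesteps the fact that different reconstruction methods produce scores on incomparable scales.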

ASCL1 is a lineage oncogene providing therapeutic targets for high-grade neuroendocrine lung cancers.

Augustyn A, Borromeo M, Wang T, Fujimoto J, Shao C, Dospoy PD, Lee V, Tan C, Sullivan JP, Larsen JE, Girard L, Behrens C, Wistuba II, Xie Y, Cobb MH, Gazdar AF, Johnson JE, Minna JD.
October 2014 PNAS 111 (41) 14788-14793.

Abstract

Aggressive neuroendocrine lung cancers, including small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), represent an understudied tumor subset that accounts for approximately 40,000 new lung cancer cases per year in the United States. No targeted therapy exists for these tumors. We determined that achaete-scute homolog 1 (ASCL1), a transcription factor required for proper development of pulmonary neuroendocrine cells, is essential for the survival of a majority of lung cancers (both SCLC and NSCLC) with neuroendocrine features. By combining whole-genome microarray expression analysis performed on lung cancer cell lines with ChIP-Seq data designed to identify conserved transcriptional targets of ASCL1, we discovered an ASCL1 target 72-gene expression signature that (i) identifies neuroendocrine differentiation in NSCLC cell lines, (ii) is predictive of poor prognosis in resected NSCLC specimens from three datasets, and (iii) represents novel "druggable" targets. Among these druggable targets is B-cell CLL/lymphoma 2, which when pharmacologically inhibited stops ASCL1-dependent tumor growth in vitro and in vivo and represents a proof-of-principle ASCL1 downstream target gene. Analysis of downstream targets of ASCL1 represents an important advance in the development of targeted therapy for the neuroendocrine class of lung cancers, providing a significant step forward in the understanding and therapeutic targeting of the molecular vulnerabilities of neuroendocrine lung cancer.

Poly-dipeptides encoded by the C9orf72 repeats bind nucleoli, impede RNA biogenesis, and kill cells.

Kwon I, Xiang S, Kato M, Wu L, Theodoropoulos P, Wang T, Kim J, Yun J, Xie Y, McKnight SL.
September 2014 Science Vol. 345, Issue 6201, pp. 1139-1145

Abstract

Many RNA regulatory proteins controlling pre-messenger RNA splicing contain serine:arginine (SR) repeats. Here, we found that these SR domains bound hydrogel droplets composed of fibrous polymers of the low-complexity domain of heterogeneous ribonucleoprotein A2 (hnRNPA2). Hydrogel binding was reversed upon phosphorylation of the SR domain by CDC2-like kinases 1 and 2 (CLK1/2). Mutated variants of the SR domains changing serine to glycine (SR-to-GR variants) also bound to hnRNPA2 hydrogels but were not affected by CLK1/2. When expressed in mammalian cells, these variants bound nucleoli. The translation products of the sense and antisense transcripts of the expansion repeats associated with the C9orf72 gene altered in neurodegenerative disease encode GRn and PRn repeat polypeptides. Both peptides bound to hnRNPA2 hydrogels independent of CLK1/2 activity. When applied to cultured cells, both peptides entered cells, migrated to the nucleus, bound nucleoli, and poisoned RNA biogenesis, which caused cell death.

Predictors and intensity of online access to electronic medical records among patients with cancer.

Gerber DE, Laccetti AL, Chen B, Yan J, Cai J, Gates S, Xie Y, Lee SJ.
September 2014 Journal of Oncology Practice 10, no. 5

Abstract

Introduction

Electronic portals are secure Web-based servers that provide patients with real-time access to their personal health record (PHR). These applications are now widely used at cancer centers nationwide, but their impact has not been well studied. This study set out to determine predictors and patterns of use of a Web-based portal for accessing PHRs and communicating with health providers among patients with cancer.

Methods

Retrospective analysis of enrollment in and use of MyChart, a PHR portal for the Epic electronic medical record system, among patients seen at a National Cancer Institute-designated cancer center. Predictors of MyChart use were analyzed through univariable and multivariable regression models.

Results

A total of 6,495 patients enrolled in MyChart from 2007 to 2012. The median number of log-ins over this period was 57 (interquartile range 17-137). The most common portal actions were viewing test results (37%), viewing and responding to clinic messages (29%), and sending medical advice requests (6.4%). Increased portal use was significantly associated with younger age, white race, and an upper aerodigestive malignancy diagnosis. Thirty-seven percent of all log-ins and 31% of all medical advice requests occurred outside clinic hours. Over the study period, the average number of patient log-ins per year more than doubled.

Conclusions

Among patients with cancer, PHR portal use is frequent and increasing. Younger patients, white patients, and patients with upper aerodigestive malignancies exhibit the heaviest portal use. Understanding the implications of this new technology will be central to the delivery of safe and effective care.

Computational detection and suppression of sequence-specific off-target phenotypes from whole genome RNAi screens.

Zhong R, Kim J, Kim HS, Kim M, Lum L, Levine B, Xiao G, White MA, Xie Y.
July 2014 Nucleic Acids Research, Volume 42, Issue 13, Pages 8214–8222.

Abstract

A challenge for large-scale siRNA loss-of-function studies is the biological pleiotropy resulting from multiple modes of action of siRNA reagents. A major confounding feature of these reagents is the microRNA-like translational quelling resulting from short regions of oligonucleotide complementarity to many different messenger RNAs. We developed a computational approach, deconvolution analysis of RNAi screening data, for automated quantitation of off-target effects in RNAi screening data sets. Substantial reduction of off-target rates was experimentally validated in five distinct biological screens across different genome-wide siRNA libraries. A public-access graphical-user-interface has been constructed to facilitate application of this algorithm.

Hereditary lung cancer syndrome targets never smokers with germline EGFR gene T790M mutations.

Gazdar A, Robinson L, Oliver D, Xing C, Travis WD, Soh J, Toyooka S, Watumull L, Xie Y, Kernstine K, Schiller JH.
April 2014 Journal of Thoracic Oncology Volume 9, Issue 4, Pages 456-463

Abstract

Introduction

Hereditary lung cancer syndromes are rare, and T790M germline mutations of the epidermal growth factor receptor (EGFR) gene predispose to the development of lung cancer. The goal of this study was to determine the clinical features and smoking status of lung cancer cases and unaffected family members with this germline mutation and to estimate its incidence and penetrance.

Methods

We studied a family with germline T790M mutations over five generations (14 individuals) and combined our observations with data obtained from a literature search (15 individuals).

Results

T790M germline mutations occurred in approximately 1% of non-small-cell lung cancer cases and in less than one in 7500 subjects without lung cancer. Both sporadic and germline T790M mutations were predominantly adenocarcinomas, favored female gender, and were occasionally multifocal. Of lung cancer tumors arising in T790M germline mutation carriers, 73% contained a second activating EGFR gene mutation. Inheritance was dominant. The odds ratio that T790M germline carriers who are smokers will develop lung cancer compared with never smoker carriers was 0.31 (p = 6.0E-05). There was an overrepresentation of never smokers with lung cancer with this mutation compared with the general lung cancer population (p = 7.4E-06).

Conclusion

Germline T790M mutations result in a unique hereditary lung cancer syndrome that targets never smokers, with a preliminary estimate of 31% risk for lung cancer in never smoker carriers, and this risk may be lower for heavy smokers. The resultant cancers share several features with, but also differ in some respects from, lung cancers containing sporadic EGFR mutations.

A model-based approach to identify binding sites in CLIP-Seq data.

Wang T, Chen B, Kim M, Xie Y, Xiao G.
April 2014 PLoS One. 9(4):e93248. doi: 10.1371/journal.pone.0093248.

Abstract

Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has made it possible to identify the targeting sites of RNA-binding proteins in various cell culture systems and tissue types on a genome-wide scale. Here we present a novel model-based approach (MiClip) to identify high-confidence protein-RNA binding sites from CLIP-seq datasets. This approach assigns a probability score for each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested in both HITS-CLIP and PAR-CLIP datasets. In the HITS-CLIP dataset, the signal/noise ratios of miRNA seed motif enrichment produced by the MiClip approach are between 17% and 301% higher than those by the ad hoc method for the top 10 most enriched miRNAs. In the PAR-CLIP dataset, the MiClip approach can identify ∼50% more validated binding targets than the original ad hoc method and two recently published methods. To facilitate the application of the algorithm, we have released an R package, MiClip (http://cran.r-project.org/web/packages/MiClip/index.html), and a public web-based graphical user interface software (http://galaxy.qbrc.org/tool_runner?tool_id=mi_clip) for customized analysis.

Detection of candidate tumor driver genes using a fully integrated Bayesian approach.

Yang J, Wang X, Kim M, Xie Y, Xiao G.
May 2014 Stat Med. 33(10):1784-800. doi: 10.1002/sim.6066.

Abstract

DNA copy number alterations (CNAs), including amplifications and deletions, can result in significant changes in gene expression and are closely related to the development and progression of many diseases, especially cancer. For example, CNA-associated expression changes in certain genes (called candidate tumor driver genes) can alter the expression levels of many downstream genes through transcription regulation and cause cancer. Identification of such candidate tumor driver genes can lead to the discovery of novel therapeutic targets for personalized treatment of cancers. Several approaches have been developed for this purpose by using both copy number and gene expression data. In this study, we propose a Bayesian approach to identify candidate tumor driver genes, in which the copy number and gene expression data are modeled together, and the dependency between the two data types is modeled through conditional probabilities. The proposed joint modeling approach can identify CNA and differentially expressed genes simultaneously, leading to improved detection of candidate tumor driver genes and comprehensive understanding of underlying biological processes. We evaluated the proposed method in simulation studies and then applied it to a head and neck squamous cell carcinoma data set. Both simulation studies and data application show that the joint modeling approach can significantly improve the performance in identifying candidate tumor driver genes, when compared with other existing approaches.

Adaptive prediction model in prospective molecular signature-based clinical studies.

Xiao G, Ma S, Minna J, Xie Y.
February 2014 Clin Cancer Res. 20(3):531-9. doi: 10.1158/1078-0432.CCR-13-2127.

Abstract

Use of molecular profiles and clinical information can help predict which treatment would give the best outcome and survival for each individual patient, and thus guide optimal therapy, which offers great promise for the future of clinical trials and practice. High prediction accuracy is essential for selecting the best treatment plan. The gold standard for evaluating the prediction models is prospective clinical studies, in which patients are enrolled sequentially. However, there is no statistical method using this sequential feature to adapt the prediction model to the current patient cohort. In this article, we propose a reweighted random forest (RWRF) model, which updates the weight of each decision tree whenever additional patient information is available, to account for the potential heterogeneity between training and testing data. A simulation study and a lung cancer example are used to show that the proposed method can adapt the prediction model to current patients' characteristics, and, therefore, can improve prediction accuracy significantly. We also show that the proposed method can identify important and consistent predictive variables. Compared with rebuilding the prediction model, the RWRF updates a well-tested model gradually, and all of the adaptive procedure/parameters used in the RWRF model are prespecified before patient recruitment, which are important practical advantages for prospective clinical studies.
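The abstract describes reweighting each decision tree whenever new patient information becomes available, but does not give the update rule. The Python sketch below illustrates the general idea under a clearly stated assumption: a multiplicative (exponential-weights) rule that down-weights any base learner misclassifying a newly enrolled patient, with the rate `eta` fixed in advance. The class name `ReweightedEnsemble`, the parameter `eta`, and the toy stumps are hypothetical; this is not the published RWRF algorithm.

```python
import math
from typing import Callable, List, Sequence

# A base learner maps a patient's feature vector to a predicted class (0 or 1).
Learner = Callable[[Sequence[float]], int]


class ReweightedEnsemble:
    """Weighted majority vote over fixed, pre-trained base learners.

    Illustrative assumption (not the published RWRF rule): after each newly
    enrolled patient's outcome is observed, every learner that misclassified
    that patient has its weight multiplied by exp(-eta), and the weights are
    then renormalized to sum to 1.
    """

    def __init__(self, learners: List[Learner], eta: float = 0.5):
        if not learners:
            raise ValueError("need at least one learner")
        self.learners = learners
        self.eta = eta  # prespecified before recruitment, never re-tuned
        self.weights = [1.0 / len(learners)] * len(learners)

    def predict(self, x: Sequence[float]) -> int:
        # Predict 1 when learners voting for class 1 hold at least half the weight.
        vote_for_1 = sum(w for w, f in zip(self.weights, self.learners) if f(x) == 1)
        return 1 if vote_for_1 >= 0.5 * sum(self.weights) else 0

    def update(self, x: Sequence[float], y: int) -> None:
        # Down-weight learners that got the newly observed patient wrong.
        new = [w * math.exp(-self.eta) if f(x) != y else w
               for w, f in zip(self.weights, self.learners)]
        total = sum(new)
        self.weights = [w / total for w in new]


# Usage with two hypothetical one-feature "stumps" standing in for trees.
def stump_a(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def stump_b(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


model = ReweightedEnsemble([stump_a, stump_b])
# In the new cohort the outcome tracks the second feature, so stump_b gains weight.
for x, y in [([0.9, 0.1], 0), ([0.2, 0.8], 1), ([0.7, 0.3], 0)]:
    model.update(x, y)
print(model.weights)
print(model.predict([0.9, 0.1]))  # → 0
```

Only the weights change while the trained learners stay fixed, which loosely mirrors the abstract's point that a well-tested model is updated gradually using parameters prespecified before patient recruitment.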

27-Hydroxycholesterol promotes cell-autonomous, ER-positive breast cancer growth.

Potts MB, Kim HS, Fisher KW, Hu Y, Carrasco YP, Bulut GB, Ou YH, Herrera-Herrera ML, Cubillos F, Mendiratta S, Xiao G, Hofree M, Ideker T, Xie Y, Huang LJ, Lewis RE, MacMillan JB, White MA.
November 2013 Cell Reports Volume 5, Issue 3, Pages 637-645

Abstract

To date, estrogen is the only known endogenous estrogen receptor (ER) ligand that promotes ER+ breast tumor growth. We report that the cholesterol metabolite 27-hydroxycholesterol (27HC) stimulates MCF-7 cell xenograft growth in mice. More importantly, in ER+ breast cancer patients, 27HC content in normal breast tissue is increased compared to that in cancer-free controls, and tumor 27HC content is further elevated. Increased tumor 27HC is correlated with diminished expression of CYP7B1, the 27HC metabolizing enzyme, and reduced expression of CYP7B1 in tumors is associated with poorer patient survival. Moreover, 27HC is produced by MCF-7 cells, and it stimulates cell-autonomous, ER-dependent, and GDNF-RET-dependent cell proliferation. Thus, 27HC is a locally modulated, nonaromatized ER ligand that promotes ER+ breast tumor growth.

Using functional signature ontology (FUSION) to identify mechanisms of action for natural products.

Potts MB, Kim HS, Fisher KW, Hu Y, Carrasco YP, Bulut GB, Ou YH, Herrera-Herrera ML, Cubillos F, Mendiratta S, Xiao G, Hofree M, Ideker T, Xie Y, Huang LJ, Lewis RE, MacMillan JB, White MA.
October 2013 Science Signaling, Vol. 6, Issue 297, pp. ra90

Abstract

A challenge for biomedical research is the development of pharmaceuticals that appropriately target disease mechanisms. Natural products can be a rich source of bioactive chemicals for medicinal applications but can act through unknown mechanisms and can be difficult to produce or obtain. To address these challenges, we developed a new marine-derived, renewable natural products resource and a method for linking bioactive derivatives of this library to the proteins and biological processes that they target in cells. We used cell-based screening and computational analysis to match gene expression signatures produced by natural products to those produced by small interfering RNA (siRNA) and synthetic microRNA (miRNA) libraries. With this strategy, we matched proteins and miRNAs with diverse biological processes and also identified putative protein targets and mechanisms of action for several previously undescribed marine-derived natural products. We confirmed mechanistic relationships for selected siRNAs, miRNAs, and compounds with functional roles in autophagy, chemotaxis mediated by discoidin domain receptor 2, or activation of the kinase AKT. Thus, this approach may be an effective method for screening new drugs while simultaneously identifying their targets.

Cytoplasmic TRADD confers a worse prognosis in glioblastoma.

Chakraborty S, Li L, Tang H, Xie Y, Puliyappadamba VT, Raisanen J, Burma S, Boothman DA, Cochran B, Wu J, Habib AA.
August 2013 Neoplasia. 15(8):888-97

Abstract

Tumor necrosis factor receptor 1 (TNFR1)-associated death domain protein (TRADD) is an important adaptor in TNFR1 signaling and has an essential role in nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) activation and survival signaling. Increased expression of TRADD is sufficient to activate NF-κB. Recent studies have highlighted the importance of NF-κB activation as a key pathogenic mechanism in glioblastoma multiforme (GBM), the most common primary malignant brain tumor in adults. We examined the expression of TRADD by immunohistochemistry (IHC) and found that TRADD is commonly expressed at high levels in GBM and is detected in both cytoplasmic and nuclear distribution. Cytoplasmic IHC TRADD scoring is significantly associated with worse progression-free survival (PFS) in both univariate and multivariate analysis but is not associated with overall survival (n = 43 GBMs). PFS is a marker for responsiveness to treatment. We propose that TRADD-mediated NF-κB activation confers chemoresistance and thus a worse PFS in GBM. Consistent with the effect on PFS, silencing TRADD in glioma cells results in decreased NF-κB activity, decreased proliferation of cells, and increased sensitivity to temozolomide. TRADD expression is common in glioma-initiating cells. Importantly, silencing TRADD in GBM-initiating stem cell cultures results in decreased viability of stem cells, suggesting that TRADD may be required for maintenance of GBM stem cell populations. Thus, our study suggests that increased expression of cytoplasmic TRADD is both an important biomarker and a key driver of NF-κB activation in GBM and supports an oncogenic role for TRADD in GBM.

Influence of medical comorbidities on the presentation and outcomes of stage I-III non-small-cell lung cancer.

Ahn DH, Mehta N, Yorio JT, Xie Y, Yan J, Gerber DE.
November 2013 Clinical Lung Cancer Volume 14, Issue 6, Pages 644-650

Abstract

Background

Non-small-cell lung cancer (NSCLC) presentation, treatment, and outcomes vary widely according to socioeconomic factors and other patient characteristics. To determine whether medical comorbidities account for these observations, we incorporated a validated medical comorbidity index into an analysis of patients diagnosed with stage I to III NSCLC.

Patients and Methods

We performed a retrospective analysis of consecutive patients diagnosed with stage I to III NSCLC. Demographic, tumor, and comorbidity data were obtained from hospital tumor registries and individual patient records. The association between variables was assessed using multivariate logistic regression and survival analysis.

Results

A total of 454 patients met criteria for analysis. The median age was 65 years, and 51% were men. Individuals with a higher Charlson Comorbidity Index (CCI) were significantly more likely to present with early stage (stage I-II) NSCLC than were patients with lower CCI (odds ratio, 1.72; 95% confidence interval, 1.14-2.63; P = .01), although this association lost statistical significance (P = .21) in a multivariate model. In multivariate logistic regression, overall survival remained associated with all variables: age, sex, race, insurance type, stage, histology, and CCI (P = .0007). The CCI was associated with survival for patients with early stage (P = .02) and locally advanced (P = .02) disease.

Conclusion

In this cohort of patients with stage I to III NSCLC, increasing comorbidity burden had a nonsignificant association with diagnosis at earlier disease stage. Although comorbidity burden was significantly associated with outcome for early stage and locally advanced disease, it did not account for survival differences based on multiple other patient and disease characteristics.

SbacHTS: spatial background noise correction for high-throughput RNAi screening.

Zhong R, Kim MS, White MA, Xie Y, Xiao G.
September 2013 Bioinformatics, Volume 29, Issue 17, Pages 2218–2220

Abstract

Motivation

High-throughput cell-based phenotypic screening has become an increasingly important technology for discovering new drug targets and assigning gene functions. Such experiments use hundreds of 96-well or 384-well plates to cover whole-genome RNAi collections and/or chemical compound files, and often collect measurements that are sensitive to spatial background noise whose patterns can vary across individual plates. Correcting these position effects can substantially improve measurement accuracy and screening success.

Results

We developed SbacHTS (Spatial background noise correction for High-Throughput RNAi Screening) software for visualization, estimation and correction of spatial background noise in high-throughput RNAi screens. SbacHTS is supported on the Galaxy open-source framework with a user-friendly open access web interface. We find that SbacHTS software can effectively detect and correct spatial background noise, increase signal to noise ratio and enhance statistical detection power in high-throughput RNAi screening experiments.

Availability
http://www.galaxy.qbrc.org/
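The abstract does not describe SbacHTS's correction algorithm, so the following is only a generic illustration of what "correcting position effects" on a plate means. It uses Tukey's median polish, a standard technique for removing additive row and column effects from plate readouts, and is not necessarily what SbacHTS implements; the function name and the toy plate are hypothetical.

```python
from statistics import median


def median_polish(plate, max_iter=10, eps=0.01):
    """Tukey's median polish for a plate of well readings (rows x columns).

    Returns (overall, row_effects, col_effects, residuals) such that
    plate[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    The residuals are the position-corrected values.
    """
    n_rows, n_cols = len(plate), len(plate[0])
    z = [[float(x) for x in row] for row in plate]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep each row's median out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(z[i])
            z[i] = [x - m for x in z[i]]
            row_eff[i] += m
        # Re-centre the column effects, moving their median into the overall level.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep each column's median out of the residuals into the column effects.
        for j in range(n_cols):
            m = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        # Re-centre the row effects the same way.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once the total absolute residual stops changing.
        new_sum = sum(abs(x) for row in z for x in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


# Toy 3 x 4 plate: an additive row/column gradient plus one "hit" well at (1, 2).
plate = [[10 + i + 2 * j for j in range(4)] for i in range(3)]
plate[1][2] += 20
overall, rows, cols, corrected = median_polish(plate)
# corrected[1][2] → 20.0; every other well → 0.0
```

Because medians rather than means are swept out, a single strong hit does not distort the estimated row and column gradients; the hit survives in the residuals while the spatial trend is removed.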

Distinct transcriptome profiles identified in normal human bronchial epithelial cells after exposure to γ-rays and different elemental particles of high Z and energy.

Ding LH, Park S, Peyton M, Girard L, Xie Y, Minna JD, Story MD.
June 2013 BMC Genomics. 14:372. doi: 10.1186/1471-2164-14-372.

Abstract

Background

Ionizing radiation composed of accelerated ions of high atomic number (Z) and energy (HZE) deposits energy and creates damage in cells in a discrete manner as compared to the random deposition of energy and damage seen with low energy radiations such as γ- or x-rays. Such radiations can be highly effective at cell killing, transformation, and oncogenesis, all of which are concerns for the manned space program and for the burgeoning field of HZE particle radiotherapy for cancer. Furthermore, there are differences in the extent to which cells or tissues respond to such exposures that may be unrelated to absorbed dose. Therefore, we asked whether the energy deposition patterns produced by different radiation types would cause different molecular responses. We performed transcriptome profiling using human bronchial epithelial cells (HBECs) after exposure to γ-rays and to two different HZE particles (²⁸Si and ⁵⁶Fe) with different energy transfer properties to characterize the molecular response to HZE particles and γ-rays as a function of dose, energy deposition pattern, and time post-irradiation.

Results

Clonogenic assay indicated that the relative biological effectiveness (RBE) for ⁵⁶Fe was 3.91 and for ²⁸Si was 1.38 at 34% cell survival. Unsupervised clustering analysis of gene expression segregated samples according to the radiation species followed by the time after irradiation, whereas dose was not a significant parameter for segregation of radiation response. While a subset of genes associated with p53 signaling, such as CDKN1A, TRIM22 and BTG2, showed very similar responses to all radiation qualities, distinct expression changes were associated with the different radiation species. Gene enrichment analysis categorized the differentially expressed genes into functional groups related to cell death and cell cycle regulation for all radiation types, while gene pathway analysis revealed that the pro-inflammatory Acute Phase Response Signaling was specifically induced after HZE particle irradiation. A 73-gene signature capable of predicting, with 96% accuracy, the radiation species to which cells were exposed was developed.

Conclusion

These data suggest that the molecular response to the radiation species used here is a function of the energy deposition characteristics of the radiation species. This novel molecular response to HZE particles may have implications for radiotherapy including particle selection for therapy and risk for second cancers, risk for cancers from diagnostic radiation exposures, as well as NASA's efforts to develop more accurate lung cancer risk estimates for astronaut safety. Lastly, irrespective of the source of radiation, the gene expression changes observed set the stage for functional studies of initiation or progression of radiation-induced lung carcinogenesis.

Detection of epigenetic changes using ANOVA with spatially varying coefficients.

Xiao G, Wang X, LaPlant Q, Nestler EJ, Xie Y.
March 2013 Journal of Thoracic Oncology Volume 9, Issue 4, Pages 456-463

Abstract

Identification of genome-wide epigenetic changes, the stable changes in gene function without a change in DNA sequence, under various conditions plays an important role in biomedical research. High-throughput epigenetic experiments are useful tools to measure genome-wide epigenetic changes, but the measured intensity levels from these high-resolution genome-wide epigenetic profiling data are often spatially correlated with high noise levels. In addition, it is challenging to detect genome-wide epigenetic changes across multiple conditions, so efficient statistical methodology development is needed for this purpose. In this study, we consider ANOVA models with spatially varying coefficients, combined with a hierarchical Bayesian approach, to explicitly model spatial correlation caused by location-dependent biological effects (i.e., epigenetic changes) and borrow strength among neighboring probes to compare epigenetic changes across multiple conditions. Through simulation studies and applications in drug addiction and depression datasets, we find that our approach compares favorably with competing methods; it is more efficient in estimation and more effective in detecting epigenetic changes. In addition, it can provide biologically meaningful results.

Human lung epithelial cells progressed to malignancy through specific oncogenic manipulations.

Sato M, Larsen JE, Lee W, Sun H, Shames DS, Dalvi MP, Ramirez RD, Tang H, DiMaio JM, Gao B, Xie Y, Wistuba II, Gazdar AF, Shay JW, Minna JD.
June 2013 Mol Cancer Res. 11(6):638-50. doi: 10.1158/1541-7786.MCR-12-0634-T.

Abstract

We used CDK4/hTERT-immortalized normal human bronchial epithelial cells (HBEC) from several individuals to study lung cancer pathogenesis by introducing combinations of common lung cancer oncogenic changes (p53, KRAS, and MYC) and followed the stepwise transformation of HBECs to full malignancy. This model showed that: (i) the combination of five genetic alterations (CDK4, hTERT, sh-p53, KRAS(V12), and c-MYC) is sufficient for full tumorigenic conversion of HBECs; (ii) genetically identical clones of transformed HBECs exhibit pronounced differences in tumor growth, histology, and differentiation; (iii) HBECs from different individuals vary in their sensitivity to transformation by these oncogenic manipulations; (iv) high levels of KRAS(V12) are required for full malignant transformation of HBECs, however, prior loss of p53 function is required to prevent oncogene-induced senescence; (v) overexpression of c-MYC greatly enhances malignancy but only in the context of sh-p53+KRAS(V12); (vi) growth of parental HBECs in serum-containing medium induces differentiation, whereas growth of oncogenically manipulated HBECs in serum increases in vivo tumorigenicity, decreases tumor latency, produces more undifferentiated tumors, and induces epithelial-to-mesenchymal transition (EMT); (vii) oncogenic transformation of HBECs leads to increased sensitivity to standard chemotherapy doublets; (viii) an mRNA signature derived by comparing tumorigenic versus nontumorigenic clones was predictive of outcome in patients with lung cancer. Collectively, our findings show that this HBEC model system can be used to study the effect of oncogenic mutations, their expression levels, and serum-derived environmental effects in malignant transformation, while also providing clinically translatable applications such as development of prognostic signatures and drug response phenotypes.

A 12-gene set predicts survival benefits from adjuvant chemotherapy in non-small cell lung cancer patients.

Tang H, Xiao G, Behrens C, Schiller J, Allen J, Chow CW, Suraokar M, Corvalan A, Mao J, White MA, Wistuba II, Minna JD, Xie Y.
March 2013 Clin Cancer Res. 19(6):1577-86. doi: 10.1158/1078-0432.CCR-12-2321.

Abstract

Purpose

Prospectively identifying who will benefit from adjuvant chemotherapy (ACT) would improve clinical decisions for non-small cell lung cancer (NSCLC) patients. In this study, we aim to develop and validate a functional gene set that predicts the clinical benefits of ACT in NSCLC.

Experimental Design

An 18-hub-gene prognosis signature was developed through a systems biology approach, and its prognostic value was evaluated in six independent cohorts. The 18-hub-gene set was then integrated with genome-wide functional (RNAi) data and genetic aberration data to derive a 12-gene predictive signature for ACT benefits in NSCLC.

Results

Using a cohort of 442 stage I to III NSCLC patients who underwent surgical resection, we identified an 18-hub-gene set that robustly predicted the prognosis of patients with adenocarcinoma in all validation datasets across four microarray platforms. The hub genes, identified through a purely data-driven approach, have significant biological implications in tumor pathogenesis, including NKX2-1, Aurora Kinase A, PRC1, CDKN3, MBIP, and RRM2. The 12-gene predictive signature was successfully validated in two independent datasets (n = 90 and 176). The predicted benefit group showed significant improvement in survival after ACT (UT Lung SPORE data: HR = 0.34, P = 0.017; JBR.10 clinical trial data: HR = 0.36, P = 0.038), whereas the predicted nonbenefit group showed no survival benefit for 2 datasets (HR = 0.80, P = 0.70; HR = 0.91, P = 0.82).

Conclusion

This is the first study to integrate genetic aberration, genome-wide RNAi data, and mRNA expression data to identify a functional gene set that predicts which resectable patients with non-small cell lung cancer will have a survival benefit with ACT.

Consent timing and experience: modifiable factors that may influence interest in clinical research.

Gerber DE, Rasco DW, Skinner CS, Dowell JE, Yan J, Sayne JR, Xie Y.
March 2012 J Oncol Pract. 8(2):91-6. doi: 10.1200/JOP.2011.000335.

Abstract

Purpose

Low rates of participation in cancer clinical trials have been attributed to patient, institutional, and study characteristics. However, few studies have examined factors related to the consent process. We therefore evaluated the impact of consent timing and experience on markers of patient interest in research.

Methods

We performed a retrospective analysis of patients enrolled in a cancer center tissue repository. During enrollment, patients were asked if they were willing to be contacted in the future to provide medical follow-up information and/or to participate in other clinical research. We analyzed the association between patient responses to these questions and consent process factors using univariate analysis and multivariate logistic regression.

Results

Of 922 patients evaluated, 85% agreed to be contacted to provide follow-up information, and 83% agreed to be contacted to participate in future research studies. In univariate analysis, willingness to be contacted for future research was associated with consenter experience (P = .01) and had a trend toward association with the timing of enrollment in relation to diagnosis (P = .08), but it was not associated with patient sex, race, or diagnosis. In multivariate analysis, responses remained associated with consenter experience (P = .02).

Conclusion

Factors related to the consent process, including consenter experience and timing of study enrollment, are significantly associated with or have a trend toward association with markers of patient interest in clinical research. These understudied and potentially modifiable variables warrant further evaluation.

The starvation hormone, fibroblast growth factor-21, extends lifespan in mice.

Zhang Y, Xie Y, Berglund ED, Coate KC, He TT, Katafuchi T, Xiao G, Potthoff MJ, Wei W, Wan Y, Yu RT, Evans RM, Kliewer SA, Mangelsdorf DJ.
October 2012 Elife. 1:e00065. doi: 10.7554/eLife.00065.

Abstract

Fibroblast growth factor-21 (FGF21) is a hormone secreted by the liver during fasting that elicits diverse aspects of the adaptive starvation response. Among its effects, FGF21 induces hepatic fatty acid oxidation and ketogenesis, increases insulin sensitivity, blocks somatic growth and causes bone loss. Here we show that transgenic overexpression of FGF21 markedly extends lifespan in mice without reducing food intake or affecting markers of NAD+ metabolism or AMP kinase and mTOR signaling. Transcriptomic analysis suggests that FGF21 acts primarily by blunting the growth hormone/insulin-like growth factor-1 signaling pathway in liver. These findings raise the possibility that FGF21 can be used to extend lifespan in other species.

A multicenter phase II study of cisplatin, pemetrexed, and bevacizumab in patients with advanced malignant mesothelioma.

Dowell JE, Dunphy FR, Taub RN, Gerber DE, Ngov L, Yan J, Xie Y, Kindler HL.
September 2012 Lung Cancer Volume 77, Issue 3, Pages 567-571

Abstract

Introduction

Malignant mesothelioma (MM) cells express the vascular endothelial growth factor (VEGF) receptor, and VEGF protein expression is detected in a majority of human mesothelioma biopsy specimens. Bevacizumab is a recombinant humanized monoclonal antibody that blocks the binding of VEGF to its receptor. We evaluated the addition of bevacizumab to cisplatin and pemetrexed as first-line treatment in patients with advanced, unresectable MM.

Methods

Previously untreated MM patients with advanced, unresectable disease received cisplatin (75 mg/m²), pemetrexed (500 mg/m²), and bevacizumab (15 mg/kg) intravenously every 21 days for a maximum of 6 cycles. Patients with responsive or stable disease received bevacizumab (15 mg/kg) intravenously every 21 days until progression or intolerance. The primary endpoint was progression-free survival rate at 6 months.

Results

Fifty-three patients were enrolled at 4 centers; 52 were evaluable for this analysis. The progression-free survival rate at 6 months was 56% and the median progression-free survival was 6.9 months (95% confidence interval [CI], 5.3-7.8 months). The partial response rate was 40%, and 35% of patients had stable disease. Median overall survival was 14.8 months (95% CI, 10.0-17.0 months). Grade 3/4 toxicities included neutropenia in 11%, hypertension in 6%, and venous thromboembolism in 13% of patients.

Conclusion

This trial evaluating the addition of bevacizumab to cisplatin and pemetrexed in patients with previously untreated, advanced MM failed to meet the primary endpoint of a 33% improvement in the progression-free survival rate at 6 months compared with historical controls treated with cisplatin and pemetrexed alone.

Socioeconomic disparities in lung cancer treatment and outcomes persist within a single academic medical center.

Yorio JT, Yan J, Xie Y, Gerber DE.
November 2012 Clinical Lung Cancer Volume 13, Issue 6, Pages 448-457

Abstract

Background

Socioeconomic disparities in treatment and outcomes of non-small-cell lung cancer (NSCLC) are well established. To explore whether these differences are secondary to individual or institutional characteristics, we examined treatment selection and outcome in a diverse population treated at a single medical center.

Patients and Methods

We performed a retrospective analysis of consecutive patients diagnosed with NSCLC stages I-III from 2000 to 2005 at the University of Texas Southwestern Medical Center. Treatment selection was dichotomized as 'standard' (surgery for stage I-II; surgery and/or radiation therapy for stage III) or 'other.' Associations between patient characteristics (including socioeconomic status) and treatment selection were examined using logistic regression; associations between characteristics and overall survival were examined using Cox regression models and Kaplan-Meier survival analysis.

Results

A total of 450 patients were included. Twenty-eight percent of patients had private insurance, 43% had Medicare, and 29% had an indigent care plan. The likelihood of receiving 'standard' therapy was significantly associated with insurance type (indigent plan versus private insurance odds ratio [OR] 0.13, 95% confidence interval [CI] 0.04, 0.43 for stage I-II; OR 0.38, 95% CI 0.14, 1.00 for stage III). For patients with stage I-II NSCLC, survival was associated with age, sex, insurance type (indigent plan versus private insurance hazard ratio for death 1.98; 95% CI 1.16, 3.37), stage, and treatment selection. In stage III NSCLC, survival was associated with treatment selection.

Conclusion

Within a single academic medical center, socioeconomically disadvantaged patients with stage I-III NSCLC are less likely to receive 'standard' therapy. Socioeconomically disadvantaged patients with stage I-II NSCLC have inferior survival independent of therapy.

Cell-free formation of RNA granules: bound RNAs identify features and components of cellular assemblies.

Han TW, Kato M, Xie S, Wu LC, Mirzaei H, Pei J, Chen M, Xie Y, Allen J, Xiao G, McKnight SL.
May 2012 Cell Volume 149, Issue 4, Pages 768-779

Abstract

Cellular granules lacking boundary membranes harbor RNAs and their associated proteins and play diverse roles controlling the timing and location of protein synthesis. Formation of such granules was emulated by treatment of mouse brain extracts and human cell lysates with a biotinylated isoxazole (b-isox) chemical. Deep sequencing of the associated RNAs revealed an enrichment for mRNAs known to be recruited to neuronal granules used for dendritic transport and localized translation at synapses. Precipitated mRNAs contain extended 3' UTR sequences and an enrichment in binding sites for known granule-associated proteins. Hydrogels composed of the low complexity (LC) sequence domain of FUS recruited and retained the same mRNAs as were selectively precipitated by the b-isox chemical. Phosphorylation of the LC domain of FUS prevented hydrogel retention, offering a conceptual means of dynamic, signal-dependent control of RNA granule assembly.

A lung cancer molecular prognostic test ready for prime time.

Xie Y, Minna JD.
March 2012 The Lancet, Volume 379, Issue 9819, Pages 785-787

Development of methods for quantitative comparison of pooled shRNAs by mass sequencing.

Hoshiyama H, Tang J, Batten K, Xiao G, Rouillard JM, Shay JW, Xie Y, Wright WE.
February 2012 J Biomol Screen. 17(2):258-65. doi: 10.1177/1087057111423101.

Abstract

Pooled short-hairpin RNA (shRNA) library screening is a powerful tool for identifying a set of genes in biological pathways that require stable expression to produce a desired phenotype. Massive parallel sequencing of half-hairpins has proven highly variable and has not given satisfactory results concerning the relative abundance of different shRNAs before and after selection. Here, the authors describe a method for quantitative comparison of half-hairpins from pooled shRNAs in the mir30-based pGIPZ vector that is analyzed by massive parallel sequencing. Introducing a multiplexing code and refining the sample preparation scheme resulted in the predicted ability to detect twofold enrichments. These improvements should permit half-hairpin sequencing to analyze either dropout screens or selective pooled shRNA screens of limited stringency to analyze phenotypes not accessible in transient experiments.

Incidence of unanticipated difficult airway in obstetric patients in a teaching institution.

Tao W, Edwards JT, Tu F, Xie Y, Sharma SK.
January 2012 Journal of Anesthesia, Volume 26, Issue 3, pp. 339-345

Abstract

Purpose

Our aim was to determine the incidence of difficult intubation during pregnancy-related surgery at a high-risk, high-volume teaching institution.

Methods

Airway experience was analyzed among patients who had pregnancy-related surgery under general anesthesia from January 2001 through February 2006. A difficult airway was defined as needing three or more direct laryngoscopy (DL) attempts, use of additional airway equipment after the DL attempts, or conversion to regional anesthesia due to inability to intubate. Airway characteristics were compared between patients with and without a difficult airway. In addition, pre- and postoperative airway evaluations were compared to identify factors closely related to changes from pregnancy.

Results

In a total of 30,766 operations, 2,158 (7%) were performed with general anesthesia. Among these, 1,026 (47.5%) were for emergency cesarean delivery (CD), 610 (28.3%) for nonemergency CD, and 522 (24.2%) for non-CD procedures. A total of 12 patients (0.56%) were identified as having a difficult airway. Four patients were intubated with further DL attempts; others required mask ventilation and other airway equipment. Two patients were ventilated through a laryngeal mask airway without further intubation attempts. Ten of the 12 difficult airway cases were encountered by residents during their first year of clinical anesthesia training. There were no maternal or fetal complications except one possible aspiration.

Conclusion

Unanticipated difficult airways accounted for 0.56% of all pregnancy-related surgical patients. More than 99.9% of all obstetric patients could be intubated. A difficult airway is more likely to be encountered by anesthesia providers with <1 year of experience. Proper use of airway equipment may help secure the obstetric airway or provide adequate ventilation. Emergency CD did not add an additional level of difficulty over nonemergency CD.

Comparing statistical methods for constructing large scale gene networks.

Allen JD, Xie Y, Chen M, Girard L, Xiao G.
January 2012 PLoS One. 7(1):e29348. doi: 10.1371/journal.pone.0029348.

Abstract

The gene regulatory network (GRN) reveals the regulatory relationships among genes and can provide a systematic understanding of molecular mechanisms underlying biological processes. The importance of computer simulations in understanding cellular processes is now widely accepted; a variety of algorithms have been developed to study these biological networks. The goal of this study is to provide a comprehensive evaluation and a practical guide to aid in choosing statistical methods for constructing large scale GRNs. Using both simulation studies and a real application in E. coli data, we compare different methods in terms of sensitivity and specificity in identifying the true connections and the hub genes, the ease of use, and computational speed. Our results show that these algorithms performed reasonably well, and each method has its own advantages: (1) GeneNet, WGCNA (Weighted Correlation Network Analysis), and ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) performed well in constructing the global network structure; (2) GeneNet and SPACE (Sparse PArtial Correlation Estimation) performed well in identifying a few connections with high specificity.
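The abstract names the evaluation criteria (sensitivity and specificity for recovering true connections, plus recovery of hub genes) but not how they were computed. As a minimal sketch, assuming both the reference and the inferred networks are given as undirected edge lists over a known gene set, those two criteria could be scored as follows. The function names are hypothetical and are not part of GeneNet, WGCNA, ARACNE, or SPACE.

```python
from collections import Counter
from itertools import combinations


def _as_pairs(edges):
    # Treat edges as undirected: (A, B) and (B, A) are the same connection.
    return {frozenset(e) for e in edges}


def edge_sensitivity_specificity(true_edges, inferred_edges, genes):
    """Score an inferred undirected network against a known reference network."""
    truth, inferred = _as_pairs(true_edges), _as_pairs(inferred_edges)
    all_pairs = {frozenset(p) for p in combinations(genes, 2)}
    tp = len(truth & inferred)              # true connections recovered
    fn = len(truth - inferred)              # true connections missed
    fp = len(inferred - truth)              # spurious connections
    tn = len(all_pairs - truth - inferred)  # non-edges correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def hub_genes(edges, k):
    """Return the k genes with the most connections (ties broken by name)."""
    degree = Counter(g for e in _as_pairs(edges) for g in e)
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


genes = ["A", "B", "C", "D"]
truth = [("A", "B"), ("B", "C"), ("C", "D")]
inferred = [("B", "A"), ("A", "C")]
sens, spec = edge_sensitivity_specificity(truth, inferred, genes)
# sens → 1/3 (only A-B recovered); spec → 2/3 (A-D and B-D correctly absent, A-C spurious)
```

Specificity is computed over all possible gene pairs, which is why a method that reports only a few high-confidence edges can score high specificity even when its sensitivity is low.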

Probe mapping across multiple microarray platforms.

Allen JD, Wang S, Chen M, Girard L, Minna JD, Xie Y, Xiao G.
September 2012 Briefings in Bioinformatics, Volume 13, Issue 5, Pages 547-554

Abstract

Access to gene expression data has become increasingly common in recent years; however, analysis has become more difficult as it is often desirable to integrate data from different platforms. Probe mapping across microarray platforms is the first and most crucial step for data integration. In this article, we systematically review and compare different approaches to map probes across seven platforms from different vendors: U95A, U133A and U133 Plus 2.0 from Affymetrix, Inc.; HT-12 v1, HT-12 v2 and HT-12 v3 from Illumina, Inc.; and 4112A from Agilent, Inc. We use a unique data set, which contains 56 lung cancer cell line samples, each of which has been measured by two different microarray platforms, to evaluate the consistency of expression measurement across platforms using different approaches. Based on the evaluation from the empirical data set, the BLAST alignment of the probe sequences to a recent revision of the transcriptome generated better results than using annotations provided by vendors or from Bioconductor's Annotate package. However, a combination of all three methods (deemed the 'Consensus Annotation') yielded the most consistent expression measurement across platforms. To facilitate data integration across microarray platforms for the research community, we developed a user-friendly web-based tool, an API and an R package to map data across different microarray platforms from Affymetrix, Illumina and Agilent. Information on all three can be found at http://qbrc.swmed.edu/software/probemapper/.

SMAC mimetic (JP1201) sensitizes non-small cell lung cancers to multiple chemotherapy agents in an IAP-dependent but TNF-α-independent manner.

Greer RM, Peyton M, Larsen JE, Girard L, Xie Y, Gazdar AF, Harran P, Wang L, Brekken RA, Wang X, Minna JD.
December 2011 Cancer Research, Volume 71, Issue 24

Abstract

Inhibitors of apoptosis proteins (IAP) are key regulators of apoptosis and are inhibited by the second mitochondria-derived activator of caspases (SMAC). Previously, a small subset of TNF-α-expressing non-small cell lung cancers (NSCLC) was found to be sensitive to SMAC mimetics alone. In this study, we determined if a SMAC mimetic (JP1201) could sensitize nonresponsive NSCLC cell lines to standard chemotherapy. We found that JP1201 sensitized NSCLCs to doxorubicin, erlotinib, gemcitabine, paclitaxel, vinorelbine, and the combination of carboplatin with paclitaxel in a synergistic manner at clinically achievable drug concentrations. Sensitization did not occur with platinum alone. Furthermore, sensitization was specific for tumor compared with normal lung epithelial cells, increased in NSCLCs harvested after chemotherapy treatment, and did not induce TNF-α secretion. Sensitization also was enhanced in vivo with increased tumor inhibition and increased survival of mice carrying xenografts. These effects were accompanied by caspase 3, 4, and 9 activation, indicating that both mitochondrial and endoplasmic reticulum stress-induced apoptotic pathways are activated by the combination of vinorelbine and JP1201. Chemotherapies that induce cell death through the mitochondrial pathway required only inhibition of X-linked IAP (XIAP) for sensitization, whereas chemotherapies that induce cell death through multiple apoptotic pathways required inhibition of cIAP1, cIAP2, and XIAP. Therefore, the data suggest that IAP-targeted therapy using a SMAC mimetic provides a new therapeutic strategy for synergistic sensitization of NSCLCs to standard chemotherapy agents, which seems to occur independently of TNF-α secretion.

Image-based genome-wide siRNA screen identifies selective autophagy factors.

Orvedahl A, Sumpter R Jr, Xiao G, Ng A, Zou Z, Tang Y, Narimatsu M, Gilpin C, Sun Q, Roth M, Forst CV, Wrana JL, Zhang YE, Luby-Phelps K, Xavier RJ, Xie Y, Levine B.
December 2011 Nature 480, 113-117

Abstract

Selective autophagy involves the recognition and targeting of specific cargo, such as damaged organelles, misfolded proteins, or invading pathogens for lysosomal destruction. Yeast genetic screens have identified proteins required for different forms of selective autophagy, including cytoplasm-to-vacuole targeting, pexophagy and mitophagy, and mammalian genetic screens have identified proteins required for autophagy regulation. However, there have been no systematic approaches to identify molecular determinants of selective autophagy in mammalian cells. Here, to identify mammalian genes required for selective autophagy, we performed a high-content, image-based, genome-wide small interfering RNA screen to detect genes required for the colocalization of Sindbis virus capsid protein with autophagolysosomes. We identified 141 candidate genes required for viral autophagy, which were enriched for cellular pathways related to messenger RNA processing, interferon signalling, vesicle trafficking, cytoskeletal motor function and metabolism. Ninety-six of these genes were also required for Parkin-mediated mitophagy, indicating that common molecular determinants may be involved in autophagic targeting of viral nucleocapsids and autophagic targeting of damaged mitochondria. Murine embryonic fibroblasts lacking one of these gene products, the C2-domain containing protein, SMURF1, are deficient in the autophagosomal targeting of Sindbis and herpes simplex viruses and in the clearance of damaged mitochondria. Moreover, SMURF1-deficient mice accumulate damaged mitochondria in the heart, brain and liver. Thus, our study identifies candidate determinants of selective autophagy, and defines SMURF1 as a newly recognized mediator of both viral autophagy and mitophagy.

Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients.

Xie Y, Xiao G, Coombes KR, Behrens C, Solis LM, Raso G, Girard L, Erickson HS, Roth J, Heymach JV, Moran C, Danenberg K, Minna JD, Wistuba II.
September 2011 Clinical Cancer Research, Volume 17, Issue 17

Abstract

Purpose

The requirement of frozen tissues for microarray experiments limits the clinical usage of genome-wide expression profiling by microarray technology. The goal of this study is to test the feasibility of developing lung cancer prognosis gene signatures by using genome-wide expression profiling of formalin-fixed paraffin-embedded (FFPE) samples, which are widely available and provide a rich source for studying the association between molecular changes in cancer and clinical outcomes.

Experimental Design

We randomly selected 100 non-small-cell lung cancer (NSCLC) FFPE samples with annotated clinical information from the UT-Lung SPORE Tissue Bank. We microdissected tumor areas from the FFPE specimens and used Affymetrix U133 Plus 2.0 arrays to obtain gene expression data. After strict quality control and analysis procedures, supervised principal component analysis was used to develop a robust prognostic signature for NSCLC. Three independent published microarray datasets were used to validate the prognostic model.
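
The general idea of supervised principal component analysis can be sketched as follows: screen genes by their univariate association with the outcome, then summarize the retained genes by their first principal component, which serves as a continuous risk score. This is a minimal sketch under stated assumptions, not the study's pipeline: it assumes a continuous outcome screened with Pearson correlation (survival outcomes would normally be screened with Cox scores), and the function name `supervised_pc_score` is hypothetical.

```python
import numpy as np

def supervised_pc_score(X, y, n_genes):
    """Hypothetical supervised-PCA sketch.

    X: samples x genes expression matrix; y: continuous outcome (an assumption;
    survival data would normally be screened with univariate Cox scores).
    Returns the indices of the screened genes and each sample's PC1 score.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Step 1: univariate screening by absolute Pearson correlation with y
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    selected = np.argsort(corr)[::-1][:n_genes]
    # Step 2: first principal component of the screened genes via SVD
    U, S, _ = np.linalg.svd(Xc[:, selected], full_matrices=False)
    scores = U[:, 0] * S[0]  # sign of a principal component is arbitrary
    return selected, scores
```

In practice, the number of screened genes would be tuned by cross-validation, and the PC1 score would be entered into an outcome model alongside clinical covariates.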

Results

This study showed that the robust gene signature derived from genome-wide expression profiling of FFPE samples is strongly associated with lung cancer clinical outcomes, can be used to refine the prognosis of patients with stage I lung cancer, and is independent of clinical variables. The signature was validated in several independent studies and refined to a 59-gene lung cancer prognostic signature.

Conclusion

We conclude that genome-wide profiling of FFPE lung cancer samples can identify a set of genes whose expression level provides prognostic information across different platforms and studies, which will allow its application in clinical settings.

Knockdown of oncogenic KRAS in non-small cell lung cancers suppresses tumor growth and sensitizes tumor cells to targeted therapy.

Sunaga N, Shames DS, Girard L, Peyton M, Larsen JE, Imai H, Soh J, Sato M, Yanagitani N, Kaira K, Xie Y, Gazdar AF, Mori M, Minna JD.
February 2011 Molecular Cancer Therapeutics, Volume 10, Issue 2, Pages 336-346

Abstract

Oncogenic KRAS is found in more than 25% of lung adenocarcinomas, the major histologic subtype of non-small cell lung cancer (NSCLC), and is an important target for drug development. To this end, we generated four NSCLC lines with stable knockdown selective for oncogenic KRAS. As expected, stable knockdown of oncogenic KRAS led to inhibition of in vitro and in vivo tumor growth in the KRAS-mutant NSCLC cells, but not in NSCLC cells that have wild-type KRAS (but mutant NRAS). Surprisingly, we did not see large-scale induction of cell death and the growth inhibitory effect was not complete. To further understand the ability of NSCLCs to grow despite selective removal of mutant KRAS expression, we conducted microarray expression profiling of NSCLC cell lines with or without mutant KRAS knockdown and isogenic human bronchial epithelial cell lines with and without oncogenic KRAS. We found that although the mitogen-activated protein kinase pathway is significantly downregulated after mutant KRAS knockdown, these NSCLCs showed increased levels of phospho-STAT3 and phospho-epidermal growth factor receptor, and variable changes in phospho-Akt. In addition, mutant KRAS knockdown sensitized the NSCLCs to p38 and EGFR inhibitors. Our findings suggest that targeting oncogenic KRAS by itself will not be sufficient treatment, but may offer possibilities of combining anti-KRAS strategies with other targeted drugs.

Predictors and impact of second-line chemotherapy for advanced non-small cell lung cancer in the United States: real-world considerations for maintenance therapy.

Gerber DE, Rasco DW, Le P, Yan J, Dowell JE, Xie Y.
February 2011 Journal of Thoracic Oncology, Volume 6, Issue 2, Pages 365–371

Abstract

Introduction

Recent clinical trials incorporating maintenance chemotherapy into the initial treatment of advanced non-small cell lung cancer (NSCLC) have highlighted the benefits of exposing patients to second-line therapies. We, therefore, determined the predictors and impact of second-line chemotherapy administration in a contemporary, diverse NSCLC population.

Methods

We performed a retrospective analysis of consecutive patients diagnosed with stage IV NSCLC from 2000 to 2007 at clinical facilities associated with the University of Texas Southwestern Medical Center. Demographic, disease, treatment, and outcome data were obtained from hospital tumor registries. The association between these variables was assessed using univariate analysis and multivariate logistic regression.
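
For readers unfamiliar with the multivariate step, a logistic regression can be fitted by Newton-Raphson (iteratively reweighted least squares), with Wald standard errors taken from the inverse information matrix. This is an illustration only: the function name `fit_logistic` and any covariates passed to it are hypothetical, and a registry analysis would use an established statistics package and dummy-code categorical variables such as insurance type.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Hypothetical logistic regression by Newton-Raphson (IRLS).

    X: n x k covariate matrix (no intercept column); y: 0/1 outcomes.
    Returns (coefficients, Wald standard errors), intercept first.
    """
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))     # fitted probabilities
        grad = Xd.T @ (y - p)                    # score vector
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the final estimate
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    hess = Xd.T @ (Xd * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hess)))
    return beta, se
```

Odds ratios and Wald p-values follow from `np.exp(beta)` and `beta / se`.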

Results

A total of 406 patients in this cohort received first-line chemotherapy and were included in the analysis. Mean age was 59 years, 28% were women, and 59% were white. Among these patients, 197 (49%) received second-line chemotherapy. Among those patients who had not progressed after four to six cycles of first-line chemotherapy, 67% received second-line chemotherapy. Receipt of second-line chemotherapy was significantly associated with patient insurance type (p = 0.007), number of cycles of first-line chemotherapy (p < 0.001), and receipt of prechemotherapy palliative radiation therapy (p = 0.005) but was not associated with patient age, gender, race, histology, or year of diagnosis. In a multivariate model, second-line chemotherapy administration remained associated with insurance type (p = 0.003), number of cycles of first-line chemotherapy (p < 0.001), and receipt of prechemotherapy palliative radiation therapy (p = 0.008). The number of cycles of first-line chemotherapy and administration of second-line chemotherapy were associated with overall survival in both univariate and multivariate analyses.

Conclusion

In this unselected, contemporary, and diverse cohort of patients with advanced NSCLC, 67% of individuals whose disease had not progressed after four to six cycles of first-line chemotherapy eventually received second-line chemotherapy. Markers of socioeconomic status, symptom burden, and response to and tolerance of first-line chemotherapy were associated with receipt of second-line chemotherapy. These factors may assist in the selection of patients most likely to benefit from maintenance chemotherapy.

A novel approach to DNA copy number data segmentation.

Wang S, Wang Y, Xie Y, Xiao G.
February 2011 J Bioinform Comput Biol. 9(1): 131–148.

Abstract

DNA copy number (DCN) is the number of copies of DNA at a region of a genome. Alterations of DCN are highly associated with the development of different tumors. Microarray technologies are now being employed to detect DCN changes at many loci simultaneously in tumor samples. The resulting DCN data are often very noisy, and the tumor sample is often contaminated by normal cells. The goal of computational analysis of array-based DCN data is to infer the underlying DCNs from raw DCN data. Previous methods for this task do not model the tumor/normal cell mixture ratio explicitly, and they cannot output segments with DCN annotations. We developed a novel model-based method using the minimum description length (MDL) principle for DCN data segmentation. Our method outputs the underlying DCN for each chromosomal segment and, at the same time, infers the underlying tumor proportion in the test samples. Empirical results show that our method achieves better accuracy on average than three previous methods, namely Circular Binary Segmentation, Hidden Markov Model and Ultrasome.
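
To make the segmentation idea concrete, the sketch below splits a noisy copy-number profile into piecewise-constant segments by dynamic programming, trading off within-segment squared error against a fixed cost per segment. This is a generic penalized-segmentation illustration, not the paper's MDL model: it does not model the tumor/normal mixture or assign integer DCN states. The function name `segment` and the `penalty` parameter are hypothetical.

```python
import numpy as np

def segment(signal, penalty):
    """Hypothetical optimal-partitioning segmentation.

    Minimizes (sum of within-segment squared errors) + penalty * (#segments).
    Returns a list of (start, end, mean) with end exclusive. O(n^2) time.
    """
    x = np.asarray(signal, dtype=float)
    n = len(x)
    cs = np.concatenate([[0.0], np.cumsum(x)])        # prefix sums
    cs2 = np.concatenate([[0.0], np.cumsum(x ** 2)])  # prefix sums of squares

    def sse(i, j):
        """Squared error of x[i:j] around its own mean."""
        s = cs[j] - cs[i]
        return (cs2[j] - cs2[i]) - s * s / (j - i)

    best = np.full(n + 1, np.inf)  # best[j]: optimal cost of x[:j]
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)  # start of the final segment in x[:j]
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + sse(i, j) + penalty
            if c < best[j]:
                best[j], last[j] = c, i
    # Backtrack from the end to recover segment boundaries and means
    bounds, j = [], n
    while j > 0:
        i = last[j]
        bounds.append((i, j, (cs[j] - cs[i]) / (j - i)))
        j = i
    return bounds[::-1]
```

A larger `penalty` yields fewer segments; MDL-style criteria effectively choose this cost from the description length of the model rather than by hand.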

Nuclear receptor expression defines a set of prognostic biomarkers for lung cancer.

Jeong Y, Xie Y, Xiao G, Behrens C, Girard L, Wistuba II, Minna JD, Mangelsdorf DJ.
December 2010 PLoS Med. 7(12):e1000378. doi: 10.1371/journal.pmed.1000378.

Abstract

Background

The identification of prognostic tumor biomarkers that also would have potential as therapeutic targets, particularly in patients with early stage disease, has been a long sought-after goal in the management and treatment of lung cancer. The nuclear receptor (NR) superfamily, which is composed of 48 transcription factors that govern complex physiologic and pathophysiologic processes, could represent a unique subset of these biomarkers. In fact, many members of this family are the targets of already identified selective receptor modulators, providing a direct link between individual tumor NR quantitation and selection of therapy. The goal of this study, which begins this overall strategy, was to investigate the association between mRNA expression of the NR superfamily and the clinical outcome for patients with lung cancer, and to test whether a tumor NR gene signature provided useful information (over available clinical data) for patients with lung cancer.

Methods and Findings

Using quantitative real-time PCR to study NR expression in 30 microdissected non-small-cell lung cancers (NSCLCs) and their pair-matched normal lung epithelium, we found great variability in NR expression among patients' tumor and non-involved lung epithelium, found a strong association between NR expression and clinical outcome, and identified an NR gene signature from both normal and tumor tissues that predicted patient survival time and disease recurrence. The NR signature derived from the initial 30 NSCLC samples was validated in two independent microarray datasets derived from 442 and 117 resected lung adenocarcinomas. The NR gene signature was also validated in 130 squamous cell carcinomas. The prognostic signature in tumors could be distilled to expression of two NRs, short heterodimer partner and progesterone receptor, as single gene predictors of NSCLC patient survival time, including for patients with stage I disease. Of equal interest, the studies of microdissected histologically normal epithelium and matched tumors identified expression in normal (but not tumor) epithelium of NGFIB3 and mineralocorticoid receptor as single gene predictors of good prognosis.

Conclusion

NR expression is strongly associated with clinical outcomes for patients with lung cancer, and this expression profile provides a unique prognostic signature for lung cancer patient survival time, particularly for those with early stage disease. This study highlights the potential use of NRs as a rational set of therapeutically tractable genes as theragnostic biomarkers, and specifically identifies short heterodimer partner and progesterone receptor in tumors, and NGFIB3 and MR in non-neoplastic lung epithelium, for future detailed translational study in lung cancer.

Aldehyde dehydrogenase activity selects for lung adenocarcinoma stem cells dependent on notch signaling.

Sullivan JP, Spinola M, Dodge M, Raso MG, Behrens C, Gao B, Schuster K, Shao C, Larsen JE, Sullivan LA, Honorio S, Xie Y, Scaglioni PP, DiMaio JM, Gazdar AF, Shay JW, Wistuba II, Minna JD.
December 2010 Cancer Res. 70(23):9937-48. doi: 10.1158/0008-5472.CAN-10-0881.

Abstract

Aldehyde dehydrogenase (ALDH) is a candidate marker for lung cancer cells with stem cell-like properties. Immunohistochemical staining of a large panel of primary non-small cell lung cancer (NSCLC) samples for ALDH1A1, ALDH3A1, and CD133 revealed a significant correlation between ALDH1A1 (but not ALDH3A1 or CD133) expression and poor prognosis in patients including those with stage I and N0 disease. Flow cytometric analysis of a panel of lung cancer cell lines and patient tumors revealed that most NSCLCs contain a subpopulation of cells with elevated ALDH activity, and that this activity is associated with ALDH1A1 expression. Isolated ALDH(+) lung cancer cells were observed to be highly tumorigenic and clonogenic as well as capable of self-renewal compared with their ALDH(-) counterparts. Expression analysis of sorted cells revealed elevated Notch pathway transcript expression in ALDH(+) cells. Suppression of the Notch pathway by treatment with either a γ-secretase inhibitor or stable expression of shRNA against NOTCH3 resulted in a significant decrease in ALDH(+) lung cancer cells, commensurate with a reduction in tumor cell proliferation and clonogenicity. Taken together, these findings indicate that ALDH selects for a subpopulation of self-renewing NSCLC stem-like cells with increased tumorigenic potential, that NSCLCs harboring tumor cells with ALDH1A1 expression have inferior prognosis, and that ALDH1A1 and CD133 identify different tumor subpopulations. Therapeutic targeting of the Notch pathway reduces this ALDH(+) component, implicating Notch signaling in lung cancer stem cell maintenance.

Steroid receptor coactivator-3 expression in lung cancer and its role in the regulation of cancer cell survival and proliferation.

Cai D, Shames DS, Raso MG, Xie Y, Kim YH, Pollack JR, Girard L, Sullivan JP, Gao B, Peyton M, Nanjundan M, Byers L, Heymach J, Mills G, Gazdar AF, Wistuba I, Kodadek T, Minna JD.
August 2010 Cancer Res. 70(16):6477-85. doi: 10.1158/0008-5472.CAN-10-0005.

Abstract

Steroid receptor coactivator-3 (SRC-3) is a histone acetyltransferase and nuclear hormone receptor coactivator, located on 20q12, which is amplified in several epithelial cancers and well studied in breast cancer. However, its possible role in lung cancer pathogenesis is unknown. We found SRC-3 to be overexpressed in 27% of non-small cell lung cancer (NSCLC) patients (n = 311) by immunohistochemistry, which correlated with poor disease-free (P = 0.0015) and overall (P = 0.0008) survival. Twenty-seven percent of NSCLCs exhibited SRC-3 gene amplification, and we found that lung cancer cell lines expressed higher levels of SRC-3 than did immortalized human bronchial epithelial cells (HBEC), which in turn expressed higher levels of SRC-3 than did cultured primary human HBECs. Small interfering RNA-mediated downregulation of SRC-3 in high-expressing, but not in low-expressing, lung cancer cells significantly inhibited tumor cell growth and induced apoptosis. Finally, we found that SRC-3 expression is inversely correlated with gefitinib sensitivity and that SRC-3 knockdown results in epidermal growth factor receptor tyrosine kinase inhibitor-resistant lung cancers becoming more sensitive to gefitinib. Taken together, these data suggest that SRC-3 may be an important oncogene and therapeutic target for lung cancer.

Statistical methods for integrating multiple types of high-throughput data.

Xie Y, Ahn C.
2010 Statistical Methods in Molecular Biology, Pages 511-529

Abstract

Large-scale sequencing, copy number, mRNA, and protein data hold great promise for biomedical research, while posing great challenges for data management and data analysis. Integrating different types of high-throughput data from diverse sources can increase the statistical power of data analysis and provide deeper biological understanding. This chapter uses two biomedical research examples to illustrate why there is an urgent need to develop reliable and robust methods for integrating heterogeneous data. We then introduce and review some recently developed statistical methods for integrative analysis, for both statistical inference and classification purposes. Finally, we present some useful publicly accessible databases and program code to facilitate integrative analysis in practice.

Looking beyond surveillance, epidemiology, and end results: patterns of chemotherapy administration for advanced non-small cell lung cancer in a contemporary, diverse population.

Rasco DW, Yan J, Xie Y, Dowell JE, Gerber DE.
October 2010 Journal of Thoracic Oncology, Volume 5, Issue 10, Pages 1529–1535

Abstract

Introduction

Chemotherapy prolongs survival without substantially impairing quality of life for medically fit patients with advanced non-small cell lung cancer (NSCLC), but population-based studies have shown that only 20 to 30% of these patients receive chemotherapy. These earlier studies have relied on Medicare-linked Surveillance, Epidemiology, and End Results (SEER) data, thus excluding the 30 to 35% of lung cancer patients younger than 65 years. Therefore, we determined the use of chemotherapy in a contemporary, diverse NSCLC population encompassing all patient ages.

Methods

We performed a retrospective analysis of patients diagnosed with stage IV NSCLC from 2000 to 2007 at the University of Texas Southwestern Medical Center. Demographic, treatment, and outcome data were obtained from hospital tumor registries. The association between these variables was assessed using univariate analysis and multivariate logistic regression.

Results

In all, 718 patients met criteria for analysis. Mean age was 60 years, 58% were men, and 45% were white. Three hundred fifty-three patients (49%) received chemotherapy. In univariate analysis, receipt of chemotherapy was associated with age (53% of patients younger than 65 years versus 41% of patients aged 65 years and older; p = 0.003) and insurance type (p < 0.001). In a multivariate model, age and insurance type remained associated with receipt of chemotherapy. For individuals receiving chemotherapy, median survival was 9.2 months, compared with 2.3 months for untreated patients (p < 0.001).

Conclusion

In a contemporary population representing the full age range of patients with advanced NSCLC, chemotherapy was administered to approximately half of all patients, more than twice the rate reported in some earlier studies. Patient age and insurance type are associated with receipt of chemotherapy.

A Bayesian approach to joint modeling of protein-DNA binding, gene expression and sequence data.

Xie Y, Pan W, Jeong KS, Xiao G, Khodursky AB.
February 2010 Stat Med. 29(4):489-503.

Abstract

Genome-wide protein-DNA binding data, DNA sequence data and gene expression data are complementary means of deciphering global and local transcriptional regulatory circuits. Combining these different types of data can not only improve statistical power but also provide a more comprehensive picture of gene regulation. In this paper, we propose a novel statistical model to augment protein-DNA binding data with gene expression and DNA sequence data when available. We specify a hierarchical Bayes model and use Markov chain Monte Carlo simulations to draw inferences. Both simulation studies and an analysis of an experimental data set show that the proposed joint modeling method can significantly improve the specificity and sensitivity of identifying target genes compared with conventional approaches relying on a single data source.
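
Why combining data sources helps can be seen with plain Bayes' rule: if each source contributes a likelihood ratio for "target" versus "non-target", and the sources are assumed conditionally independent given target status, their log-likelihood ratios simply add to the prior log-odds. This is a deliberately simplified illustration, not the paper's hierarchical Bayes model (which shares information across genes and is fitted by MCMC); the function name `combine_evidence` is hypothetical.

```python
import math

def combine_evidence(prior_prob, likelihood_ratios):
    """Hypothetical evidence combination via Bayes' rule.

    prior_prob: prior P(gene is a target), strictly between 0 and 1.
    likelihood_ratios: P(data | target) / P(data | non-target), one per source,
    assumed conditionally independent given target status.
    Returns the posterior P(target | all data).
    """
    log_odds = math.log(prior_prob / (1 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)  # independent sources add on the log-odds scale
    return 1 / (1 + math.exp(-log_odds))
```

For example, with a prior of 0.5, a binding likelihood ratio of 2 and an expression likelihood ratio of 3 give posterior odds of 6, that is, a posterior probability of 6/7.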

Lack of host SPARC enhances vascular function and tumor spread in an orthotopic murine model of pancreatic carcinoma.

Arnold SA, Rivera LB, Miller AF, Carbon JG, Dineen SP, Xie Y, Castrillon DH, Sage EH, Puolakkainen P, Bradshaw AD, Brekken RA.
Jan-Feb 2010 Disease Models & Mechanisms, 3(1-2):57-72.

Abstract

Utilizing subcutaneous tumor models, we previously validated SPARC (secreted protein acidic and rich in cysteine) as a key component of the stromal response, where it regulated tumor size, angiogenesis and extracellular matrix deposition. In the present study, we demonstrate that pancreatic tumors grown orthotopically in Sparc-null (Sparc(-/-)) mice are more metastatic than tumors grown in wild-type (Sparc(+/+)) littermates. Tumors grown in Sparc(-/-) mice display reduced deposition of fibrillar collagens I and III, basement membrane collagen IV and the collagen-associated proteoglycan decorin. In addition, microvessel density and pericyte recruitment are reduced in tumors grown in the absence of host SPARC. However, tumors from Sparc(-/-) mice display increased permeability and perfusion, and a subsequent decrease in hypoxia. Finally, we found that tumors grown in the absence of host SPARC exhibit an increase in alternatively activated macrophages. These results suggest that increased tumor burden in the absence of host SPARC is a consequence of reduced collagen deposition, a disrupted vascular basement membrane, enhanced vascular function and an immune-tolerant, pro-metastatic microenvironment.

Lung cancer diagnostic and treatment intervals in the United States: a health care disparity?

Yorio JT, Xie Y, Yan J, Gerber DE.
November 2009 Journal of Thoracic Oncology, Volume 4, Issue 11, Pages 1322–1330

Abstract

Introduction

Lung cancer diagnostic and treatment delays have been described for several patient populations. However, few studies have analyzed these intervals among patients treated in contemporary health care systems in the United States. We therefore studied the timing of lung cancer diagnosis and treatment at a U.S. medical center providing care to a diverse patient population within two different hospital systems.

Results

A total of 482 patients met criteria for analysis. In univariate analyses, the image-treatment interval was significantly associated with race, age, income, insurance type, and hospital type (76 days for public versus 45 days for private; p < 0.0001). In multivariate analysis, only hospital type remained significantly associated with the image-treatment interval; patients in the private hospital setting were more likely to receive timely treatment (hazard ratio 1.85; 95% confidence interval, 1.37-2.50; p < 0.001). In univariate analysis, the image-treatment interval was not associated with disease stage (p = 0.27) or with survival (p = 0.42).

Conclusion

Intervals between suspicion, diagnosis, and treatment of lung cancer vary widely among patients. Health care system factors, such as hospital type, largely account for these discrepancies. In this study, these intervals do not appear to be associated with clinical outcomes.

The impact of consenter characteristics and experience on patient interest in clinical research.

Rasco DW, Xie Y, Yan J, Sayne JR, Skinner CS, Dowell JE, Gerber DE.
May 2009 The Oncologist, 14(5):468-75.

Abstract

Background

To explain the historically low rates of participation in cancer clinical trials, several factors have been studied. These include subject characteristics and attitudes, clinical trial availability and eligibility criteria, and physician attitudes and communication skills. However, the impact of nonphysician research personnel, who often consent patients for studies, is unclear. We therefore evaluated the association between consenter characteristics and subject interest in clinical research.

Methods

We performed a retrospective review of subjects enrolled in a university-based cancer center tissue repository. During enrollment, subjects were asked if they were willing to be contacted in the future to (a) provide medical follow-up information and (b) participate in other clinical research. We analyzed the association between responses to these questions and consenter characteristics using univariate analysis and multivariate logistic regression.

Results

In total, 181 consenters enrolled 922 subjects. The majority of subjects agreed to be contacted for follow-up (84.9%) and future research (83.1%). Subject willingness to be contacted for future research was associated with greater consenter experience in univariate and multivariate analyses. In multivariate analysis, subject willingness to be contacted for future research was associated with discordance between subject and consenter gender, but not with subject gender, race, or income, or consenter gender or race.

Conclusion

Consenter experience and subject-consenter gender discordance were associated with greater subject interest in participating in future research. The role of consenters in clinical research merits future study and should be considered in efforts to increase cancer clinical trial accrual.

The receptor interacting protein 1 inhibits p53 induction through NF-kappaB activation and confers a worse prognosis in glioblastoma.

Park S, Hatanpaa KJ, Xie Y, Mickey BE, Madden CJ, Raisanen JM, Ramnarain DB, Xiao G, Saha D, Boothman DA, Zhao D, Bachoo RM, Pieper RO, Habib AA.
April 2009 Cancer Res. 69(7):2809-16. doi: 10.1158/0008-5472.CAN-08-4079.

Abstract

Nuclear factor-kappaB (NF-kappaB) activation may play an important role in the pathogenesis of cancer and also in resistance to treatment. Inactivation of the p53 tumor suppressor is a key component of the multistep evolution of most cancers. Links between the NF-kappaB and p53 pathways are under intense investigation. In this study, we show that the receptor interacting protein 1 (RIP1), a central component of the NF-kappaB signaling network, negatively regulates p53 tumor suppressor signaling. Loss of RIP1 from cells results in augmented induction of p53 in response to DNA damage, whereas increased RIP1 level leads to a complete shutdown of DNA damage-induced p53 induction by enhancing levels of cellular mdm2. The key signal generated by RIP1 to up-regulate mdm2 and inhibit p53 is activation of NF-kappaB. The clinical implication of this finding is shown in glioblastoma, the most common primary malignant brain tumor in adults. We show that RIP1 is commonly overexpressed in glioblastoma, but not in grades II and III glioma, and increased expression of RIP1 confers a worse prognosis in glioblastoma. Importantly, RIP1 levels correlate strongly with mdm2 levels in glioblastoma. Our results show a key interaction between the NF-kappaB and p53 pathways that may have implications for the targeted treatment of glioblastoma.

Alterations in genes of the EGFR signaling pathway and their relationship to EGFR tyrosine kinase inhibitor sensitivity in lung cancer cell lines.

Gandhi J, Zhang J, Xie Y, Soh J, Shigematsu H, Zhang W, Yamamoto H, Peyton M, Girard L, Lockwood WW, Lam WL, Varella-Garcia M, Minna JD, Gazdar AF.
2009 PLoS One. 4(2):e4576. doi: 10.1371/journal.pone.0004576.

Abstract

Background

Deregulation of EGFR signaling is common in non-small cell lung cancers (NSCLC) and this finding led to the development of tyrosine kinase inhibitors (TKIs) that are highly effective in a subset of NSCLC. Mutations of EGFR (mEGFR) and copy number gains (CNGs) of EGFR (gEGFR) and HER2 (gHER2) have been reported to predict for TKI response. Mutations in KRAS (mKRAS) are associated with primary resistance to TKIs.

Methodology/Principal Findings

We investigated the relationship between mutations, CNGs and response to TKIs in a large panel of NSCLC cell lines. Genes studied were EGFR, HER2, HER3, HER4, KRAS, BRAF and PIK3CA. Mutations were detected by sequencing, while CNGs were determined by quantitative PCR (qPCR), fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH). IC50 values for the TKIs gefitinib (Iressa) and erlotinib (Tarceva) were determined by MTS assay. Mutations in any of the seven genes tested (39/77, 50.6%), copy number gains (50/77, 64.9%), or either (65/77, 84.4%) were frequent in NSCLC lines. Mutations of EGFR (13%) and KRAS (24.7%) were frequent, while mutations of the other genes were less frequent. The three techniques for determining CNGs were well correlated, and qPCR data were used for further analyses. CNGs were relatively frequent for EGFR and KRAS in adenocarcinomas. While mutations were largely mutually exclusive, CNGs were not. EGFR and KRAS mutant lines frequently demonstrated mutant allele-specific imbalance, i.e., the mutant form was usually in great excess compared with the wild-type form. On a molar basis, sensitivities to gefitinib and erlotinib were highly correlated. Multivariate analyses led to the following results: (1) mEGFR, gEGFR and gHER2 were independent factors related to gefitinib sensitivity, in descending order of importance; (2) mKRAS was associated with increased in vitro resistance to gefitinib.

Conclusion/Significance

Our in vitro studies confirm and extend clinical observations and demonstrate the relative importance of both EGFR mutations and CNGs and HER2 CNGs in the sensitivity to TKIs.

Statistical methods of background correction for Illumina BeadArray data.

Xie Y, Wang X, Story M.
March 2009 Bioinformatics, Volume 25, Issue 6, Pages 751–757

Abstract

Motivation

Advances in technology have made many microarray platforms available. Among them, Illumina BeadArrays are relatively new and have captured significant market share. BeadArray technology generates high-quality data from low sample input at reduced cost. However, analysis methods for Illumina BeadArrays lag far behind those for Affymetrix oligonucleotide arrays and need to be improved.

Results

In this article, we consider the problem of background correction for BeadArray data. One distinct feature of BeadArrays is that each array contains over 1000 negative control bead types, conjugated with non-specific oligonucleotide sequences, that measure background noise. We extend the robust multi-array analysis (RMA) background correction model to incorporate the information from negative control beads, and consider three commonly used approaches for parameter estimation, namely, non-parametric, maximum likelihood estimation (MLE) and Bayesian estimation. The proposed approaches, as well as the existing background correction methods, are compared through simulation studies and a data example. We find that the maximum likelihood and Bayes methods seem to be the most promising.
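
The RMA-style background model treats each observed intensity as X = S + B, with background B ~ Normal(mu, sigma^2) and true signal S exponentially distributed; the corrected value is the posterior mean E[S | X = x], which is always positive. A minimal sketch of the non-parametric variant, assuming mu and sigma are taken as the mean and standard deviation of the negative control beads and the exponential mean is estimated by method of moments (mean of regular beads minus mu); the function name `normexp_correct` is hypothetical and this is not the published code.

```python
import math
import statistics

def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def _Phi(z):
    """Standard normal CDF, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def _mills_ratio(z):
    """phi(z) / Phi(z); below z = -30, Phi(z) can underflow, so use the
    asymptotic expansion -z - 1/z instead."""
    if z < -30:
        return -z - 1 / z
    return _phi(z) / _Phi(z)

def normexp_correct(regular, negative_controls):
    """Hypothetical normal-exponential background correction.

    Model: X = S + B, B ~ Normal(mu, sigma^2), S ~ Exponential(mean=theta).
    mu, sigma: mean and SD of negative control beads (non-parametric step).
    theta: method-of-moments estimate, floored to stay positive (an assumption).
    Returns E[S | X = x] for each regular-bead intensity x.
    """
    mu = statistics.fmean(negative_controls)
    sigma = statistics.stdev(negative_controls)
    theta = max(statistics.fmean(regular) - mu, 1e-6)
    corrected = []
    for x in regular:
        # S | X = x is Normal(a, sigma^2) truncated to S >= 0
        a = x - mu - sigma ** 2 / theta
        corrected.append(a + sigma * _mills_ratio(a / sigma))
    return corrected
```

Unlike plain subtraction of the negative-control mean, this never returns negative values, which avoids the data loss that occurs when negative intensities must be discarded before log transformation.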

Supplementary Information

Supplementary data are available at Bioinformatics online.

Histone deacetylase inhibitor romidepsin enhances anti-tumor effect of erlotinib in non-small cell lung cancer (NSCLC) cell lines.

Zhang W, Peyton M, Xie Y, Soh J, Minna JD, Gazdar AF, Frenkel EP.
February 2009 Journal of Thoracic Oncology, Volume 4, Issue 2, Pages 161–166

Abstract

Introduction

Most epidermal growth factor receptor (EGFR) mutant non-small cell lung cancers (NSCLCs) are sensitive to EGFR tyrosine kinase inhibitors (TKIs) such as erlotinib or gefitinib, but many EGFR wild type NSCLCs are resistant to TKIs. In this study, we examined the effects of the histone deacetylase inhibitor, romidepsin, in combination with erlotinib, in NSCLC cell lines and xenografts.

Methods

For in vitro studies, nine NSCLC cell lines with varying mutation status and histology were treated with erlotinib and romidepsin alone or in combination. 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium (MTS) assays were performed to determine the half-maximal inhibitory concentration (IC50) of each drug and of the combination. For in vivo studies, NCI-H1299 xenografts were established by subcutaneous inoculation into athymic nude mice. Romidepsin and/or erlotinib were injected intraperitoneally after tumors developed, and tumor sizes were measured.
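
As an illustration of how an IC50 can be read off an MTS dose-response series, the sketch below interpolates linearly on the log10(dose) scale between the two tested doses whose viabilities bracket 50%. This simple interpolation is an assumption for illustration; the abstract does not state how IC50 values were computed, and a four-parameter logistic fit is a common alternative. The function name `ic50_interpolate` is hypothetical.

```python
import math

def ic50_interpolate(doses, viability):
    """Hypothetical IC50 estimate by log-linear interpolation.

    doses: positive drug concentrations; viability: percent of untreated control.
    Returns the interpolated dose giving 50% viability, or None if 50% is
    never crossed within the tested range.
    """
    pairs = sorted(zip(doses, viability))
    for (d1, v1), (d2, v2) in zip(pairs, pairs[1:]):
        if (v1 - 50) * (v2 - 50) <= 0 and v1 != v2:
            frac = (v1 - 50) / (v1 - v2)  # fraction of the way from d1 to d2
            log_ic50 = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_ic50
    return None
```

For example, viabilities of 90%, 60% and 20% at 0.1, 1 and 10 µM place the IC50 a quarter of a log unit above 1 µM, at about 1.78 µM.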

Results

We found that romidepsin synergistically increased sensitivity to erlotinib in all nine NSCLC cell lines, including EGFR and KRAS wild-type cell lines, KRAS mutant cell lines, and TKI-resistant EGFR mutant cell lines. This effect was partially due to enhanced apoptosis. Furthermore, cotreatment with erlotinib and romidepsin inhibited NCI-H1299 xenograft growth in athymic nude mice.

Conclusion

These observations support a role for the combination of a histone deacetylase inhibitor and a TKI in the treatment of NSCLCs.

Cytoglobin, the newest member of the globin family, functions as a tumor suppressor gene.

Shivapurkar N, Stastny V, Okumura N, Girard L, Xie Y, Prinsen C, Thunnissen FB, Wistuba II, Czerniak B, Frenkel E, Roth JA, Liloglou T, Xinarianos G, Field JK, Minna JD, Gazdar AF.
September 2008 Cancer Res. 68(18):7448-56. doi: 10.1158/0008-5472.

Abstract

Cytoglobin (CYGB) is a recently discovered vertebrate globin, distantly related to myoglobin, with unknown function. CYGB maps to chromosomal region 17q25, which is frequently lost in multiple malignancies. Previous studies failed to detect mutations in the CYGB gene. Recent studies provided preliminary evidence for increased methylation of the gene in lung cancer. Our study aimed to investigate the role of CYGB as a tumor suppressor gene. By nested methylation-specific DNA sequencing analysis of lung and breast cancer cell lines and bronchial and mammary epithelial cell lines, we found that methylation of a 110-bp CpG-rich segment of the CYGB promoter was correlated with gene silencing. We specifically targeted this sequence and developed a quantitative methylation-specific PCR assay suitable for high-throughput analysis. We showed the tumor specificity of CYGB methylation in discriminating patients with and without lung cancer, using biopsies and sputum samples. We further showed the tumor specificity of this assay with multiple other epithelial and hematologic malignancies. To show tumor suppressor activity of CYGB, we performed the following: (a) RNA interference-mediated knockdown of the CYGB gene in a CYGB expression-positive lung cancer cell line, which increased colony formation; (b) enforced gene expression in CYGB expression-negative lung and breast cancer cell lines, which reduced colony formation; and (c) identification of potential proximate targets downstream of the CYGB gene. Our data constitute the first direct functional evidence for CYGB, the newest member of the globin family, as a tumor suppressor gene.

Enhanced identification and biological validation of differential gene expression via Illumina whole-genome expression arrays through the use of the model-based background correction methodology.

Ding LH, Xie Y, Park S, Xiao G, Story MD.
June 2008 Nucleic Acids Res. 36(10):e58.

Abstract

Despite the tremendous growth of microarray usage in scientific studies, there is a lack of standards for background correction methodologies, especially in single-color microarray platforms. Traditional background subtraction methods often generate negative signals and thus cause large amounts of data loss. Hence, some researchers prefer to avoid background corrections, which typically result in the underestimation of differential expression. Here, by utilizing nonspecific negative control features integrated into Illumina whole genome expression arrays, we have developed a method of model-based background correction for BeadArrays (MBCB). We compared the MBCB with a method adapted from the Affymetrix robust multi-array analysis algorithm and with no background subtraction, using a mouse acute myeloid leukemia (AML) dataset. We demonstrated that differential expression ratios obtained by using the MBCB had the best correlation with quantitative RT-PCR. MBCB also achieved better sensitivity in detecting differentially expressed genes with biological significance. For example, we demonstrated that the differential regulation of Tnfr2, Ikk and NF-kappaB, the death receptor pathway, in the AML samples, could only be detected by using data after MBCB implementation. We conclude that MBCB is a robust background correction method that will lead to more precise determination of gene expression and better biological interpretation of Illumina BeadArray data.

Differential methylation of a short CpG-rich sequence within exon 1 of TCF21 gene: a promising cancer biomarker assay.

Shivapurkar N, Stastny V, Xie Y, Prinsen C, Frenkel E, Czerniak B, Thunnissen FB, Minna JD, Gazdar AF.
April 2008 Cancer Epidemiol Biomarkers Prev. 17(4):995-1000. doi: 10.1158/1055-9965.

Abstract

Detection of cancer cells at early stages could potentially increase survival rates in cancer patients. Aberrant promoter hypermethylation is a major mechanism for silencing tumor suppressor genes in many kinds of human cancers. A recent report from our laboratory described the use of quantitative methylation-specific PCR assays for discriminating patients with lung cancer from those without lung cancer using lung biopsies as well as sputum samples. TCF21 is known to be essential for the differentiation of epithelial cells adjacent to mesenchyme. Using restriction landmark genomic scanning, a recent study identified TCF21 as a candidate tumor suppressor at 6q23-q24 that is epigenetically inactivated in lung and head and neck cancers. Using DNA sequencing, we narrowed this down to a short CpG-rich segment of the TCF21 gene (eight specific CpG sites in the CpG island within exon 1) that was unmethylated in normal lung epithelial cells but predominantly methylated in lung cancer cell lines. We specifically targeted this short CpG-rich sequence and developed a quantitative methylation-specific PCR assay suitable for high-throughput analysis. We showed the usefulness of this assay in discriminating patients with lung cancer from those without lung cancer using biopsies and sputum samples, and showed similar applications in multiple other malignancies. Our assay might have important implications for the early detection and surveillance of multiple malignancies.
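Both the CYGB and TCF21 abstracts report that a quantitative methylation-specific PCR readout discriminates patients with lung cancer from those without. One common way to summarize that kind of discrimination is the area under the ROC curve (AUC). The Python sketch below is a hypothetical illustration, not the authors' analysis: `empirical_auc` is an invented helper and the input values are made-up numbers, not study data. It computes the AUC as the probability that a randomly chosen case has a higher methylation level than a randomly chosen control, counting ties as one half (the Mann-Whitney formulation).

```python
def empirical_auc(case_values, control_values):
    """Empirical ROC AUC via pairwise comparison (Mann-Whitney U / (n_cases * n_controls)).

    Hypothetical helper for illustration only.
    """
    if not case_values or not control_values:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_values:
        for n in control_values:
            if c > n:
                wins += 1.0   # case ranked above control
            elif c == n:
                wins += 0.5   # ties are split evenly
    return wins / (len(case_values) * len(control_values))


# Made-up methylation levels (arbitrary units), NOT data from either study.
cancer = [12.0, 30.5, 8.2, 45.1, 3.3]
no_cancer = [0.0, 1.2, 3.3, 0.4, 2.0]
print(empirical_auc(cancer, no_cancer))  # prints 0.98
```

An AUC of 0.5 means no discrimination and 1.0 means perfect separation. The double loop costs O(n·m) comparisons, which is fine for cohorts of biopsy or sputum size; a rank-based formula scales better for large samples.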

Software

We have developed online analysis tools that allow users to explore and analyze gene expression data related to lung cancer and germ cell tumors. PIPECLIP Galaxy is also provided to help biologists identify the most likely cross-linking sites in CLIP-seq data.

Online Software


Lung Cancer Explorer

Lung Cancer Explorer is an online analysis tool which allows users to explore and analyze gene expression data from dozens of public lung cancer datasets.

Pipeclip Galaxy

PIPECLIP provides a pipeline for both bioinformaticians and biologists to identify the most likely cross-linking sites from PAR-CLIP, HITS-CLIP and iCLIP sequencing data.

Germ Cell Tumor Explorer

Germ Cell Tumor Explorer is an online analysis tool which allows users to explore and analyze gene expression data from dozens of public germ cell tumor datasets.

Software Packages


HITS-CLIP Analysis

We developed a model-based approach to detect RNA-protein binding sites in HITS-CLIP data. The two-stage model uses all sequencing reads to identify binding sites at single-base-pair resolution. This toolbox provides the MATLAB functions needed to implement our model, which identifies binding sites using heterogeneous logit models via semi-supervised learning.

PAR-CLIP HMM

Photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) has been increasingly used for the global mapping of RNA-protein interaction sites. This package provides an integrative model of the joint distribution of read counts and mutation counts. To pinpoint interaction sites at single-base-pair resolution, we adopt non-homogeneous hidden Markov models that incorporate the nucleotide sequence.
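The package's non-homogeneous HMM is not reproduced here. As a much simpler, hedged sketch of the underlying idea of decoding bound versus unbound positions along a transcript, the Python code below runs Viterbi decoding on a homogeneous two-state HMM whose emissions are Poisson-distributed per-position mutation counts. Everything in it is an assumption for illustration: the function names, the Poisson emission model, and the default rates and transition probabilities. Unlike the package, it ignores read counts and the nucleotide sequence.

```python
import math


def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(0.2, 5.0), p_stay=(0.95, 0.8), p_start=(0.9, 0.1)):
    """Most likely state path for per-position mutation counts (hypothetical helper).

    State 0 = background, state 1 = binding site. All defaults are illustrative guesses.
    """
    if not counts:
        return []
    states = range(2)
    log_trans = [[math.log(p_stay[i] if i == j else 1.0 - p_stay[i]) for j in states]
                 for i in states]
    # score[s]: best log-probability of any path ending in state s at the current position
    score = [math.log(p_start[s]) + poisson_logpmf(counts[0], rates[s]) for s in states]
    backptrs = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for j in states:
            best_i = max(states, key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + poisson_logpmf(k, rates[j]))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


counts = [0, 0, 1, 0, 7, 9, 6, 0, 0, 0]  # made-up per-position mutation counts
print(viterbi_two_state(counts))  # prints [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Working in log space keeps long transcripts from underflowing. A non-homogeneous model, as in the package, would let the transition or emission parameters change from position to position (for example, with the local nucleotide sequence).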

dCLIP

dCLIP is a Perl program for discovering differential binding regions between two CLIP-Seq (HITS-CLIP or PAR-CLIP) experiments. It is appropriate when the common binding regions that are significantly enriched in both conditions tend to have similar binding strength, and when researchers are more interested in differences in binding strength than in the binary question of whether a binding site is shared.
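dCLIP's working assumption, that regions bound in both conditions should show similar binding strength, suggests a simple way to remove systematic differences (such as sequencing depth) before comparing conditions. The Python sketch below is a hypothetical, simplified illustration and not dCLIP's own normalization. It computes a log2 ratio per region, estimates a global offset as the median ratio over the common regions, and subtracts that offset, so that common regions center near zero and large remaining ratios point to differential binding. The function name, the pseudocount, and the dictionary input format are all assumptions.

```python
import math
from statistics import median


def log_ratio_normalize(counts_a, counts_b, common_regions, pseudocount=1.0):
    """Median-centered log2(A/B) ratios per region (hypothetical helper).

    counts_a, counts_b: dicts mapping region id -> read count in each condition
    common_regions: ids of regions considered bound in both conditions
    """
    ratios = {
        r: math.log2((counts_a[r] + pseudocount) / (counts_b[r] + pseudocount))
        for r in counts_a
    }
    # Systematic shift (e.g. sequencing depth), estimated only from common regions.
    offset = median(ratios[r] for r in common_regions)
    return {r: m - offset for r, m in ratios.items()}


a = {"r1": 199, "r2": 399, "r3": 799, "r4": 99}   # made-up counts, condition A
b = {"r1": 99, "r2": 199, "r3": 399, "r4": 399}   # made-up counts, condition B
print(log_ratio_normalize(a, b, ["r1", "r2", "r3"]))
# prints {'r1': 0.0, 'r2': 0.0, 'r3': 0.0, 'r4': -3.0}
```

Here condition A was sequenced roughly twice as deeply, so every common region has a raw log2 ratio of 1; after centering, only r4 stands out as differentially bound.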

Bayesian Joint Analysis

Identifying which genes are differentially expressed (DE) and which gene sets are altered between two experimental conditions are both key questions in microarray analysis. This Bayesian joint modeling approach addresses both questions simultaneously: it incorporates functional annotations into the expression data analysis and, at the same time, infers the enrichment of functional groups.
Reference: Wang X, Chen M, Khodursky AB, Xiao G. Bayesian Joint Analysis of Gene Expression Data and Gene Functional Annotations. Statistics in Biosciences, 2012 Nov;4(2):300-318.

DecoRNAi

High-throughput RNAi screening has been widely used across the spectrum of biomedical research and has made it possible to study functional genomics. However, a challenge for authentic biological interpretation of large-scale siRNA or shRNA-mediated loss-of-function studies is the biological pleiotropy resulting from multiple modes of action of siRNA and shRNA reagents. A major confounding feature of these reagents is the microRNA-like translational quelling that can result from short regions (~6 nucleotides) of oligonucleotide complementarity to many different mRNAs. To help identify and correct miRNA-mimic off-target effects, we have developed DecoRNAi (deconvolution analysis of RNAi screening data) for automated quantitation and annotation of microRNA-like off-target effects in primary RNAi screening data sets. DecoRNAi can effectively identify and correct off-target effects from primary screening data and provide data visualization for study and publication. DecoRNAi contains pre-computed seed sequence families for 3 commonly employed commercial siRNA libraries. For custom collections, the tool will compute seed sequence membership from a user-supplied reagent sequence table. All parameters are tunable and output files include global data visualization, the identified seed family associations, the siRNA pools containing off-target seed families, corrected z-scores and the potential miRNAs with phenotypes of interest.
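The seed-family idea can be sketched as grouping reagents by a short seed sequence and checking whether whole families, whose members target many unrelated genes, share an unusual phenotype. The Python code below is a hypothetical simplification, not DecoRNAi itself. It assumes the seed is the hexamer at positions 2-7 of the guide (antisense) strand, a common miRNA seed convention, and it summarizes each family by its mean z-score without DecoRNAi's statistical testing or z-score correction.

```python
from collections import defaultdict
from statistics import mean


def seed_of(guide_strand, start=2, length=6):
    """Seed hexamer at 1-based positions 2-7 of the guide strand (assumed convention)."""
    s = guide_strand.upper().replace("T", "U")   # accept DNA or RNA letters
    return s[start - 1:start - 1 + length]


def seed_family_scores(reagents):
    """reagents: iterable of (reagent_id, guide_strand, z_score) tuples (hypothetical format).

    Returns {seed: (n_reagents, mean_z)}, ordered from the most negative mean z.
    """
    families = defaultdict(list)
    for _reagent_id, guide, z in reagents:
        families[seed_of(guide)].append(z)
    summary = {seed: (len(zs), mean(zs)) for seed, zs in families.items()}
    return dict(sorted(summary.items(), key=lambda item: item[1][1]))


# Made-up reagents: two share the seed UGCAUG and both score low.
screen = [
    ("siA", "AUGCAUGAAA", -3.0),
    ("siB", "CUGCAUGCCC", -2.0),
    ("siC", "GAAAAAAGGG", 0.5),
]
print(seed_family_scores(screen))  # prints {'UGCAUG': (2, -2.5), 'AAAAAA': (1, 0.5)}
```

In a real screen, a family whose members target unrelated genes yet share a strong phenotype is a candidate miRNA-like off-target effect.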

Probemapper

Probemapper connects to QBRC’s EntrezToProbe engine to handle mappings between probes and genes and to provide access to information about probes and genes.
References: Allen JD, Wang S, Chen M, Girard L, Minna J, Xie Y, Xiao G*. Probe mapping across multiple microarray platforms, Briefings in Bioinformatics, 2012 Sep;13(5):547-54. doi: 10.1093/bib/bbr076. PMID: 22199380

SbacHTS

Genome-wide RNAi screening experiments are customarily carried out on hundreds of 96-well or 384-well plates to study gene function and discover novel drug targets. Spatial background noise, however, often blurs the interpretation of experimental results by distorting the distinct spatial patterns across different plates. It is therefore important to identify and correct spatial background noise when analyzing RNAi screening data. We developed SbacHTS (Spatial background correction for High-Throughput RNAi Screening), an algorithm for visualizing, estimating and correcting spatial background noise in RNAi screening results. SbacHTS effectively detects and corrects spatial background noise, leading to a higher signal-to-noise ratio and improved hit discovery in RNAi screening experiments. The only input the algorithm requires is the raw reads from the replicate plates.
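SbacHTS's own estimator is not reproduced here. To illustrate the general idea of estimating and removing a spatial background from a plate, the Python sketch below swaps in a different, well-known technique: Tukey's two-way median polish, which fits each plate as overall level + row effect + column effect + residual (the basis of the B-score commonly used in screening). Removing the row and column effects flattens edge and gradient artifacts. The function names and the fixed iteration count are assumptions, and unlike SbacHTS the sketch treats each plate separately rather than drawing on replicate plates.

```python
from statistics import median


def median_polish(plate, n_iter=10):
    """Tukey two-way median polish: plate[i][j] ~ overall + row[i] + col[j] + resid[i][j].

    Uses a fixed number of sweeps (no convergence check) for brevity.
    """
    resid = [[float(v) for v in row] for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return overall, row_eff, col_eff, resid


def remove_spatial_effects(plate, n_iter=10):
    """Plate values with row/column (spatial) effects removed: overall + residual."""
    overall, _rows, _cols, resid = median_polish(plate, n_iter)
    return [[overall + r for r in row] for row in resid]
```

For a 384-well plate, `plate` would be a 16 x 24 list of raw reads. Because medians are robust, a handful of true hits does not pull the estimated row and column effects far from the background pattern.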

MBCB

This package provides a model-based background correction method, which incorporates the negative control beads to pre-process Illumina BeadArray data.
References: Xie Y, Wang X, Story M. Statistical Methods of Background Correction for Illumina BeadArray. Bioinformatics, 2009, Mar 15;25(6):751-7. doi: 10.1093/bioinformatics/btp040. PMID: 19193732
Allen JD, Chen M, Xie Y (2009) Model-Based Background Correction (MBCB): R methods and GUI for Illumina Bead-array Data. J Canc Sci Ther 1: 025-027. doi:10.4172/1948-5956.1000004
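The references above describe a model-based correction that uses negative control beads. A minimal Python sketch of that style of correction follows, assuming the normal-exponential convolution model: observed intensity X = S + B, with true signal S exponentially distributed (mean alpha) and background B normally distributed (mean mu, SD sigma). In this sketch, mu and sigma are estimated from the negative control beads and alpha by the method of moments; the corrected value is the posterior mean E[S | X = x], which stays positive instead of producing the negative values of plain subtraction. The function names and the choice of estimators are assumptions; this is not the MBCB package's code, which offers its own estimation options.

```python
import math
from statistics import mean, stdev


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    # erfc keeps precision in the lower tail better than 1 + erf
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normexp_background_correct(intensities, negative_controls):
    """Posterior mean E[S | X = x] under X = S + B (hypothetical helper),
    S ~ Exponential(mean alpha), B ~ Normal(mu, sigma^2)."""
    mu = mean(negative_controls)        # background mean from negative control beads
    sigma = stdev(negative_controls)    # background SD (needs >= 2 beads with spread)
    # Method of moments: E[X] = mu + alpha; the floor keeps alpha positive.
    alpha = max(mean(intensities) - mu, 1e-8)
    corrected = []
    for x in intensities:
        # S | X = x is Normal(a, sigma^2) truncated to S >= 0.
        a = x - mu - sigma ** 2 / alpha
        z = a / sigma
        if z < -30:
            # Asymptotic phi/Phi for very negative z; avoids 0/0 underflow.
            ratio = -z - 1.0 / z
        else:
            ratio = _norm_pdf(z) / _norm_cdf(z)
        corrected.append(a + sigma * ratio)  # mean of the truncated normal
    return corrected


neg_beads = [90, 100, 110, 95, 105]   # made-up negative control intensities
raw = [80, 100, 150, 400, 1000]        # made-up probe intensities
print([round(v, 1) for v in normexp_background_correct(raw, neg_beads)])
```

For a bright probe the correction reduces to subtracting roughly mu + sigma^2/alpha; for probes near background it shrinks toward small positive values rather than going negative, which is what avoids the data loss of plain background subtraction.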

Ensemble Network Aggregation (ENA)

Ensemble network aggregation is an approach that leverages the inverse-rank-product (IRP) method to combine networks. This package provides the capabilities to use IRP to bootstrap a dataset using a single method, to aggregate the networks produced by multiple methods, or to aggregate the networks produced on different datasets. Additionally, it offers convenience functions for converting between adjacency lists and matrices, and computing discrete graphs based on the Rank-Product method.
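The exact inverse-rank-product formula used by ENA is not given here. As a hedged illustration of rank-based aggregation in general, the Python sketch below ranks the edges within each network by confidence (rank 1 = most confident) and combines networks with the plain rank product, the geometric mean of an edge's ranks, so that edges ranked highly by every method rise to the top. The function names and the dictionary-of-edges input are assumptions, not the ENA package's API.

```python
import math


def rank_edges(weights):
    """Rank edges within one network: 1 = most confident (ties broken by input order)."""
    ordered = sorted(weights, key=lambda edge: -weights[edge])
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def rank_product_aggregate(networks):
    """networks: list of dicts edge -> confidence, each scoring the same edge set.

    Returns (edge, geometric-mean rank) pairs, strongest consensus first.
    """
    ranked = [rank_edges(net) for net in networks]
    scores = {
        edge: math.exp(sum(math.log(r[edge]) for r in ranked) / len(ranked))
        for edge in ranked[0]
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up confidences from two hypothetical reconstruction methods.
net1 = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.1}
net2 = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.3}
print(rank_product_aggregate([net1, net2])[0])  # prints (('A', 'B'), 1.0)
```

Because the score uses ranks rather than raw weights, methods that report confidences on different scales can be combined without rescaling.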

Members

Staff

Bo Yao

Scientific Programmer III

Jingwei Huang

Data Scientist III

Jiwoong Kim

Computational Biologist III

Ismael Villanueva-Miranda

Data Scientist II

Yang Liu

Data Scientist III

Xian Cheng

Research Associate

Post-doc

Collin Treager

Postdoctoral Researcher

Kuroush Nezafati

Postdoctoral Researcher

Qin Zhou

Postdoctoral Researcher

Tingyi Wanyan

Postdoctoral Researcher

Student

Zhuoyu Wen

PhD Student

Alumni

Beibei Chen

Computational Biologist I

Yunyun Zhou

Computational Biologist II

Faliu Yi

Postdoctoral Researcher

Jonghyun Yun

Postdoctoral Researcher (2012-2014)

Tang Hao

Postdoctoral Researcher
Assistant Professor

Donghyeon Yu

Postdoctoral Researcher (2012-2014)

Jungsik Noh

Postdoctoral Researcher (2012-2014)

Tao Wang

Ph.D. Student (2011-2015)

Rui Zhong

Ph.D. Student (2009-2014)

Jichen Yang

Postdoctoral Researcher

Sangin Lee

Postdoctoral Researcher

Gaoxiang Jia

PhD

Qiwei Li

Postdoctoral Researcher

Xinyi Zhang

PhD

Ci Bo

Ph.D. Student (2014-2019)

Shinyi Lin

Scientist Programmer

Rong Lu

Biostatistical Consultant III

Minzhe Zhang

PhD

Lin Zhong

Data Scientist

Hongyi Lai

Biostatistical Consultant I

Thomas Sheffield

Postdoctoral Researcher

Zhiqun Xie

Computational Biologist I

Shidan Wang

Assistant Professor

Hudanyun Sheng

Data Scientist I

Yunguan Wang

Computational Biologist I

Yueqi Li

Biostatistical Consultant I

About PI

Yang Xie

Raymond D. and Patsy R. Nasher Distinguished Chair in Cancer Research
Associate Dean of Data Sciences
Director of QBRC, PCDC and DSSR

  •  214-648-5178
  •  214-648-1663
  •  Yang.Xie@utsouthwestern.edu
  •   Danciger Research Building,
    5323 Harry Hines Blvd. Ste. H9.124,
    Dallas, TX 75390-8821


Biography


Dr. Yang Xie holds the Raymond D. and Patsy R. Nasher Distinguished Chair in Cancer Research and is the Associate Dean for Data Sciences at UT Southwestern Medical Center. She is the founding director of the Quantitative Biomedical Research Center (QBRC), the Pediatric Cancer Data Commons (PCDC), and the Cancer Center Data Science Shared Resources (DSSR) at the Harold C. Simmons Comprehensive Cancer Center. Dr. Yang Xie received her training in biostatistics, medicine and epidemiology. Her research lab focuses on medical informatics, developing predictive and prognostic biomarkers, and precision medicine. She is currently the PI of an NIH Maximizing Investigators' Research Award (MIRA) grant, MPI of an NIAID U01 grant and PI of the Pediatric Cancer Data Core at CPRIT.

In addition, our team has extensive experience in developing and maintaining user-friendly software and comprehensive databases/web portals, including disease-specific web portals with online analytic tools for cancer:
http://lce.biohpc.swmed.edu/lungcancer/
https://qbrc.swmed.edu/projects/liverspore/
https://qbrc.swmed.edu/projects/ose/
https://qbrc.swmed.edu/projects/kidneyspore/

Academic Position


  • 2019-Present
    Professor
    Department of Population and Data Sciences, UTSW Medical Center
  • 2018-Present
    Director
    Pediatric Cancer Data Core, University of Texas Southwestern Medical Center at Dallas
  • 2011-2019
    Founding Director
    Simmons Cancer Center Bioinformatics Shared Resources, UTSW Medical Center
  • 2018-2021
    Faculty Advisor
    Bioinformatics Core Facility, UT Southwestern Medical Center, Dallas, TX
  • 2015-2018
    Founding Director
    Bioinformatics Core Facility, UT Southwestern Medical Center, Dallas, TX
  • 2013-2019
    Associate Professor
    Department of Bioinformatics and Department of Clinical Sciences, UT Southwestern Medical Center, Dallas, TX
  • 2010-Present
    Director
    Quantitative Biomedical Research Center, UT Southwestern Medical Center
  • 2006-2013
    Assistant Professor
    Department of Clinical Sciences, UT Southwestern Medical Center

Education & Training


  • PhD 2006
    Biostatistics
    University of Minnesota, Minneapolis, MN, USA
  • MS 2003
    Biostatistics
    University of Minnesota, Minneapolis, MN, USA
  • MS 2000
    Epidemiology
    Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  • MD 1997
    Medicine
    Peking University Health Science Center, Beijing, China

Awards & Grants


  • 2020
    Raymond D. and Patsy R. Nasher Distinguished Chair in Cancer Research
  • 2014
    First Place, NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge
    Our team won the "Best Performing Team" award in both sub-challenges of the NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge (NIEHS, NCATS & DREAM organization)
  • 2013
    First Place, NCI-DREAM Drug Sensitivity Prediction Challenge
    Our team won the "Best Performing Team" award in the NCI-DREAM Drug Sensitivity Prediction Challenge (National Cancer Institute & DREAM organization)
  • 2008
    American Association for Cancer Research (AACR) Cancer Biostatistics Workshop—Developing Targeted Agents, Sonoma, CA
  • 2008
    Scholarship for Biometrics Society Young Researcher Workshop, Arlington, VA
  • 2008
    Bayesian Conference Award, Houston, TX
  • 2006
    Jacob E. Bearman Student Achievement Award, University of Minnesota
  • 2005-2006
    Doctoral Dissertation Fellowship, University of Minnesota
  • 1997
    Outstanding Student of Beijing City
  • 1997
    Guanghua Excellent Students Award, Peking University, School of Medicine
  • 1993-1997
    Scholarship for Excellent Students (First Place), Peking University, School of Medicine