I am an Assistant Professor of Data Science in the Peter O’Donnell Jr. School of Public Health at UT Southwestern Medical Center (UTSW) and a Texas Health Resources (THR) Clinical Scholar. My research focuses on developing methods, platforms, and infrastructure for the integration and analysis of multimodal healthcare and biomedical data to address important clinical questions. I have extensive experience in working with real-world data including electronic health records (EHRs), claims, medical notes, imaging, and molecular profiling data. Outcomes from my research include new clinical insights and applications, assessments of health and healthcare disparities, and data commons platforms for diverse disease domains.
As an awardee of the THR Clinical Scholars Program, I am the PI leading an internally funded research project, conducted at UTSW and THR, that pilots the use of large language models (LLMs) to mine EHR data for clinical and non-clinical insights. I have led and participated in multiple other projects that use LLMs to extract various features from free-text pathology reports, visit summaries, and progress notes. This body of work demonstrates the potential of LLMs in disease diagnosis and prognosis, in assisting human chart review, and in identifying healthcare disparities.
As the health informatics lead at UTSW’s Quantitative Biomedical Research Center, I spearhead efforts to develop comprehensive data commons and resources for various diseases, including adult and childhood cancers, cardiovascular diseases, liver diseases, and COVID-19. My data science and health informatics expertise is strengthened by solid training in biomedical imaging sciences, where I gained extensive hands-on experience spanning benchtop, in silico, and clinical settings.
As the Director of the Biostatistics and Data Science Core at UTSW, I manage a team of 10 faculty and staff members to offer analytical, technological, and infrastructural support to the clinical research community. In this role, I oversee staffing, budgeting, regulatory affairs, and project timelines, and have developed strong capabilities in project and resource management.
Huang J, Yang DM, Rong R, Nezafati K, Treager C, Chi Z, Wang S, Cheng X, Guo Y, Klesse LJ, Xiao G, Peterson ED, Zhan X, Xie Y. (2024). A critical assessment of using ChatGPT for extracting structured data from clinical notes. npj Digit Med. DOI: 10.1038/s41746-024-01079-8
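Yang DM#, Zhou Q#, Furman-Cline L, ..., Xie Y. (2023). Osteosarcoma Explorer: A Data Commons With Clinical, Genomic, Protein, and Tissue Imaging Data for Osteosarcoma Research. JCO Clin Cancer Inform. PMC10681418. DOI: 10.1200/CCI.23.00104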
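Velasco F#, Yang DM#, Zhang M, Nelson T, Sheffield T, Keller T, Wang Y, Walker C, Katterapalli C, Zimmerman K, Masica A, Lehmann CU, Xie Y, Hollingsworth JW. (2021). Association of Healthcare Access With Intensive Care Unit Utilization and Mortality in Patients of Hispanic Ethnicity Hospitalized With COVID-19. J Hosp Med. PMC8577697. DOI: 10.12788/jhm.3717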
The emergence of large language models (LLMs) unlocks unprecedented opportunities for extracting valuable insights from previously inaccessible or underutilized free-text medical notes. My latest research centers on developing LLM-powered approaches for extracting structured data elements from these notes, with a focus on practical implementation in real-world clinical settings.
A key foundation for precision medicine is the effective integration and analysis of real-world data. My core expertise lies in developing health informatics methods, platforms, and infrastructure to harmonize and integrate multimodal healthcare and biomedical research data. In this area, I have led the design and development of data models, extract, transform, load (ETL) pipelines, and data commons featuring user-friendly web interfaces that facilitate exploration and analytics across different data sources and types. Collaborating with a multidisciplinary team, I have coordinated the collection, integration, and management of diverse data assets, including EHRs, medical notes, imaging, and molecular profiling data from various healthcare systems (e.g., UTSW, Children’s Health) as well as cross-system organizations (e.g., Children’s Oncology Group, Malignant Germ Cell International Consortium).
Advanced data analytics strategies, particularly deep learning-based approaches, have significant potential for uncovering hidden insights from complex real-world healthcare data. Working with multidisciplinary teams, I have applied deep learning and statistical methods to analyze EHR, claims, and registry data, with a focus on characterizing disease and care patterns at both the individual and group levels. My analytics work has spanned a diverse range of disease settings, including cardiovascular diseases, cancer, and COVID-19. These studies have yielded novel findings on identifying risk factors, predicting clinical outcomes, and addressing healthcare disparities.
Biomedical imaging technologies reveal intricate biological and pathological details spanning from the cellular to the systemic level. I have acquired extensive expertise and experience in magnetic resonance imaging and digital pathology, and I have developed both experimental and computational methods for generating and analyzing image data to improve disease diagnosis and prognosis. I have also led the development of a robust experimentation platform for quantifying intracellular water preexchange lifetime in neurons and astrocytes, a fundamental measure that impacts the design of various magnetic resonance imaging techniques for studying the nervous system.
Publications
Assessing disease severity in cutaneous lupus patients using natural language processing: preliminary data from a cohort study
Incidence and prevalence of atherosclerotic cardiovascular disease in cutaneous lupus erythematosus
Enhancing Medical Imaging Segmentation with GB-SAM: A Novel Approach to Tissue Segmentation Using Granular Box Prompts
Deep Learning-Based Automated Measurement of Murine Bone Length in Radiographs
A critical assessment of using ChatGPT for extracting structured data from clinical notes
Osteosarcoma Explorer: A Data Commons With Clinical, Genomic, Protein, and Tissue Imaging Data for Osteosarcoma Research
Deep learning in digital pathology for personalized treatment plans of cancer patients
Deep-Learning-Based Hepatic Ploidy Quantification Using H&E Histopathology Images
Features of tumor-microenvironment images predict targeted therapy survival benefit in patients with EGFR-mutant lung cancer
Enhanced Pathology Image Quality with Restore-Generative Adversarial Network
A Deep Learning Approach for Histology-Based Nucleus Segmentation and Tumor Microenvironment Characterization
Spatial molecular profiling: platforms, applications and analysis tools
A deep learning-based model for screening and staging pneumoconiosis
Association of Healthcare Access With Intensive Care Unit Utilization and Mortality in Patients of Hispanic Ethnicity Hospitalized With COVID-19
Upfront Brain Treatments Followed by Lung Surgery Improves Survival for Stage IV Non-small Cell Lung Cancer Patients With Brain Metastases: A Large Cohort Analysis
Oxygen-Sensitive MRI: A Predictive Imaging Biomarker for Tumor Radiation Response?
Computational Staining of Pathology Images to Study the Tumor Microenvironment in Lung Cancer
Development of a Data Model and Data Commons for Germ Cell Tumors
Molecular differences across invasive lung adenocarcinoma morphological subgroups
Examining correlations of oxygen sensitive MRI (BOLD/TOLD) with [(18)F]FMISO PET in rat prostate tumors
Oxygen-sensitive MRI assessment of tumor response to hypoxic gas breathing challenge
Pathology image analysis using segmentation deep learning algorithms
Artificial Intelligence in Lung Cancer Pathology Image Analysis
ConvPath: A software tool for lung adenocarcinoma digital pathological image analysis aided by a convolutional neural network
Type and case volume of health care facility influences survival and surgery selection in cases with early-stage non-small cell lung cancer
Systematic Analysis of Gene Expression in Lung Adenocarcinoma and Squamous Cell Carcinoma with a Case Study of FAM83A and FAM83B
Intracellular water preexchange lifetime in neurons and astrocytes
Sodium NMR relaxation in mesoporous systems
Na-23 and H-1 NMR Relaxometry of Shale at High Magnetic Field
Sacrificial-Template-Assisted Syntheses of Aluminate and Titanate Nanonets via Interfacial Reaction Growth
Nicotinamide mononucleotide adenylyl transferase 1 protects against acute neurodegeneration in developing CNS by inhibiting excitotoxic-necrotic cell death
Aluminothermal Reaction Approach for Micro-/Nanofabrications: Syntheses of In2O3 Micro-/Nanostructures and InN Octahedral Nanoshells
Assistant Professor
Director, Biostatistics and Data Science Core
Manager, Pediatric Cancer Data Core
O'Donnell School of Public Health
UT Southwestern Medical Center