UT Southwestern Medical Center

Software Packages


We developed the Benisse model (BCR embedding graphical network informed by scRNA-Seq) to provide refined analyses of BCRs guided by single-cell gene expression. Benisse revealed a gradient of B cell activation along BCR trajectories. We found that BCRs follow a directed pattern of continuous and linear evolution to achieve the highest antigen targeting efficiency, in contrast to the convergent evolution pattern of T cell receptors (TCRs).


We developed a linear B cell epitope prediction model, BepiTBR, based on T-B reciprocity, and performed comprehensive validations. We showed that explicitly including the enrichment of putative CD4+ T-cell epitopes (predicted HLA class II epitopes) in the model leads to significant enhancement in the prediction of linear B cell epitopes.


dCLIP is a Perl program for discovering differential binding regions between two CLIP-Seq (HITS-CLIP or PAR-CLIP) experiments. It is appropriate for experiments in which the common binding regions that are significantly enriched in both conditions tend to have similar binding strength, and in which researchers are more interested in differences in binding strength than in the binary question of whether a binding site is common.


We developed a hierarchical Bayesian model, named Neoantigen-T cell Interaction Estimation (netie), to infer the history of neoantigen-CD8+ T cell interactions in tumors. We showed that tumors with an increase in immune selection pressure over time, inferred from the distribution of neoantigens, demonstrate an expression signature of T cell activation. We also discovered exhaustion of T cell cytotoxic activity, following immunotherapy, in the tumor clones that newly arise after treatment.


DisHet is an R package for estimating the gene expression levels and component proportions of the normal, stroma (immune), and tumor components of bulk tumors. The DisHet package also documents a series of gene signatures for tumor-infiltrating immune cells, which are defined with empirical evidence gained from DisHet analysis of RNA-Seq data from 35 RCC trios.
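The idea of splitting a bulk tumor into components can be illustrated with a minimal Python sketch. It assumes a simple linear mixture, in which the bulk expression of each gene is the proportion-weighted sum of the normal, stroma, and tumor expression levels, and it assumes all three component profiles are already known. This is not DisHet's algorithm or API: the function names, the simulated data, and the plain least-squares fit are all illustrative assumptions.

```python
import random

def mix(profiles, proportions):
    """Forward model (assumed): bulk expression of each gene is the
    proportion-weighted sum of its component expression levels.
    profiles: one [normal, stroma, tumor] triple per gene."""
    return [sum(p * e for p, e in zip(proportions, gene)) for gene in profiles]

def solve_linear(a, b):
    """Solve a small square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def estimate_proportions(bulk, profiles):
    """Least-squares fit via the normal equations, followed by clipping
    negative values to zero and renormalizing so the proportions sum to 1."""
    k = len(profiles[0])
    ata = [[sum(g[i] * g[j] for g in profiles) for j in range(k)] for i in range(k)]
    atb = [sum(g[i] * y for g, y in zip(profiles, bulk)) for i in range(k)]
    raw = [max(x, 0.0) for x in solve_linear(ata, atb)]
    total = sum(raw)
    return [x / total for x in raw]

# Simulated, hypothetical data: 200 genes, 3 components.
random.seed(0)
profiles = [[random.uniform(0, 100) for _ in range(3)] for _ in range(200)]
true_proportions = [0.2, 0.3, 0.5]  # normal, stroma, tumor
bulk = mix(profiles, true_proportions)
estimated = estimate_proportions(bulk, profiles)
```

In this noiseless simulation the fit recovers the proportions used to generate the bulk sample. Real data are noisy and the component profiles are themselves unknown, which is the harder problem the package addresses.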


A deep learning neural network model for predicting the TCR binding specificity of peptides presented by class I HLA alleles.

QBRC mutation calling pipeline

The QBRC mutation calling pipeline is a flexible and comprehensive pipeline that combines commonly used software and data processing steps for mutation calling. The software used includes sambamba, speedseq, varscan, shimmer, strelka, manta, and lofreq_tar. It identifies somatic and germline variants from whole exome sequencing (WXS), RNA sequencing, and deep sequencing data. It can be used for human, PDX, and mouse data, with either fastq or bam files as input.

QBRC neoantigen calling pipeline

The QBRC neoantigen calling pipeline is a comprehensive and user-friendly neoantigen calling pipeline for human genomics samples. It requires the somatic mutation calling results of the QBRC mutation calling pipeline and the tumor/normal exome-seq data for HLA typing, and it optionally accepts RNA-seq data for filtering the neoantigens called from the exome-seq data. It profiles both MHC class I- and class II-binding neoantigens. The calculation of CSiN (Cauchy-Schwarz index of Neoantigens), which describes neoantigen clonal balance, is embedded in the pipeline.

Please contact Tianshi.lu@utsouthwestern.edu for genome reference files.


SCINA is a semi-supervised algorithm for identification of cell types in single cell profiling data. It automatically exploits prior knowledge of cell type-specific signatures as a form of supervision. It also works for disease subtyping at the patient level, or other scenarios where data of similar format are available.
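To make the idea of signature-guided cell typing concrete, here is a deliberately simplified Python sketch. It is not SCINA's statistical model or its interface: it scores each cell by the mean expression of each signature's marker genes and assigns the best-scoring type, or "unknown" when no score reaches a threshold. The function name `assign_cell_types`, the `min_score` parameter, the toy expression values, and the choice of marker genes are all illustrative assumptions.

```python
from statistics import mean

def assign_cell_types(expression, signatures, min_score=1.0):
    """Assign each cell to the signature with the highest mean marker expression.

    expression: dict mapping cell ID -> dict of gene -> expression value
    signatures: dict mapping cell type -> list of marker genes
    min_score:  hypothetical cutoff; cells scoring below it are "unknown"
    """
    labels = {}
    for cell_id, profile in expression.items():
        # Score each cell type by the mean expression of its marker genes;
        # genes missing from the profile count as zero.
        scores = {
            cell_type: mean(profile.get(gene, 0.0) for gene in genes)
            for cell_type, genes in signatures.items()
        }
        best_type = max(scores, key=scores.get)
        labels[cell_id] = best_type if scores[best_type] >= min_score else "unknown"
    return labels

# Toy inputs (made-up values) with commonly used T cell and B cell markers.
signatures = {"T cell": ["CD3D", "CD3E"], "B cell": ["CD79A", "MS4A1"]}
expression = {
    "cell_1": {"CD3D": 5.0, "CD3E": 4.0, "CD79A": 0.0, "MS4A1": 0.1},
    "cell_2": {"CD3D": 0.2, "CD3E": 0.0, "CD79A": 6.0, "MS4A1": 3.5},
    "cell_3": {"CD3D": 0.1, "CD3E": 0.2, "CD79A": 0.3, "MS4A1": 0.0},
}
labels = assign_cell_types(expression, signatures)
```

Here `cell_1` and `cell_2` are labeled by their dominant signature, while `cell_3` expresses no signature strongly and is left as "unknown". A hard threshold like this is the crudest way to use prior signatures as supervision and is only meant to show the shape of the inputs and outputs.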


We introduce SClineager, which infers accurate evolutionary lineages from scRNA-seq data by borrowing information from related cells to overcome expression drop-outs. SClineager can be applied to scRNA-Seq data to decipher the lineage histories of cells, addressing a missed opportunity to extract valuable information from the large amounts of existing scRNA-Seq data.


We developed a mathematical model, named Sprod, to impute accurate spatial transcriptomics (ST) gene expression based on latent space and graph learning of matched location and imaging data. We comprehensively validated Sprod and demonstrated its advantages over prior methods designed for removing drop-outs in scRNA-seq data.


Tessa is a Bayesian model that integrates T cell receptor (TCR) sequence profiling with the transcriptomes of T cells. Enabled by recently developed single-cell sequencing techniques, which provide both the TCR sequence and the RNA sequences of each T cell concurrently, Tessa maps the functional landscape of the TCR repertoire and generates insights into the human immune response to diseases.

Tissue Positioning System (TPS)

TPS is a quantitative, unsupervised algorithm for detecting zonated expression patterns in immunofluorescence (IF) images. It was developed for evaluating zonated protein expression in hepatocytes. We welcome suggestions for other potential use cases. For the current hepatocyte application, TPS requires an input IF image with a DAPI channel for nuclei and a GS channel for central veins.