About MicrobiomeGym


Microbiome research is an emerging field with many potential health-related applications. Enabled by next-generation sequencing technology, microbiome sequencing data contain valuable information but are not trivial to analyze. Although many statistical methods have been developed, including t-tests, RNA-seq differential analysis methods, and microbiome-specific association models, statistical questions remain to be addressed and technical barriers remain to be overcome.

As a first step in the journey of combining statistics and microbiome research, we built MicrobiomeGym, which aims to address some of these questions openly and in a community-friendly fashion by providing:

  • representative datasets from simulation, synthesis and real studies with important health impacts
  • a unified association analysis toolbox that eases the application of existing metagenomic association methods
  • a platform to exhibit reproducible analysis results based on the hosted datasets and methods

To fulfill these goals, we host datasets with extensive documentation (e.g., assumptions and code for simulated data, and references for real datasets). We also host a wide range of statistical tests together with their benchmark results and the code needed to reproduce them. We encourage microbiome researchers to contribute their datasets, methods, or method benchmarks via a transparent workflow documented on GitHub. We hope that MicrobiomeGym can be a valuable hub for advancing quantitative microbiome research.


Metagenome-wide association studies (MWAS) interrogate the association between microbiota and diseases. As the number of available MWAS methods grows, analysts face the burden of learning, using, or implementing each method in their own studies, an effort that is repeated across the scientific community. We therefore curated a collection of published methods and designed a unified analysis routine, implemented in the R package MicrobiomeGym.
MicrobiomeGym offers simulated datasets, synthetic datasets, and datasets from real studies. This wide range of datasets provides opportunities to evaluate existing and novel association methods from multiple perspectives, from different human diseases to different ecological conditions.
Documented data resources and analysis routines enable reproducible results, which are critical for modern medical and statistical research. MicrobiomeGym provides R code together with datasets, methods, and analysis results. Users can freely download the data and code and reproduce the analysis results in their own computing environments.