Metabolomics Data Harmonization and Meta-Analysis

Drift Correction

Liquid chromatography-mass spectrometry (LC-MS) has emerged as the major technology for metabolomic profiling; however, raw datasets require extensive processing before they can be analyzed for biological patterns and disease associations.

We are currently developing massSight, an R package for the alignment and scaling of LC-MS data. We seek to produce a suite of statistical and computational methods to increase the biological signal-to-noise ratio. These approaches include removing batch effects, correcting drift, clustering chemical compounds, and removing non-biological background signals.
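
To make the idea of drift correction concrete, the following is a minimal sketch in Python. It is not massSight's method or API: it assumes a single feature, pooled quality-control (QC) injections spread through the run, and a purely linear drift, and every function and variable name here is hypothetical. The trend is estimated from the QC injections only and then divided out of every sample.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def correct_drift(order, intensity, is_qc):
    """Divide out a linear trend estimated from the QC injections only.

    Assumption: drift is multiplicative and linear in injection order.
    Values are rescaled so QC injections sit at their original mean level.
    """
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(intensity, is_qc) if q]
    intercept, slope = fit_line(qc_x, qc_y)
    qc_level = mean(qc_y)
    return [v * qc_level / (intercept + slope * o)
            for o, v in zip(order, intensity)]


# Hypothetical data for one feature: the signal loses 2% of its starting
# value with every injection; injections 1, 4, 7 and 10 are pooled QCs.
order = list(range(1, 11))
is_qc = [o % 3 == 1 for o in order]
true_level = [100.0 if q else 80.0 for q in is_qc]
observed = [t * (1 - 0.02 * (o - 1)) for t, o in zip(true_level, order)]

corrected = correct_drift(order, observed, is_qc)
```

After correction, the QC injections share one intensity and the study samples share another, which is the behavior expected when drift is the only source of variation. Real runs usually call for a more flexible trend (for example a smoother fitted per batch), which this sketch deliberately leaves out.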

AI/ML to Infer Biology from Sequencing Data

The application of Artificial Intelligence (AI) and Machine Learning (ML) to extracting biological insights from sequencing data represents a transformative shift in computational biology. These computational technologies can handle the enormous volumes of data generated by sequencing platforms, from DNA and RNA to more complex proteomic sequences. Traditional analytical methods often fall short of capturing the intricate patterns and relationships hidden in these data. AI/ML algorithms, however, excel at identifying these subtle connections, enabling more accurate predictions and fostering a deeper understanding of biological processes. We are developing AI and ML models and tools for applications that include profiling microbial species and biosynthetic gene clusters (BGCs), identifying genetic markers for diseases, testing for rare diseases, understanding evolutionary pathways, and developing personalized medicine.



Characterizing temporal dynamics of longitudinal omics

Longitudinal studies and clinical trials, combined with omics measurements, are revolutionizing drug development by providing a holistic understanding of disease progression, treatment responses, and shifts in biological markers. This integration accelerates drug discovery and enables the utilization of advanced technologies like AI and machine learning. However, challenges such as complex data structures and limited sample sizes can restrict the full potential of longitudinal omics data. To overcome these challenges, we aim to develop robust machine learning techniques, including Gaussian Processes, tailored for longitudinal omics analysis. Our strategy involves the creation of user-friendly software such as wavome and the application of these methodologies to enhance biomarker discovery and association testing by characterizing the dynamics of omics features in relation to clinical and participant/sample characteristics.
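
As an illustration of how a Gaussian Process can describe the trajectory of one omics feature over time, here is a minimal sketch in Python using only the standard library. It is not the wavome interface: the function names, the squared-exponential kernel, the fixed hyperparameters, and the example measurements are all assumptions chosen for illustration.

```python
import math


def rbf(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two time points."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_posterior_mean(t_train, y_train, t_new, noise=1e-2, length_scale=2.0):
    """Posterior mean of a GP with constant mean and RBF kernel.

    Hyperparameters are fixed here; in practice they would be estimated.
    """
    y_bar = sum(y_train) / len(y_train)
    k = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(t_train)]
         for i, a in enumerate(t_train)]
    alpha = solve(k, [y - y_bar for y in y_train])
    return [y_bar + sum(rbf(t, a, length_scale) * w
                        for a, w in zip(t_train, alpha))
            for t in t_new]


# Hypothetical feature measured in one participant at five visits (weeks).
weeks = [0.0, 2.0, 4.0, 6.0, 8.0]
abundance = [1.0, 1.8, 2.5, 1.9, 1.1]

fitted = gp_posterior_mean(weeks, abundance, weeks)
between_visits = gp_posterior_mean(weeks, abundance, [1.0, 3.0, 5.0, 7.0])
```

The fitted curve gives a smooth estimate of the feature's dynamics between visits, which is the kind of quantity that can then be related to clinical or participant characteristics. A fuller treatment would also return the posterior variance and estimate the kernel hyperparameters from the data.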



Functional Integration of Omics Data

Recent advances in high-throughput technologies, such as DNA sequencing and liquid chromatography-mass spectrometry, have enabled us to capture detailed snapshots of human biological activity through large-scale multi-omics data. This multi-omics approach provides an unprecedented opportunity for in-depth structural and molecular profiling of human biology across molecular levels. The challenge, however, lies in effectively integrating and analyzing this wealth of data.

These modern biological screens produce an overwhelming number of measurements, spanning genomics, transcriptomics, proteomics, and metabolomics, among others. Finding statistically significant associations among features and integrating these different omics data sets at the metabolic functional level in an interpretable manner is imperative.

In this project, our goal is to develop statistical and machine learning tools that leverage deep learning approaches to harness the power of multi-omics data. We aim to uncover enriched metabolic pathways and gain a comprehensive understanding of human biology by integrating information across these diverse molecular layers. This addresses the critical need for more efficient and insightful analysis of complex biomedical data, particularly in the context of multi-omics datasets.

btest omePath

Investigating human health conditions using omics data


The COVID-19 pandemic, driven by the SARS-CoV-2 virus, has brought about profound global changes, yet its ultimate consequences remain uncertain. As the virus evolves in response to host immune systems and intervention measures, efforts are underway to develop accessible, repeatable tools for integrating and analyzing the vast array of pandemic-related data. These tools are being applied to study genetic variations in SARS-CoV-2 and their associations with clinical health outcomes. Additionally, metabolomics and proteomics data are being employed to understand changes in COVID-19 severity. These endeavors aim to provide valuable insights for guiding vaccine development, monitoring disease epidemiology, and characterizing the virus’s genomic evolution patterns.

In parallel, research efforts are exploring the interplay between the viral genome and host genetic backgrounds, examining the 3D protein structure’s role in disease etiology, and investigating biomarkers that explain the diverse health outcomes associated with COVID-19. Deep sequencing analysis and machine learning techniques are being employed to identify regions in the gene sequence responsible for COVID-19 detection, while omics technologies are being leveraged to link genetic and phenotypic data with clinical information for disease prediction, diagnosis, and therapeutic advancements. These multifaceted research initiatives aim to provide valuable resources and insights to the broader scientific community, facilitating collaboration and discussions to combat the pandemic effectively.

Microbiome and Metabolomics of Pregnancy, Breast Milk Feeding, and Infant Health

Breast milk omics

Microbiome and Metabolomics of Cancer


Microbiome and Metabolomics of Obesity, RYGB, and Sleeve


Characterizing Dynamics of Cells and Genes in Skin Injury


Explore Research Directions Using pubSight Visualization Tool