Computational Methods for Breath Metabolomics in Clinical Diagnostics

Anne-Christin Hauschild
Krembil Research Institute Data Science Discovery Centre for Chronic Diseases
Monday, March 5, 2018 - 11:00am
Fitzgerald Building, Room #237
Research Group Seminar
Abstract: 
Odors and vapors of the body and breath have been known for their diagnostic power for millennia. More recently, clinical studies have confirmed this knowledge by successfully training dogs and mice to detect diseases by sniffing out specific volatile organic compound profiles. Like a vertebrate nose, analytical technologies exist that are capable of capturing such metabolites. The science of analyzing the totality of metabolites in an organism's breath is called breathomics. The crucial task is to identify discriminating patterns that are predictive of certain diseases. Additionally, as with other diagnostic technologies, breath measurements are influenced by various sources of systematic or random noise. The field needs to move from separability to predictability by evolving from pilot studies to large-scale screening studies. This requires further standardization and automation in managing, analyzing, and evaluating this novel type of metabolomics data. To achieve this, several challenges remain to be addressed: data accumulation and heterogeneity; manual peak finding; unknown metabolites; robust statistics and biomarkers; background noise and confounding factors; heterogeneous diseases and disease stages; and usability, maintainability, and reusability. In six sub-projects, we developed possible solutions to these challenges:

1. The IMSDB is the first functional, flexible, and comprehensive breathomics database. It provides flexible yet fast storage of heterogeneous clinical data and large amounts of metabolic breath data.
2. Our pilot study on COPD prediction laid the foundations for more robust and adequate prediction, evaluation, and feature selection for breathomics data by introducing established machine learning techniques to the field of breath analysis.
3. We presented the first qualitative analysis of the performance of automated peak detection methods, thereby demonstrating their ability to compete with the manual gold standard.
4. The MIMA software tool enables the automated identification of the captured organic compounds by mapping between different analytical technologies.
5. The Carotta software system provides a user-friendly unsupervised learning platform that enables easy discovery of hidden structures in metabolomic breath data, such as disease subtypes or confounding factors.
6. Finally, we introduced the first longitudinal model of breath metabolite behavior during the course of an evolving disease.

Together, these projects lay the foundation for a more robust and standardized analysis scheme, leading to greater comparability and generalizability of future breathomics studies. Moreover, they provide the basis for automated frameworks that integrate the described tools and approaches as steps of a continuous breath analysis pipeline.
Host: 
Bernhard Ganss
Supervisor: 
Igor Jurisica