Biomedical Informatics and Mechatronics

Institute of Electrical and Biomedical Engineering


Knowledge Discovery in Databases Designer (KD3) covers the complete KDD process with a workflow-oriented architecture that provides a preview of the results of every workflow step. KD3 offers a variety of ready-to-use methods, algorithms and workflows, and it can easily be extended with additional components implemented as functional objects.

Download KD3


Dander A, Handler M, Netzer M, Pfeifer B, Baumgartner C. KD3: Knowledge Discovery in Database Design tool for workflow-based exploration of biomedical data sets. Transactions on Large-Scale Data- and Knowledge-Centered Systems (TLDKS), 2011, in press.

Recently a modified version of KD3 has been implemented for preprocessing and identifying highly discriminatory features (biomarker candidates) in mass spectrometry data. This version is still under development but can already be downloaded here.
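To make the workflow-with-preview idea concrete, here is a minimal Python sketch. It assumes that a component ("functional object") can be modeled as a plain function that maps one table to the next; the names `run_workflow`, `drop_missing` and `scale_columns` are hypothetical and are not part of KD3's actual interface.

```python
from typing import Any, Callable, List, Tuple

# A workflow step is a (name, function) pair; the function plays the role of a
# pluggable "functional object" (an assumption about how components are modeled).
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(data: Any, steps: List[Step], preview_rows: int = 2):
    """Apply each step in order and keep a short preview of every intermediate result."""
    previews = []
    for name, func in steps:
        data = func(data)
        # Only the first few rows are stored, so each step can be inspected cheaply.
        previews.append((name, data[:preview_rows]))
    return data, previews


def drop_missing(rows):
    """Remove rows that contain a missing value (None)."""
    return [row for row in rows if None not in row]


def scale_columns(rows):
    """Divide every column by its maximum absolute value (all-zero columns stay zero)."""
    maxima = [max(abs(v) for v in col) or 1.0 for col in zip(*rows)]
    return [[v / m for v, m in zip(row, maxima)] for row in rows]


if __name__ == "__main__":
    table = [[1.0, 2.0], [None, 3.0], [4.0, 8.0]]
    result, previews = run_workflow(
        table, [("drop_missing", drop_missing), ("scale_columns", scale_columns)]
    )
    for name, rows in previews:
        print(name, rows)
```

Because a step only has to be a callable, adding a new component in this sketch means appending another `(name, function)` pair, and the stored previews play the role of inspecting each step's output before continuing.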

  • OBOBrowseA - OBO Browse and Annotate

The software loads and displays OBO files in a tree or graph representation. It further enables the user to interactively browse through the ontology, search for ontology classes and annotate textual data.

Download OBOBrowsA
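The load, browse, search and annotate functions described for the OBO browser above can be sketched in a few lines of Python, assuming only the basic OBO flat-file tags `[Term]`, `id`, `name` and `is_a`. All function names and the `EX:` identifiers are hypothetical; this is not the tool's implementation, and the annotation step is a naive exact string match.

```python
from collections import defaultdict

SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0001
name: metabolite

[Term]
id: EX:0002
name: amino acid
is_a: EX:0001 ! metabolite

[Term]
id: EX:0003
name: fatty acid
is_a: EX:0001 ! metabolite
"""


def parse_obo(text):
    """Parse the [Term] stanzas of an OBO file into {id: {"name": str, "is_a": [ids]}}."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept (e.g. [Typedef] is skipped).
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing comment, e.g. "EX:0001 ! metabolite" -> "EX:0001".
            current["is_a"].append(value.split("!")[0].strip())
    return terms


def children_map(terms):
    """Invert the is_a links so the ontology can be displayed as a tree."""
    children = defaultdict(list)
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            children[parent].append(term_id)
    return children


def print_tree(terms, children, node, depth=0):
    """Print a term and, indented below it, all of its descendants."""
    print("  " * depth + f"{node} {terms[node]['name']}")
    for child in sorted(children.get(node, [])):
        print_tree(terms, children, child, depth + 1)


def search(terms, query):
    """Return the ids of all terms whose name contains the query (case-insensitive)."""
    q = query.lower()
    return sorted(t for t, term in terms.items() if term["name"] and q in term["name"].lower())


def annotate(terms, text):
    """Naive annotation: ids of all terms whose name occurs verbatim in the text."""
    lowered = text.lower()
    return sorted(t for t, term in terms.items()
                  if term["name"] and term["name"].lower() in lowered)


if __name__ == "__main__":
    terms = parse_obo(SAMPLE_OBO)
    print_tree(terms, children_map(terms), "EX:0001")
    print(search(terms, "acid"))
    print(annotate(terms, "Alanine is an amino acid."))
```

In this sketch a term with several `is_a` parents is printed under each parent, which is one simple way to show a graph-shaped ontology as a tree.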

  • Profiling the human response to physical exercise: a computational strategy for the identification and kinetic analysis of metabolic biomarkers

BACKGROUND: In metabolomics, biomarker discovery is a highly data-driven process that requires sophisticated computational methods to search for and prioritize novel and unforeseen biomarkers in data typically gathered in preclinical or clinical studies. In particular, the discovery of biomarker candidates from longitudinal cohort studies is crucial for kinetic analysis to better understand complex metabolic processes in the organism during physical activity.

FINDINGS: In this work we introduce a novel computational strategy for identifying and studying kinetic changes of putative biomarkers using targeted MS/MS profiling data from time-series cohort studies or other cross-over designs. We propose a prioritization model that classifies biomarker candidates according to their discriminatory ability, and we couple this discovery step with a novel network-based approach to visualize, review and interpret key metabolites and their dynamic interactions within the network. Applying our method to longitudinal stress test data revealed a panel of metabolic signatures, i.e., lactate, alanine, glycine and the short-chain fatty acids C2 and C3, in trained and physically fit persons during bicycle exercise.

CONCLUSIONS: We propose a new computational method for discovering new signatures in dynamic metabolic profiling data, which revealed known and unexpected candidate biomarkers in physical activity. Many of them could be confirmed by the literature. Our computational approach is freely available as an R package termed BiomarkeR under the LGPL via CRAN.

Datasets: male and female. Lactate: male and female.

J Clin Bioinforma. 2011 Dec 19;1(1):34. doi: 10.1186/2043-9113-1-34.
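BiomarkeR's actual prioritization model is not reproduced here. As a hypothetical illustration of ranking metabolites by a kinetic change in paired time-series data, the sketch below scores each metabolite by the mean per-subject log2 fold change between two time points divided by its standard deviation (a standardized paired effect). The function names, the data layout and the toy numbers are assumptions; the numbers are made up and are not study data, and the network-based step is not covered.

```python
import math
from statistics import mean, stdev


def paired_effect(before, after):
    """Mean per-subject log2 fold change divided by its standard deviation."""
    logs = [math.log2(a / b) for b, a in zip(before, after)]
    m, sd = mean(logs), stdev(logs)
    if sd == 0:
        # Every subject changed by exactly the same factor (or not at all).
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / sd


def rank_metabolites(data, t0, t1):
    """data[metabolite][time point] -> per-subject concentrations in a fixed subject order."""
    scores = {name: paired_effect(series[t0], series[t1]) for name, series in data.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Made-up toy concentrations for three subjects (hypothetical, not from the study).
    toy = {
        "metabolite_A": {"rest": [1.0, 2.0, 4.0], "exercise": [2.0, 4.0, 8.1]},
        "metabolite_B": {"rest": [1.0, 1.0, 1.0], "exercise": [1.1, 0.9, 1.0]},
    }
    for name, score in rank_metabolites(toy, "rest", "exercise"):
        print(f"{name}: {score:.2f}")
```

Working with per-subject ratios rather than group means is what makes the score "paired": each subject serves as its own baseline, as in a cross-over design.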

  • A new data mining approach for profiling and categorizing kinetic patterns of metabolic biomarkers after myocardial injury

The discovery of new and unexpected biomarkers in cardiovascular disease is a highly data-driven process that requires the complementary power of modern metabolite profiling technologies, bioinformatics and biostatistics. Clinical biomarkers of early myocardial injury are lacking. A prospective biomarker cohort study was carried out to identify, categorize and profile kinetic patterns of early metabolic biomarkers of planned (PMI) and spontaneous (SMI) myocardial infarction. We applied a targeted MS-based metabolite profiling platform to serial blood samples drawn from carefully phenotyped patients undergoing alcohol septal ablation for hypertrophic obstructive cardiomyopathy, which served as a human model of PMI. Patients with SMI and patients undergoing catheterization without induction of myocardial infarction served as positive and negative controls, respectively, to assess the generalizability of the markers identified in PMI.

To identify metabolites of high predictive value in MS/MS data, we introduced a new feature selection method that categorizes metabolic signatures into three classes of weak, moderate and strong predictors and can easily be applied to both paired and unpaired samples. Our approach outperformed standard null-hypothesis significance testing and other popular feature selection methods in terms of the area under the ROC curve and the product of sensitivity and specificity. Our results show that the new method was able to identify, classify and validate alterations in the levels of multiple metabolites participating in pathways associated with myocardial injury as early as 10 minutes after PMI.

Baumgartner et al., Bioinformatics, 2010
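The exact scoring rule of the published feature selection method is not given in the text above. The following sketch only illustrates the evaluation quantities it mentions (ROC AUC and the product of sensitivity and specificity) together with a three-level weak/moderate/strong labelling. The cut-offs 0.75 and 0.9 and all function names are hypothetical, and only unpaired samples are handled.

```python
def auc(cases, controls):
    """Mann-Whitney estimate of the ROC AUC: P(case > control) + 0.5 * P(tie)."""
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def best_sens_spec_product(cases, controls):
    """Largest sensitivity * specificity over cut-offs t for the rule 'value >= t -> case'."""
    best = 0.0
    for t in sorted(set(cases) | set(controls)):
        sensitivity = sum(c >= t for c in cases) / len(cases)
        specificity = sum(k < t for k in controls) / len(controls)
        best = max(best, sensitivity * specificity)
    return best


def categorize(auc_value, strong=0.9, moderate=0.75):
    """Map an AUC to weak/moderate/strong; the cut-offs are illustrative assumptions."""
    a = max(auc_value, 1.0 - auc_value)  # a marked decrease counts like an increase
    if a >= strong:
        return "strong"
    if a >= moderate:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    cases, controls = [2, 3, 4], [1, 2, 3]  # toy metabolite levels
    value = auc(cases, controls)
    print(round(value, 3), categorize(value), round(best_sens_spec_product(cases, controls), 3))
```

Folding the AUC around 0.5 lets a metabolite that drops after injury be ranked as strongly as one that rises.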

  • Improving Phosphopeptide/Protein Identification Using a New Mining Framework for MS/MS Spectra Preprocessing  

Phosphopeptide/protein identification using tandem mass spectrometry (MS/MS) is a challenging problem in proteomics research. In particular, phosphopeptides typically exhibit low-intensity b and y ion peaks in their spectra when serine or threonine is phosphorylated. As a result, existing peptide and protein identification algorithms produce a high false discovery rate on phosphopeptide spectra. To increase the number of correct phosphopeptide identifications from database searches, a new data mining approach for spectra preprocessing is proposed. A support vector machine classifier calculates the probability that each peak represents a b or y ion. Low-probability peaks are then removed from the spectra, while the intensities of the remaining peaks are enhanced. This substantially increases the signal-to-noise ratio and improves the chances of detecting important peaks. Experiments using MASCOT and SEQUEST together with Peptide/ProteinProphet and a decoy database approach showed a significant improvement in the sensitivity of phosphopeptide identification without compromising specificity, demonstrating that this MS/MS spectra preprocessing strategy is a powerful proteomics tool for enhancing phosphopeptide identification.

Cerqueira et al., J Proteomics Bioinform 2009;2:150-164.
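The filter-and-enhance step can be sketched as follows, assuming each peak is an (m/z, intensity) pair. The trained SVM is replaced by the placeholder `toy_probability`. The enhancement rule, which scales intensities by a factor between 1 and `max_boost` that grows with the peak probability, is also an assumption and not the published formula.

```python
def preprocess_spectrum(peaks, peak_probability, min_prob=0.5, max_boost=2.0):
    """Remove peaks unlikely to be b/y ions and enhance the remaining ones.

    peaks: list of (m/z, intensity) pairs.
    peak_probability(mz, intensity): estimated probability that the peak is a b or y ion.
    """
    processed = []
    for mz, intensity in peaks:
        p = peak_probability(mz, intensity)
        if p < min_prob:
            continue  # treated as noise and dropped
        # Scale by a factor between 1 and max_boost that increases with p.
        processed.append((mz, intensity * (1.0 + (max_boost - 1.0) * p)))
    return processed


def toy_probability(mz, intensity):
    """Hypothetical stand-in for a trained classifier: bright peaks count as likely ions."""
    return 0.9 if intensity > 10.0 else 0.1


if __name__ == "__main__":
    spectrum = [(100.0, 5.0), (200.0, 20.0), (300.0, 15.0)]
    print(preprocess_spectrum(spectrum, toy_probability))
```

Passing the classifier in as a function keeps the preprocessing independent of how the probabilities are produced, so a real probability-calibrated model could be swapped in without touching the filter.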

  • A new ensemble-based algorithm for identifying breath gas marker candidates in liver disease using ion molecule reaction mass spectrometry (IMR-MS)

Alcoholic fatty liver disease (AFLD) and nonalcoholic fatty liver disease (NAFLD) can progress to severe liver diseases such as steatohepatitis, cirrhosis and cancer. The detection of early liver disease is therefore essential; however, minimally invasive diagnostic methods in clinical hepatology still lack specificity.

Ion molecule reaction mass spectrometry (IMR-MS) was applied to a total of 126 human breath gas samples comprising 91 cases (AFLD, NAFLD and cirrhosis) and 35 healthy controls. A new feature selection modality termed Stacked Feature Ranking (SFR) was developed to identify potential liver disease marker candidates in breath gas samples. SFR combines different entropy-, correlation- and t-test-based feature ranking methods in a two-level architecture with a suggestion layer and a decision layer. We benchmarked SFR against four single feature selection methods, a wrapper and a recently described ensemble method; the gas compounds selected by SFR showed a significantly higher discriminatory ability of up to 10-15%, with areas under the ROC curve of AUC = 0.85-0.95. Using this approach, we identified unexpected breath gas marker candidates in liver disease with high predictive value. A literature study further supports an association of the top-ranked markers with liver disease. We propose SFR as a powerful tool for biomarker search in breath gas and other biological samples using mass spectrometry.
Netzer et al., Bioinformatics, 2009;25(7):941-947.

Download algorithm SFR and MS data
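The two-level idea can be sketched as a simplified rank aggregation. This is not the published SFR algorithm: the three base rankers (Welch t statistic, absolute Pearson correlation with the class labels, information gain of a median split), the top-k suggestion rule and the vote-then-rank decision rule are all illustrative assumptions.

```python
import math
from collections import Counter
from statistics import mean, median, variance


def t_score(x, y):
    """Absolute Welch t statistic between the classes y == 1 and y == 0."""
    a = [v for v, c in zip(x, y) if c == 1]
    b = [v for v, c in zip(x, y) if c == 0]
    diff = abs(mean(a) - mean(b))
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / se


def corr_score(x, y):
    """Absolute Pearson correlation between the feature and the 0/1 labels."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def entropy(labels):
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


def info_gain(x, y):
    """Entropy-based score: information gain of a split at the feature's median."""
    m = median(x)
    left = [c for v, c in zip(x, y) if v <= m]
    right = [c for v, c in zip(x, y) if v > m]
    remainder = sum(len(part) / len(y) * entropy(part) for part in (left, right) if part)
    return entropy(y) - remainder


def stacked_ranking(features, y, rankers=(t_score, corr_score, info_gain), k=2):
    """Suggestion layer: each base ranker proposes its top-k features.
    Decision layer: order by number of proposals, ties broken by summed rank."""
    votes, rank_sum = Counter(), Counter()
    for ranker in rankers:
        scores = {name: ranker(x, y) for name, x in features.items()}
        order = sorted(scores, key=scores.get, reverse=True)
        votes.update(order[:k])
        for position, name in enumerate(order):
            rank_sum[name] += position
    return sorted(features, key=lambda name: (-votes[name], rank_sum[name]))


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # toy data: 0 = control, 1 = case
    toy = {"f1": [1, 2, 1, 5, 6, 5], "f2": [3, 1, 2, 2, 3, 1], "f3": [1, 1, 2, 2, 3, 3]}
    print(stacked_ranking(toy, labels))
```

Counting proposals first and only then using the summed ranks means a feature must be favored by several different criteria to reach the top, which is the intuition behind stacking rankers.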

  • A new rule-based algorithm for identifying metabolic markers in prostate cancer using tandem mass spectrometry

Prostate cancer is the most prevalent tumor in males and its incidence is expected to increase as the population ages. Prostate cancer is treatable by excision if detected at an early enough stage. The challenges of early diagnosis require the discovery of novel biomarkers and tools for prostate cancer management. A novel feature selection algorithm termed associative voting (AV) was developed for identifying biomarker candidates in prostate cancer data measured via targeted metabolite profiling MS/MS analysis. We benchmarked our algorithm against two standard entropy-based and correlation-based feature selection methods (Information Gain and ReliefF) and observed that, on a variety of classification tasks in prostate cancer diagnosis, our algorithm identified subsets of biomarker candidates that are both smaller and show higher discriminatory power than the subsets identified by Information Gain and ReliefF. A literature study confirms that the highest-ranked biomarker candidates identified by AV have independently been identified as important factors in prostate cancer development.

Osl et al., Bioinformatics, 2008;24(24):2908-2914.

Download Associative Voting (AV)
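The AV algorithm itself is not specified above. As a loosely related, hypothetical illustration of rule-based voting for feature ranking, the sketch mines single-feature threshold rules that meet a minimum support and confidence and lets each rule vote for its feature with a weight equal to its confidence. The thresholds and function names are assumptions.

```python
def mine_rules(x, y, min_support=0.3, min_confidence=0.8):
    """Single-feature rules 'x > t => label' and 'x <= t => label' at midpoints t."""
    n = len(x)
    values = sorted(set(x))
    rules = []
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2
        for above in (True, False):
            covered = [c for v, c in zip(x, y) if (v > t) == above]
            for label in set(y):
                hits = covered.count(label)
                support, confidence = hits / n, hits / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((t, above, label, confidence))
    return rules


def rule_vote_ranking(features, y, **rule_params):
    """Each surviving rule votes for its feature with weight = rule confidence."""
    scores = {name: sum(rule[-1] for rule in mine_rules(x, y, **rule_params))
              for name, x in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # toy data: 0 = control, 1 = tumor
    toy = {"f1": [1, 2, 1, 5, 6, 5], "f2": [3, 1, 2, 2, 3, 1], "f3": [1, 1, 2, 2, 3, 3]}
    print(rule_vote_ranking(toy, labels))
```

A feature that yields many confident, well-supported rules collects more votes, so the ranking favors features whose value ranges separate the classes cleanly.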

  • SeMoP: A New Computational Strategy for the Unrestricted Search for Modified Peptides Using LC-MS/MS Data

The SeMoP strategy enables the unrestricted discovery and verification of peptide modifications using LC-MS/MS data. SeMoP couples standard database searching with a new algorithm for an unrestricted search of peptide modifications. Interesting modifications found in the unrestricted search are then targeted in a standard database search to verify the modified peptides. Various modifications, including post-translational modifications, sequence polymorphisms and sample handling-induced changes, can be identified with this approach.
Baumgartner et al., J Proteome Res, 2008;7(9):4199-208.

Download SeMoP-Tool for the unrestricted search (step 2) (60MB)
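One generic way to spot candidate modifications in an unrestricted search is to collect the mass differences between observed precursors and their unmodified theoretical peptide masses and to look for differences that recur. The sketch below does this with simple binning; it is not SeMoP's algorithm, and the bin width, the count threshold and the function names are assumptions. The two reference shifts are the standard monoisotopic values for phosphorylation (HPO3) and oxidation (O).

```python
from collections import defaultdict

# Monoisotopic mass shifts (Da) of two common modifications, used only for labelling.
KNOWN_SHIFTS = {"Phospho": 79.96633, "Oxidation": 15.99491}


def frequent_mass_shifts(observed, theoretical, bin_width=0.02, min_count=2):
    """Bin precursor mass differences (observed - unmodified theoretical) and
    return (mean shift, count) for every bin seen at least min_count times."""
    groups = defaultdict(list)
    for obs, theo in zip(observed, theoretical):
        delta = obs - theo
        groups[round(delta / bin_width)].append(delta)
    shifts = [(sum(d) / len(d), len(d)) for d in groups.values() if len(d) >= min_count]
    return sorted(shifts, key=lambda item: item[1], reverse=True)


def label_shift(shift, tolerance=0.02):
    """Name a shift if it matches a known modification within the tolerance."""
    for name, mass in KNOWN_SHIFTS.items():
        if abs(shift - mass) <= tolerance:
            return name
    return "unknown"


if __name__ == "__main__":
    # Toy precursor masses (hypothetical): three phosphorylated, two oxidized, one unmodified.
    theoretical = [1000.0, 1200.5, 950.25, 1100.0, 870.0, 990.0]
    deltas = [79.966, 79.966, 79.966, 15.995, 15.995, 0.0]
    observed = [t + d for t, d in zip(theoretical, deltas)]
    for shift, count in frequent_mass_shifts(observed, theoretical):
        print(f"{shift:.4f} Da x{count}: {label_shift(shift)}")
```

Recurring shifts found this way would then be added as variable modifications in a targeted standard database search, which corresponds to the verification step described above.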

  • LCF: Instance based classification with local density

Classification is an important data mining task in biomedicine. In particular, classification of biomedical data often requires separating pathological from healthy samples with the highest possible discriminatory performance for diagnostic purposes. Even more important than overall accuracy is the balance of a classifier, particularly when data sets with unbalanced class sizes are examined. A novel instance-based classification technique was developed that takes into account both the differing local densities of data objects and local cluster structures. Our method, which adopts the basic ideas of density-based outlier detection, determines the local point density in the neighborhood of an object to be classified and of all clusters in the corresponding region. A data object is assigned to the class whose local cluster structure it fits best. The experimental evaluation on biomedical data demonstrates that our approach outperforms most popular classification methods.

Plant et al., Bioinformatics, 2006;22(8):981-8.

Download LCF
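A simplified, LOF-inspired sketch of classification by local density, assuming numeric feature vectors and Euclidean distance. It is not the published LCF algorithm: here each class is scored by comparing the density of the query's k nearest same-class neighbors with the query's own density, and the query is assigned to the class where that ratio is smallest. The function names and k are assumptions, and each class needs more than k points.

```python
import math


def knn(point, pool, k):
    """The k points in pool closest to point (Euclidean), excluding point itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: math.dist(point, p))[:k]


def local_density(point, pool, k):
    """Inverse mean distance to the k nearest neighbors in pool."""
    neighbors = knn(point, pool, k)
    mean_dist = sum(math.dist(point, p) for p in neighbors) / len(neighbors)
    return math.inf if mean_dist == 0 else 1.0 / mean_dist


def fit_factor(query, class_points, k):
    """Density of the query's neighbors divided by the query's own density.
    Values near or below 1 mean the query fits the local cluster structure."""
    query_density = local_density(query, class_points, k)
    if query_density == math.inf:
        return 0.0  # the query coincides with its neighbors: a perfect fit
    neighbors = knn(query, class_points, k)
    neighbor_density = sum(local_density(p, class_points, k) for p in neighbors) / len(neighbors)
    return neighbor_density / query_density


def classify(query, training, k=2):
    """training: {label: [points]}; choose the label with the smallest fit factor."""
    return min(training, key=lambda label: fit_factor(query, training[label], k))


if __name__ == "__main__":
    training = {
        "A": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],       # tight cluster
        "B": [(8.0, 8.0), (12.0, 8.0), (8.0, 12.0), (12.0, 12.0)],   # sparse cluster
    }
    for query in [(0.5, 0.5), (10.0, 10.0), (4.0, 4.0)]:
        print(query, classify(query, training))
```

In the usage example the point (4, 4) lies closest to (1, 1) from the tight class A, so a plain 1-nearest-neighbor rule would choose A; relative to local density, however, it fits the sparser class B better, and the sketch assigns it to B.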