Advanced theoretical concepts
Institute of Bioinformatics, University of Muenster, Muenster, Germany
AIMS
The objective of PROVABES is to go beyond existing
conventional approaches in strengthening the impact of
biomarkers as identified by participants of this consortium. A
large amount of gene expression data contributing to the scope
of this network has already been generated. A database platform
will be established by integrating previous joint research
projects (ENCCA, EuroBoNet, EURO-EWING, TranSaRNet, ASSET).
Integrating these data sets into the current research efforts,
both to perform combined analyses or re-analyses and to use them
as a reference, is essential.
However, data comparison is often not as
easy and straightforward as expected; the established quality
checks and normalisation procedures will indicate whether a
comparison is feasible. As the focus of this research
proposal is directed towards biomarker validation, the expected
heterogeneity of the data is certainly a challenge. This
underlines the importance of establishing coordinated
experimental designs according to standardised objectives and
SOPs. In high-dimensional data collections, the large number of
system states undermines meaningful conclusions if data from
different experimental sources are simply averaged.
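To illustrate why data from different sources should not be averaged directly, the following minimal Python sketch standardises each source separately before any comparison. It is not the consortium's normalisation procedure; the function name, the source labels and the toy values are hypothetical, and a per-source z-score is only one of several possible choices.

```python
from statistics import mean, pstdev


def standardise_per_source(values_by_source):
    """Z-score each source's values separately so that sources share a common scale."""
    result = {}
    for source, values in values_by_source.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # A constant source carries no variation; map it to zeros.
            result[source] = [0.0 for _ in values]
        else:
            result[source] = [(v - mu) / sigma for v in values]
    return result


# Toy values for one gene measured in two hypothetical sources with different scales.
toy = {
    "source_a": [2.0, 4.0, 6.0],
    "source_b": [200.0, 400.0, 600.0],
}
scaled = standardise_per_source(toy)
```

After standardisation both toy sources show the same relative pattern, whereas a plain average of the raw values would be dominated by the source with the larger scale.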
WORK PLAN
The biostatistical part is complemented by a
bioinformatics part focussing on exploratory data analysis
strategies (top down) and data integration (bottom up). This
platform part is based on a large portfolio of methods ranging
from sequence analysis, de novo motif search algorithms,
comparative approaches, biomathematical methods in clinical
research, classification algorithms and expression analysis up
to de novo reconstruction of cellular dependency networks.
Specifically,
we support the important aspect of defining the characteristics
of the biological network environment around the selected set
of biomarkers. This knowledge essentially adds stability to
diagnostic interpretations and discussions by adding
consistency to complex clinical observations and complex signal
patterns of mid-sized biomarker panels.
In WP1.2, the
bioinformatics team will focus on functional annotation of the
genomic fragments that display copy number alteration (CNA). We
will create a database to store and annotate CNAs detected in
ES patients. The data generated by this project will be
complemented by a literature search. The front end of the
database will provide a graphical user interface that lets
registered users comment on annotated features. The
system will be implemented using PostgreSQL, an open-source
object-relational database system, and a series of in-house
scripts to accommodate the project-generated data. At the
analytical level, we will especially focus on genes known to be
involved in ES. If any of these are discovered in CNA regions,
we will investigate how this affects the cognate network(s).
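The check for known genes inside CNA regions can be sketched as a simple interval overlap test. The Python example below is an illustration only and does not reflect the planned PostgreSQL schema; the Region type, the function names, the coordinates and the gene labels are all hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    label: str


def overlaps(a, b):
    """Two regions overlap if they share a chromosome and at least one position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_cnas(cnas, genes):
    """Map each CNA label to the labels of the genes it overlaps."""
    hits = {}
    for cna in cnas:
        matched = [g.label for g in genes if overlaps(cna, g)]
        if matched:
            hits[cna.label] = matched
    return hits


# Toy coordinates, not real genome positions.
cnas = [
    Region("chr1", 100, 500, "gain_1"),
    Region("chr2", 1000, 2000, "loss_1"),
]
genes = [
    Region("chr1", 450, 700, "GENE_A"),
    Region("chr2", 2500, 3000, "GENE_B"),
    Region("chr1", 1000, 1200, "GENE_C"),
]
hits = genes_in_cnas(cnas, genes)
```

In this toy input only GENE_A lies partly within a CNA, so it would be the only candidate passed on to the network analysis.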
In
the course of WP1.2, besides a multivariate statistics approach
(together with the biostatistics group), data-based
probabilistic methods can be applied – specifically, resampling
methods following ideas in to differentiate signals and
separate them from noise. A prerequisite is to set up specific
algorithms to unify and integrate the diverse data sources in a
standardised way. Key elements are the weighting and
normalisation procedures needed to achieve the proposed data
integration. These steps determine the outcome of every
procedure.
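As one example of a resampling method for separating signal from noise, a two-group permutation test can be sketched in Python as follows. This is a generic textbook technique chosen for illustration, not necessarily the method intended in WP1.2; the function name, the parameters and the toy measurements are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate how often a random relabelling yields a mean difference
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so that rounding does not hide exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Toy measurements for two hypothetical sample groups.
low = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0]
high = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
p_separated = permutation_p_value(low, high)
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

A small value for the clearly separated groups indicates a difference unlikely to arise from noise alone, while identical groups give a value of one.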
In WP2, gene and miRNA expression analysis will
be supported by established differential evaluation pipelines
(R, Bioconductor). The network-defining approach starts with a
differential miRNA analysis, followed by miRNA target
prediction based on TargetScan and mRNA data. Additional procedures
assemble miRNA-target networks. The functional analysis is based
on gene-phenotype approaches and the identification of
discriminative sub-networks and enriched pathways. Further
refinement to predictive miRNA sub-networks can be achieved by
utilising protein interaction data.
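The step from differential miRNAs and predicted targets to a miRNA-target network could look roughly like the Python sketch below. It assumes that predicted pairs are already available as a plain list (for example exported from a prediction resource) and that a negative miRNA/mRNA correlation is used as the filter; the function names, the threshold of -0.5 and all identifiers in the toy data are hypothetical, and the actual pipeline is described as R/Bioconductor based.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def build_mirna_target_network(differential_mirnas, predicted_pairs,
                               mirna_expr, mrna_expr, max_r=-0.5):
    """Keep predicted pairs whose miRNA is differential and whose
    expression profiles are negatively correlated (r <= max_r)."""
    network = {}
    for mirna, gene in predicted_pairs:
        if mirna not in differential_mirnas:
            continue
        r = pearson(mirna_expr[mirna], mrna_expr[gene])
        if r <= max_r:
            network.setdefault(mirna, []).append((gene, r))
    return network


# Toy expression profiles over four hypothetical samples.
mirna_expr = {"miR-x": [1, 2, 3, 4], "miR-y": [4, 3, 2, 1]}
mrna_expr = {"GENE_A": [8, 6, 4, 2], "GENE_B": [1, 2, 3, 5]}
predicted_pairs = [("miR-x", "GENE_A"), ("miR-x", "GENE_B"), ("miR-y", "GENE_A")]
network = build_mirna_target_network({"miR-x"}, predicted_pairs, mirna_expr, mrna_expr)
```

Only the pair of miR-x and GENE_A survives here, because miR-y is not in the differential set and GENE_B rises together with miR-x.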
WP3 will generate TMA
and protein expression data. Here, we support data analysis by
deciphering correlation-based dependency structures between the
(expressed) biomarkers themselves and supporting factors. In
its first step, the procedure is similar to that of
Langenfeld et al. [2008], but it then builds on a data-driven
combinatorial algorithm [Korsching et al. 2005, 2013], applying
ideas similar to Patil et al. [2011]. The procedure is suited
for panels of up to 18 factors. More factors might be
included by superimposing several factors according to
certain threshold rules. The given approach exhaustively
analyses the factor panel to find the optimal dependency
structure covered by the data. The qualitative result can
directly be compared with observations described in the
literature.
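A very reduced illustration of a correlation-based dependency structure is given in the Python sketch below. It only covers a generic first step, pairwise correlations across a marker panel with a cut-off, and is not the combinatorial algorithm cited above; the function names, the cut-off of 0.8 and the toy panel are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def dependency_edges(panel, min_abs_r=0.8):
    """Return marker pairs whose absolute correlation reaches the cut-off."""
    edges = []
    for a, b in combinations(sorted(panel), 2):
        r = pearson(panel[a], panel[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, round(r, 3)))
    return edges


# Toy scores for three hypothetical markers over five samples.
panel = {
    "marker_1": [1, 2, 3, 4, 5],
    "marker_2": [2, 4, 6, 8, 10],
    "marker_3": [5, 1, 4, 2, 3],
}
edges = dependency_edges(panel)
```

For this toy panel only marker_1 and marker_2 are linked, which is the kind of qualitative statement that can be checked against published observations.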
EXPLOITATION OF THE RESULTS
This project serves
as a consolidation and cross-linking platform. It will therefore
provide substantial backbone support for data validation and for
building a network context around highlighted molecular markers.
The focus on the dependency analysis will further support
proposals to include new or exclude given molecular markers as
well as to define marker panels with a certain specificity.
The
applied methodology as a whole is not specific to the sarcomas
that PROVABES focuses on, and the experience gained will shape
the development of improved analysis concepts and algorithms.