The pharmaceutical industry has high potential to benefit from business intelligence. Drug development is a complex process with many parameters, and it requires huge investments. Therefore, each optimization of drug research or production can save significant costs. Proper data mining leads to such optimizations in a number of ways, e.g. by estimating the importance of input compounds, modeling product characteristics based on process parameters, predicting trends in production data, etc.
Data analysis is also an important part of bioinformatics, a field concerned with the acquisition, processing and other use of biological information. Our computer-aided analysis provides significant support for clinical tests, clinical practice and pharmaceutical research in various ways. Here we name several possible applications, although each application depends on the specific client's requirements.

Statistics for Clinical Test Studies

Every clinical trial is a source of valuable but expensive information. To fully exploit the knowledge hidden in this information, the results should be analyzed with a broad range of tools. We can analyze data using numerous classical statistical methods or statistics enhanced by the latest artificial intelligence techniques.
Data dependency analysis / Correlation analysis
Studying correlations in data streams with advanced machine learning techniques makes it possible to discover dependencies or patterns that a human operator can overlook, especially in complex data. Possible applications of data dependency analysis are:
Drug - Drug interaction: detecting the mutual interactions of two or more different drugs.
Drug - Genome Variation interaction: detecting the correlations between patients' responses to drugs and their genome variation.
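As a minimal sketch of the idea, the snippet below scans every pair of columns in a small table and reports the pairs whose Pearson correlation is strong. Plain Pearson correlation is used here only as a simple stand-in for the more advanced techniques mentioned above. The column names, the values and the 0.8 threshold are made-up assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strong_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical, made-up measurements for five patients (illustration only).
trial = {
    "dose_drug_a": [1, 2, 3, 4, 5],
    "dose_drug_b": [3, 5, 1, 4, 2],
    "side_effect_score": [2, 4, 6, 8, 10],
}
pairs = strong_pairs(trial)
print(pairs)
```

A strong correlation found this way is only a candidate dependency. It still has to be reviewed by an expert before it is treated as an interaction.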
Discovery of diagnostic rules
There is a set of verified tools that can identify 'cause-effect' events based on data from clinical cases. The outcome is knowledge in the form of simple rules that a human expert can verify. For example, a simple rule can be:
IF ( treatment_by_penicillin > 5days AND body_temperature > 38C )
OR allergic_to_penicillin
THEN penicillin_treatment_inefficient (with probability 70%)
The algorithms for rule discovery are very fast: an efficient computer system can analyze several hundred parameters over a database of a million cases in a few seconds. This makes rule discovery a very effective analysis tool, e.g. when searching for attribute dependencies where the extracted knowledge should be in an easily readable form.
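To illustrate how a rule such as the penicillin example can be checked against case data, the sketch below encodes the rule's condition and estimates its probability as the share of matching cases in which the treatment was inefficient. Reading the rule's probability this way is an assumption of this sketch, and it does not show how the rule was discovered. The field names and the five cases are made up for illustration.

```python
def rule_fires(case):
    """Condition (IF part) of the example rule from the text."""
    return (
        (case["treatment_days"] > 5 and case["body_temperature_c"] > 38.0)
        or case["allergic_to_penicillin"]
    )


def rule_confidence(cases):
    """Share of cases matching the condition in which treatment was inefficient."""
    matched = [c for c in cases if rule_fires(c)]
    if not matched:
        return None  # the rule never applies, so no estimate is possible
    hits = sum(1 for c in matched if c["treatment_inefficient"])
    return hits / len(matched)


# Hypothetical, made-up clinical cases (illustration only).
cases = [
    {"treatment_days": 7, "body_temperature_c": 38.6,
     "allergic_to_penicillin": False, "treatment_inefficient": True},
    {"treatment_days": 6, "body_temperature_c": 39.1,
     "allergic_to_penicillin": False, "treatment_inefficient": False},
    {"treatment_days": 2, "body_temperature_c": 37.0,
     "allergic_to_penicillin": True, "treatment_inefficient": True},
    {"treatment_days": 3, "body_temperature_c": 36.8,
     "allergic_to_penicillin": False, "treatment_inefficient": False},
    {"treatment_days": 8, "body_temperature_c": 38.2,
     "allergic_to_penicillin": True, "treatment_inefficient": True},
]
confidence = rule_confidence(cases)
print(confidence)
```

Because the rule is an ordinary readable condition, a human expert can inspect it directly and compare the estimated probability with clinical experience.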

Construction of Pharmaceuticals Knowledge Bases

Knowledge about drug design, drug effects and dependencies is usually the most valuable information that a pharmaceutical company possesses. Therefore, the proper organization of this knowledge is of great importance. A classical database system may not be satisfactory for this purpose, but there are other technologies specifically designed for knowledge management and storage, e.g. those based on Bayesian networks (supporting uncertain, probabilistic relations), semantic networks (effectively covering hierarchical relationships), ontologies, etc.
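As a small illustration of the hierarchical relationships a semantic network covers, the sketch below stores "is-a" links in a dictionary and answers category questions by walking up the hierarchy. The dictionary layout and the function names are assumptions made for this example, not a description of any particular knowledge base product. The hierarchy is assumed to contain no cycles.

```python
# Each concept points to its direct parent ("is-a" relation).
IS_A = {
    "amoxicillin": "penicillin",
    "penicillin": "beta-lactam antibiotic",
    "beta-lactam antibiotic": "antibiotic",
    "antibiotic": "drug",
}


def ancestors(concept, is_a=IS_A):
    """All broader concepts of `concept`, from most specific to most general."""
    chain = []
    while concept in is_a:  # assumes the hierarchy has no cycles
        concept = is_a[concept]
        chain.append(concept)
    return chain


def is_kind_of(concept, category, is_a=IS_A):
    """True if `concept` equals `category` or lies below it in the hierarchy."""
    return concept == category or category in ancestors(concept, is_a)


print(ancestors("amoxicillin"))
print(is_kind_of("penicillin", "antibiotic"))
```

A fact attached to a general concept, for example a known class-wide allergy risk, then applies to every more specific concept below it without being stored again.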

Pharmaceutical / Clinical Data Classification

To classify means to assign a case (clinical data, a patient or an observation) to one of the specified classes. The classes are defined by the user beforehand; therefore this analysis belongs to the 'supervised learning' techniques. For example, a patient can be classified as a person with high, medium or low heart attack probability. Classification can also be applied to clinical trial data: the system can assist in tumor analysis, group patients into different segments, classify chemical combinations as likely or unlikely drug candidates, group drugs by their toxicity, etc.
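A minimal sketch of supervised classification follows, using a k-nearest-neighbours vote as one simple possible technique. The two features (age and systolic blood pressure), the labels and all values are made up for illustration and carry no clinical meaning. Real data would also need feature scaling, which is omitted here.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Assign `query` to the majority class among its k nearest training cases."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled cases: (age, systolic blood pressure) -> risk class.
training = [
    ((35, 118), "low"),
    ((42, 121), "low"),
    ((38, 125), "low"),
    ((67, 162), "high"),
    ((71, 158), "high"),
    ((64, 170), "high"),
]

risk_first = knn_classify(training, (40, 120))
risk_second = knn_classify(training, (69, 160))
print(risk_first, risk_second)
```

The classes ("low", "high") are fixed in the training data before any new case is seen, which is what makes this a supervised method.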

Knowledge Discovery by Data Clustering

Grouping data by similarity is called clustering. The clusters are created automatically, without any need to define them beforehand. A cluster therefore represents knowledge uncovered automatically by the clustering algorithm. This 'unsupervised learning' approach is used in tasks where classification information is missing, e.g. unknown characteristics of drugs or unclassified clinical trials. Our previous experience with this kind of data analysis led to the development of our own clustering application based on Kohonen self-organizing maps. We use this tool for customer segmentation and other clustering tasks.
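To show the principle of a Kohonen self-organizing map, here is a deliberately tiny sketch: a one-dimensional map with two nodes, a Gaussian neighbourhood, and a learning rate and radius that shrink over time. It is not the clustering application mentioned above. The parameter values, the deterministic initialization and the sample data are assumptions chosen to keep the example short and reproducible.

```python
import math


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, n_nodes=2, epochs=30, lr0=0.5, radius0=0.5):
    """Train a 1-D Kohonen map (n_nodes >= 2); returns one weight vector per node."""
    dim = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dim)]
    hi = [max(row[d] for row in data) for d in range(dim)]
    # Deterministic start: spread the nodes evenly between the data extremes.
    weights = [
        [lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
        for i in range(n_nodes)
    ]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrinks linearly from 1 towards 0
        lr = lr0 * decay
        radius = max(radius0 * decay, 1e-3)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the map grid, centred on the winner.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical, made-up two-feature measurements forming two obvious groups.
measurements = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (8.0, 8.0), (8.3, 7.9), (7.9, 8.2),
]
weights = train_som(measurements)
clusters = [best_matching_unit(weights, x) for x in measurements]
print(clusters)
```

No class labels are supplied anywhere. The two groups emerge only from the similarity of the measurements, and each map node ends up representing one of them.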

Image Recognition

Image analysis and identification is a complex task that usually relies solely on the know-how of a skilled expert. However, several sophisticated methods can assist an expert with image recognition, e.g. pattern recognition, edge detection, automatic region coloring, feature extraction, and automatic tissue classification used for tumor detection.
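Of the methods listed, edge detection is the easiest to show in a few lines. The sketch below computes a simple gradient magnitude with central differences on a grayscale image stored as a list of rows. This is one basic approach chosen for illustration, and the tiny image is made up. Production tools typically use more robust operators.

```python
def edge_strength(image):
    """Gradient magnitude |dx| + |dy| per pixel; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dx = image[r][c + 1] - image[r][c - 1]  # horizontal change
            dy = image[r + 1][c] - image[r - 1][c]  # vertical change
            out[r][c] = abs(dx) + abs(dy)
    return out


# Made-up grayscale image: dark region on the left, bright region on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
edges = edge_strength(image)
for row in edges:
    print(row)
```

High values mark the boundary between the two regions, which is the kind of hint that can direct an expert's attention to a structure in the image.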

General Biomedical Data Analysis

A wide range of analytical methods can be applied to data acquired from production processes or research trials. The choice of method depends on the desired purpose, for example identifying data trends and dependencies, searching for anomalies, or forecasting data values. Even very specific tasks, e.g. gene sequence analysis, allele scoring or the examination of biological signals, can benefit from data analysis methods.
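As one example of searching for anomalies, the sketch below flags values that lie unusually far from the mean, measured in standard deviations (a z-score test). The threshold of 2.0 and the readings are assumptions made for illustration. Other purposes, such as trend detection or forecasting, call for different methods.

```python
import statistics


def find_anomalies(values, z_threshold=2.0):
    """Return (index, value) for readings more than z_threshold deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / spread > z_threshold
    ]


# Hypothetical, made-up process readings with one suspicious value.
readings = [7.1, 7.0, 7.2, 6.9, 7.1, 9.8, 7.0, 7.1]
anomalies = find_anomalies(readings)
print(anomalies)
```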

Reference: "Introduction to Data Mining and Knowledge Discovery" by Two Crows Corporation