Research Projects
1. Estimation of Volterra Kernels of Physiological Systems Using Meixner
Functions, RAC # 2020-018 (M. H. Asyali, PI; M. Juusola, co-I, Cambridge
University, Physiology Lab)
Description: In this study, we explored the possibility of using Meixner
basis functions, instead of the widely used Laguerre basis functions, in the
estimation of Volterra kernels of physiological systems by least squares
minimization. We compared the kernel estimation performance of Meixner and
Laguerre functions in test cases that we constructed and in an experimental
case where we studied the responses of photoreceptor cells of adult fruit
flies (Drosophila melanogaster). Our results indicate that when there is a
slow initial onset or delay, Meixner basis function expansion provides better
kernel estimates.
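The least-squares expansion idea can be sketched for the simpler Laguerre
case. The snippet below is a minimal illustration under stated assumptions,
not the implementation used in the study: it builds discrete Laguerre
functions with a standard recursion, filters the input through them, and
solves for the expansion coefficients of a first-order kernel. The decay
parameter alpha, the number of basis functions, the memory length, and the
synthetic input are all hypothetical choices; Meixner functions and
higher-order kernels are not covered.

```python
import numpy as np

def laguerre_basis(n_funcs, memory, alpha):
    """Discrete Laguerre functions b_j(m) for j < n_funcs and m < memory."""
    b = np.zeros((n_funcs, memory))
    m = np.arange(memory)
    b[0] = np.sqrt(1.0 - alpha) * alpha ** (m / 2.0)
    ra = np.sqrt(alpha)
    for j in range(1, n_funcs):
        b[j, 0] = ra * b[j - 1, 0]
        for i in range(1, memory):
            # Each order is the previous one passed through an all-pass stage.
            b[j, i] = ra * b[j, i - 1] + ra * b[j - 1, i] - b[j - 1, i - 1]
    return b

def estimate_first_order_kernel(x, y, basis):
    """Least-squares fit of y(n) = sum_m k(m) x(n-m), k = sum_j c_j b_j."""
    n = len(x)
    # Column j holds the input convolved with basis function j.
    V = np.column_stack([np.convolve(x, bj)[:n] for bj in basis])
    coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coeffs, coeffs @ basis

# Synthetic, noise-free test case (hypothetical, not data from the study).
rng = np.random.default_rng(0)
basis = laguerre_basis(n_funcs=4, memory=60, alpha=0.5)
true_coeffs = np.array([1.0, 0.0, -0.5, 0.0])
true_kernel = true_coeffs @ basis
x = rng.standard_normal(500)
y = np.convolve(x, true_kernel)[:500]
coeffs, kernel = estimate_first_order_kernel(x, y, basis)
```

Because the test kernel lies in the span of the basis and no noise is added,
the fitted coefficients should match the true ones to numerical precision; a
kernel with a delayed onset would need more Laguerre functions to be
represented well, which is the situation the study addresses.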
Progress:
The results obtained in the simulation study indicate that Meixner basis
functions are advantageous over Laguerre basis functions, especially when
there is a delay in the kernels. Our experimental results support the
findings from the simulated data: again, Meixner basis functions give better
estimates of Volterra kernels than Laguerre basis functions do. This is
judged by (1) a controlled, virtually oscillation-free onset of the estimated
kernels, (2) a universally lower norm of estimation error under different
noise conditions, and (3) more meaningful behavior in relation to the known
biophysical factors.
In the future, we will apply our proposed technique to more experimental
datasets and publish our findings. We have developed a standalone Windows™
application that performs Volterra kernel estimation, up to 3rd order, using
Meixner basis functions. We are announcing the availability of this new
modeling tool in a paper that will be published in the IEEE Transactions on
Biomedical Engineering. We hope to receive many requests for this tool, which
would bring considerable credit to our institution and help us initiate
collaborations.
Papers:
Musa H. Asyali
and Mikko Juusola, “Use of Meixner Functions in Estimation of Volterra Kernels
of Nonlinear Systems with Delay,” submitted to IEEE Trans. Biomed. Eng.
2. Diagnostic Power of Different Heart Rate Variability Measures in Detecting
Cardiac Health Condition, RAC # 2030 032 (M. H. Asyali, PI)
Description:
Heart Rate Variability (HRV) can be assessed by time- or frequency-domain
methods. The time-domain HRV measures are based on beat-to-beat intervals
whereas frequency-domain analysis expresses HRV in terms of its constituent
frequency components. HRV analysis has emerged as a diagnostic tool that
quantifies the functioning of the autonomic regulation of the heart and the
heart’s ability to respond. However, the majority of studies on HRV report
several different time- and frequency-domain HRV measures together, which may
be redundant and confusing in many cases. The question of which HRV measures
are the strongest overall indicators of the cardiac condition has not been
addressed.
Progress:
In this study, we used data obtained from PhysioBank, an online
physiological data repository maintained by PhysioNet (the research
resource for complex physiologic signals, established under the auspices of
NIH, http://www.physionet.org). We computed 9 commonly used long-term HRV
measures from 52 normal subjects and 22 patients with congestive heart
failure. Subsequently, using methods from linear discriminant analysis, we
investigated the class (i.e. normal versus abnormal) discrimination power of
those HRV measures and identified the one that indicates the cardiac
condition with the highest sensitivity and specificity. Our results revealed
that the HRV measure known as the SDNN (standard deviation of all
normal-to-normal beat intervals), which is one of the simplest measures to
compute and interpret, has the highest class discrimination power. A Bayesian
(i.e. minimum error rate) classifier based on this index achieved sensitivity
and specificity rates of 81.8% and 98.1%, respectively.
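As a rough illustration of the quantities involved, the sketch below computes
SDNN from a list of normal-to-normal intervals and derives sensitivity and
specificity from classification counts. The interval values are hypothetical,
the use of the sample standard deviation is an assumption, and the counts
(18 of 22 patients and 51 of 52 normal subjects correctly classified) are
inferred from the reported rates and group sizes rather than stated in the
study.

```python
import numpy as np

def sdnn(nn_intervals_ms):
    """Standard deviation of normal-to-normal intervals, in milliseconds."""
    # Sample standard deviation (ddof=1) is an assumption of this sketch.
    return float(np.std(np.asarray(nn_intervals_ms, dtype=float), ddof=1))

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical NN intervals in milliseconds (not data from the study).
example_nn = [812, 790, 845, 801, 830, 795]
example_sdnn = sdnn(example_nn)

# Counts inferred from the reported rates with 22 patients and 52 normal
# subjects; the study itself reports only the percentages.
sens, spec = sensitivity_specificity(tp=18, fn=4, tn=51, fp=1)
```

With these counts the rates round to 81.8% and 98.1%, matching the reported
figures.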
Thus far, we have focused only on the long-term HRV measures. As information
about the sleep or physical activity status of the subjects was not
available, we could not compare the class discrimination power of the
short-term measures. In a future study, we plan to collect our own data,
which will enable us to confirm the results of this study and make further
assessments regarding the short-term HRV measures.
Papers:
M.H. Asyali,
“Discrimination Power of Long-Term Heart Rate Variability Measures,”
Proceedings of the 25th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, Cancun, September 17-21, 2003.
3. Estimation of Signal Thresholds for Microarray Data Using Mixture
Modeling, RAC # 2030 030 (M. H. Asyali, PI; M. M. Shoukri, co-I; O.
Demirkaya, co-I)
Description: The DNA microarray is an important tool for the study of gene
activities, but the resultant data, consisting of thousands of points, are
error-prone. A serious limitation in microarray analysis is the unreliability
of the data from low signal intensities, which generally constitute a large
portion of the microarray data. Such data may produce erroneously high gene
expression ratios, i.e. false positives, and result in unnecessary validation
or post-analysis follow-up tasks. In this study, we introduce a sound
statistical approach, based on normal mixture modeling and Bayesian (minimum
error rate) classification theory, for determining optimal signal intensity
thresholds. The thresholds eliminate false positives while keeping the
maximum possible number of reliable measurements of the array elements, and
the approach is adaptable to any microarray data.
Progress:
We used univariate and bivariate mixture modeling to segregate the microarray
data into two classes, i.e. low (or unreliable) and high (potentially
reliable) signal intensities, and applied Bayesian decision theory to find the
optimal signal-thresholds that will minimize the probability of error, i.e.
misclassification rate. We compared and assessed the accuracy of our approach
with respect to a conventional method by using a reference set of gene
expression data that contains only true negative and positive elements.
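The univariate case can be sketched as follows. This is a minimal
illustration under assumptions, not the software developed in the study: a
two-component normal mixture is fitted by expectation-maximization, and the
threshold is taken where the weighted component densities cross, which is the
minimum error rate decision boundary between the two means. The synthetic
log-intensity data, the initialization, and the grid search are hypothetical
choices; the bivariate modeling is not shown.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fit_two_normal_mixture(x, n_iter=200):
    """EM for a two-component univariate normal mixture."""
    x = np.asarray(x, dtype=float)
    w = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75])
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = w * normal_pdf(x[:, None], mu, sigma)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    order = np.argsort(mu)  # component 0 is the low-intensity class
    return w[order], mu[order], sigma[order]

def bayes_threshold(w, mu, sigma, grid_points=10001):
    """First point between the means where the high class becomes more likely."""
    grid = np.linspace(mu[0], mu[1], grid_points)
    diff = (w[0] * normal_pdf(grid, mu[0], sigma[0])
            - w[1] * normal_pdf(grid, mu[1], sigma[1]))
    return float(grid[np.argmax(diff <= 0)])

# Synthetic log-intensities (hypothetical, not data from the study).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(6.0, 0.6, 3000), rng.normal(10.0, 1.0, 2000)])
w, mu, sigma = fit_two_normal_mixture(data)
threshold = bayes_threshold(w, mu, sigma)
```

Measurements below the threshold would be treated as unreliable
low-intensity signals and filtered out before computing expression ratios.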
This study has two tracks running in parallel. In one track, we are working
to disseminate our findings in the form of a publication in a prestigious
journal; in the other, we are developing a software application that
implements the methods we developed in the course of the study. We are in the
process of patenting our microarray data filtering methods and the associated
software to protect our proprietary rights. We plan to make our software
available free of charge for academic use. We hope to receive many requests
for this tool, which would bring considerable credit to our institution and
help us initiate collaborations.
Papers:
M.H. Asyali, M.M.
Shoukri, O. Demirkaya, and K.S.A. Khabar, “Estimation of Signal Thresholds for
Microarray Data Using Mixture Modeling” submitted to the Proceedings of the
National Academy of Sciences.
4. Design of Optimal Sampling Times in Bioequivalence Studies Using Computer
Simulations, RAC # 2021 025 (Naser Elkum, PI; Musa H. Asyali, co-PI; M.M.
Shoukri, co-PI)
Description: In bioequivalence (BE) studies, the pharmaceuticals to be
compared are administered to subjects, blood samples are collected, and a
concentration-time curve (CTC) is constructed to estimate several
pharmacokinetic (PK) parameters. As the PK parameters are
estimated from a limited number of samples, the timing of the samples directly
influences the accuracy of estimation. Optimization of the sampling times may
not only increase the accuracy of PK parameter estimation and consequently
lead to more reliable BE decisions, but also reduce the number of samples to
be drawn, which in turn lessens the inconvenience to the subjects and the cost
of the study.
Progress: In this study, we suggest a method based on cubic spline
approximation for PK parameter estimation, and the sampling times are
optimized by simultaneously considering all the PK parameters used in BE
decisions. We show that, with the proposed approach, it is possible to obtain
accurate PK parameter estimates with only a few samples.
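The spline-based estimation step can be sketched as follows. This is an
illustrative sketch under assumptions, not the procedure of the study: a
one-compartment model with first-order absorption and elimination generates
the concentrations, and AUC, Cmax, and Tmax are read off a cubic spline
through the samples. The dose, rate constants, volume, full bioavailability,
and sampling times are hypothetical values, and no optimization of the
sampling times is performed here.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def one_compartment(t, dose=100.0, ka=1.5, ke=0.2, volume=10.0):
    """Concentration after an oral dose (bioavailability assumed to be 1)."""
    scale = dose * ka / (volume * (ka - ke))
    return scale * (np.exp(-ke * t) - np.exp(-ka * t))

def pk_from_samples(times, conc, grid_points=2001):
    """Estimate AUC, Cmax, and Tmax from a cubic spline through the samples."""
    spline = CubicSpline(times, conc)
    auc = float(spline.integrate(times[0], times[-1]))
    grid = np.linspace(times[0], times[-1], grid_points)
    values = spline(grid)
    peak = int(np.argmax(values))
    return auc, float(values[peak]), float(grid[peak])

# Hypothetical sampling schedule in hours.
sample_times = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0])
auc, cmax, tmax = pk_from_samples(sample_times, one_compartment(sample_times))
```

Comparing these estimates with the values obtained analytically from the
model gives the estimation error for a given schedule, which is the quantity
a sampling-time optimization would try to minimize.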
In the future,
we will assume that all the model parameters come from a suitable realistic
joint lognormal density whose parameters are determined/known from earlier
tests. We will generate simulated data for the underlying pharmacokinetic
model parameters (i.e. different absorption rate, elimination rate, and volume
combinations) from the assumed multivariate density and obtain the optimal
sampling times for each case. Then, following the procedure we have introduced
in this study, we will study the characteristics of the optimal sampling
intervals that can be suggested for a population.
Papers:
a. M.H. Asyali, N. Elkum, and M.M. Shoukri, “Design of Optimal Sampling Times
in Bioequivalence Studies via Spline Approximation,” submitted to the Journal
of Pharmacokinetics and Pharmacodynamics.
b. M.H. Asyali and N. Elkum, “Optimization of Sampling Time Designs in
Bioequivalence Tests: A Comparison of Techniques,” The 2nd International
Eastern Mediterranean Region Biannual Conference, 2003, International
Biometric Society, Antalya, 12-15 January 2003. (Abstract)
c. N. Elkum and M.H. Asyali, “Design of Optimal Sampling Times in
Bioequivalence Studies: A Simulation Approach,” The 2nd International Eastern
Mediterranean Region Biannual Conference, 2003, International Biometric
Society, Antalya, 12-15 January 2003. (Abstract)
5. Modeling Correlated Data from Cluster Randomization and Observational
Studies, RAC # 990 011 (M.M. Shoukri, PI; M.H. Asyali, co-I)
Description:
In this study, to avoid the problems associated with the approximate methods
in the analysis of multilevel correlated data, we suggest an exact modeling
procedure. We consider a Poisson random effects model where the mixing
distribution is the inverse-Gaussian.
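To make the model concrete, the sketch below simulates counts from a Poisson
distribution whose rate is multiplied by an inverse-Gaussian random effect
with mean 1. It is only an illustration of the resulting overdispersion, not
the exact fitting procedure of the study; the mean count, the shape
parameter, and the sample size are hypothetical values.

```python
import numpy as np

def simulate_pig_counts(mu, shape, size, rng):
    """Poisson counts with rate mu times an inverse-Gaussian random effect."""
    # The random effect has mean 1 and variance 1 / shape.
    effect = rng.wald(mean=1.0, scale=shape, size=size)
    return rng.poisson(mu * effect)

rng = np.random.default_rng(1)
mu, shape = 5.0, 2.0  # hypothetical values
counts = simulate_pig_counts(mu, shape, size=200_000, rng=rng)

sample_mean = counts.mean()
sample_var = counts.var()
# For a mixed Poisson model: Var(Y) = mu + mu^2 * Var(effect).
theoretical_var = mu + mu ** 2 / shape
```

The sample variance comes out well above the sample mean, which is the
extra-Poisson variation that the random effect is meant to capture in
clustered count data.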
Progress:
To illustrate the methodology, we developed a regression model that relates
the number of mastitis cases in a sample of dairy farms in Ontario, Canada,
to various farm-level covariates. Residual-normal plots were constructed to
explore the quality of the fit. We compared the results with those of a
negative binomial regression model fitted by maximum likelihood estimation
and of the generalized linear mixed regression model fitted in SAS.
We plan to apply the methodology demonstrated in this study to other
clustered correlated data sets and assess its performance relative to other
data modeling schemes by comparing model predictions with the actual data.
This will enable us to further identify cases in which using a Poisson
inverse-Gaussian model is advantageous.
Papers:
a. M.M. Shoukri, M.H. Asyali, R. VanDorp, and D. Kelton, “The Poisson Inverse
Gaussian Regression Model in the Analysis of Clustered Counts Data,” Journal
of Data Science, in press (will appear in Vol. 2, No. 1, Jan. 2004).
b. M.M. Shoukri and M.H. Asyali, “Analysis of Clustered Count Data Using
Poisson Inverse Gaussian Regression,” Eastern Mediterranean Region Biannual
Conference, 2003, International Biometric Society, Antalya, 12-15 January
2003. (Abstract)
6. Planning a Reliability Study: Cost and Efficiency Consideration, RAC #
2011 063 (M.M. Shoukri, PI; M.H. Asyali, co-I)
Description:
A crucial decision that a researcher faces in the design stage of a
reliability study is the determination of the number of subjects k and
the number of measurements per subject n. When we have prior knowledge
of what constitutes an acceptable level of reliability, a hypothesis testing
approach may be used, and the sample size calculations can then be performed
using methods suggested in previous studies. However, in most cases, values of
the reliability coefficient under the null and alternative hypotheses may be
difficult to specify. For instance, the estimated value of the intraclass
correlation coefficient (ICC) depends on the degree of heterogeneity among the
sampled subjects: the greater the heterogeneity, the higher the value of ICC.
Since most reliability studies focus on the estimation of ICC with sufficient
precision, the guidelines provided in this paper, which we based on principles
of mathematical optimization, allow an investigator to select the pair (n,
k) that maximizes the precision of the estimated reliability index. Our
proposed approach is quite simple and produces estimates of (n, k)
that are in close agreement with results based on considerations of power.
Progress:
An interesting finding from our results is that, regardless of whether the
assessments are continuous or binary, the variance is minimized with a small
number of replicates, as long as the true index of reliability remains
reasonably high. In many clinical investigations, reliability of at least 60%
is required in order to provide a method of measurement that has practical
utility. Under such circumstances, one can safely recommend making only two or
three observations per subject.
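A short derivation shows why so few replicates suffice for continuous
assessments. It is a sketch under assumptions the text does not spell out: a
one-way random effects model, the standard large-sample approximation to the
variance of the estimated ICC, and a fixed total number of measurements
N = nk, with n treated as continuous.

```latex
\operatorname{Var}(\hat\rho) \approx
  \frac{2(1-\rho)^2\,[1+(n-1)\rho]^2}{k\,n\,(n-1)}
  = \frac{2(1-\rho)^2\,[1+(n-1)\rho]^2}{N\,(n-1)},
  \qquad N = nk .

\frac{d}{dn}\,\frac{[1+(n-1)\rho]^2}{n-1} = 0
  \;\Longrightarrow\; 2\rho\,(n-1) = 1+(n-1)\rho
  \;\Longrightarrow\; n^{*} = 1+\frac{1}{\rho} = \frac{1+\rho}{\rho} .

\rho = 0.6:\; n^{*} \approx 2.67, \qquad
\rho = 0.8:\; n^{*} = 2.25 .
```

Under these assumptions, any reliability of 0.6 or higher gives an optimum
between two and three replicates, in line with the recommendation above.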
In many
medical screening programs, and in social sciences and psychology studies, it
is often more feasible to record the subject’s response on a dichotomous scale
(such as presence/absence). If this approach is adopted, the issue of optimal
allocation becomes very important, because research has demonstrated that the
loss of power associated with measuring the trait on a dichotomous scale is
quite severe, and frequently prohibitive. We therefore intend to investigate
and report on this important issue, i.e. cost implications for dichotomous
assessments.
Papers:
M.M. Shoukri, M.
H. Asyali, and S.D. Walter, “Issues of Cost and Efficiency in the Design of
Reliability Studies,” Biometrics, Vol. 59, No. 4, 2003, pp.1109-1114.
7. Sample Size Requirements for the Design of Inter-Observer and
Intra-Observer Agreement Studies: A Review and Some New Results, RAC # 2030
036 (M.M. Shoukri, PI; M.H. Asyali, co-PI)
Description:
In this study, we revisited the literature on sample size requirements when
interest is focused on estimating the intraclass correlation coefficient
(ICC) of reliability from a single sample of subjects. A crucial step in the
design and analysis of biomedical experiments is the determination of the
sample size, and this issue is of particular importance in the design of
reliability studies.
Progress:
We derived the optimal allocation of
the number of subjects k and the number of repeated measurements n
that minimize the variance of the estimated ICC. We also looked into cost
constraints for the normally and non-normally distributed responses. We
produced tables showing optimal choices of k and n along with
the guidelines for the design of reliability studies in light of our results
and those reported by others.
In practice, the optimal allocations must be integer values, and the net
loss or gain in precision as a result of rounding the values of (n, k) is
negligible. Ideally, one should adopt one of the available combinatorial
optimization algorithms, often referred to as integer programming models.
These models are suited to the optimal allocation problems that we reviewed
in this study, since the main concern is to find the best solution(s) in a
well-defined discrete space. This topic needs further investigation.
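For small problems, the discrete space can simply be enumerated. The sketch
below is an exhaustive search, not an integer programming model, and it rests
on assumptions: the large-sample variance of the estimated ICC under a
one-way random effects model, a budget expressed only as a maximum total
number of measurements, and hypothetical values for the reliability and the
budget.

```python
def icc_variance(rho, n, k):
    """Large-sample variance of the estimated ICC (one-way random effects)."""
    return 2.0 * (1.0 - rho) ** 2 * (1.0 + (n - 1) * rho) ** 2 / (k * n * (n - 1))

def best_integer_allocation(rho, total):
    """Search integer (n, k) with n * k <= total for the smallest variance."""
    best = None
    for n in range(2, total // 2 + 1):
        k = total // n  # as many subjects as the budget allows for this n
        var = icc_variance(rho, n, k)
        if best is None or var < best[2]:
            best = (n, k, var)
    return best

# Hypothetical planning values: anticipated ICC of 0.6, 60 measurements.
n_opt, k_opt, var_opt = best_integer_allocation(rho=0.6, total=60)
```

For these values the search selects n = 3 replicates on k = 20 subjects, the
integer allocation nearest the continuous optimum for that reliability.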
Papers:
M.M. Shoukri, M.H.
Asyali, and A. Donner, “Sample Size Requirements for the Design of Reliability
Study: Review and New Results,” Statistical Methods in Medical Research (in
press).
8. Automated Segmentation of Microarray cDNA Spots, RAC # 2030 031 (O.
Demirkaya, PI; M.H. Asyali, co-PI)
Description:
Segmentation or separation of spots from the background in cDNA microarray
images is one of the earlier steps in gene expression data analysis.
Performance of the segmentation method may profoundly impact the performance
of the subsequent stages of data extraction and analysis. Several methods have
already been suggested to segment microarray spots. In this study, we propose
a new approach based on the Markov random field modeling of the microarray
spot regions. Initial parameters were estimated using an entropy-based
thresholding algorithm.
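The role of an initial threshold can be illustrated with a simple global
method. The text names an entropy-based algorithm without specifying it; as a
plainly labeled stand-in, the sketch below uses Otsu's between-class variance
criterion instead, and it does not implement the Markov random field step.
The synthetic spot image, its intensity levels, and its size are hypothetical.

```python
import numpy as np

def otsu_threshold(image):
    """Gray level t maximizing the between-class variance (classes <= t, > t)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # probability of the class <= t
    mu = np.cumsum(p * np.arange(256))  # cumulative first moment up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(256)
    valid = denom > 0
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))

# Synthetic 8-bit image: dark background with a brighter square "spot".
rng = np.random.default_rng(2)
pixels = rng.normal(50.0, 10.0, size=(64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
pixels[mask] = rng.normal(180.0, 15.0, size=int(mask.sum()))
image = np.clip(np.rint(pixels), 0, 255).astype(np.uint8)

t = otsu_threshold(image)
segmented = image > t
accuracy = float((segmented == mask).mean())
```

The class means and proportions on either side of such a threshold could then
serve as starting parameters for a model-based refinement of the spot region.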
Progress:
The proposed method was first validated on simulated images and then applied
to actual microarray images. Our preliminary results indicate that the method
performs well.
A future study will also include further validation of the proposed method on
simulated images. We have already developed a method to simulate realistic
microarray images with reference spot regions.
Papers:
a. O. Demirkaya and M.H. Asyali, “Automated Segmentation of Microarray cDNA
Spots Using Thresholding Algorithms,” submitted to the Bioinformatics
Journal.
b. O. Demirkaya and M.H. Asyali, “A Measure of Image Bimodality:
Between-Class Variance,” submitted to Pattern Recognition Letters.
c. O. Demirkaya, M.H. Asyali, M.M. Shoukri, and K.S. Abu-Khabar,
“Segmentation of Microarray cDNA Spots Using MRF-Based Method,” Proceedings
of the 25th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, Cancun, September 17-21, 2003. (Conference
Paper)