Thesis (Ph.D., Bioinformatics & Computational Biology) -- University of Idaho, 2016 | Background: Statistics is a key component of bioinformatics and provides crucial insight into biological processes, for example by testing genetic associations with the risk of complex human diseases and with variation in drug response. In genetic association studies, a small sample size reduces statistical power and therefore increases the probability of type II error; determining the correct sample size for these studies depends on various biological parameters. Additionally, multiple hypothesis testing, which is common in genetic association studies, inflates the type I error rate.
Objective and Methods: This study focused on three statistical issues that are important in genetic association studies: 1) testing the effects of biological factors on sample size estimation using regression analysis; 2) developing a two-stage Bonferroni type I error correction procedure that uses linkage disequilibrium (LD) to define independent haplotype blocks; and 3) adjusting alpha levels in sample size estimation based on the LD structure among genetic markers in different racial groups.
Results: The first study showed that a recessive genetic model requires the largest sample size; the most significant factor for sample size estimation was minor allele frequency under the recessive genetic model, and genetic effect size under the dominant and additive genetic models. The two-stage adjusted Bonferroni correction was less conservative than the standard Bonferroni correction, but less liberal than the false discovery rate (FDR) procedure. Sample sizes estimated using an alpha level adjusted for LD structure could be reduced by 14% to 24%, depending upon racial group, compared with sample sizes estimated using the standard Bonferroni-adjusted alpha level.
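The link between a less stringent alpha level and a smaller required sample size can be illustrated with a minimal sketch. It is not the thesis's procedure: it assumes a two-sided z-test with a fixed effect size, for which the required sample size is proportional to (z_{1-alpha/2} + z_{power})^2, and it assumes that the LD-based adjustment amounts to dividing the family-wise alpha by the number of independent haplotype blocks instead of the number of markers. The marker and block counts are hypothetical.

```python
from statistics import NormalDist


def per_test_alpha(family_alpha: float, n_tests: int) -> float:
    """Bonferroni per-test alpha for n_tests (assumed independent) tests."""
    return family_alpha / n_tests


def relative_sample_size(alpha: float, power: float = 0.8) -> float:
    """Quantity proportional to the required n for a two-sided z-test.

    Assumption: fixed effect size and normal approximation, so that
    n is proportional to (z_{1-alpha/2} + z_{power})^2.
    """
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


# Hypothetical numbers, not taken from the thesis.
n_markers = 100  # markers tested in the region
n_blocks = 20    # independent haplotype blocks defined by LD

alpha_standard = per_test_alpha(0.05, n_markers)  # standard Bonferroni
alpha_ld = per_test_alpha(0.05, n_blocks)         # LD-based adjustment

# Fractional reduction in required sample size from the larger alpha.
reduction = 1 - relative_sample_size(alpha_ld) / relative_sample_size(alpha_standard)
print(f"standard alpha = {alpha_standard:.5f}, LD-adjusted alpha = {alpha_ld:.5f}")
print(f"relative sample size reduction = {reduction:.1%}")
```

Because the effect size cancels in the ratio, the sketch shows only the relative saving; absolute sample sizes would additionally depend on allele frequency, effect size, and the genetic model.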
Conclusion and implication: Genetic inheritance model, effect size, and allele frequency significantly affect sample size estimation. The results can be applied to genetic marker selection, sample size estimation, and statistical power prediction. The two-stage adjusted Bonferroni type I error correction procedure improves statistical power and offers a simple way to control type I error in genetic association studies. Using the LD structure across the tested DNA region to adjust the alpha level for sample size estimation by race can reduce the required total sample size, improve statistical power, and lead to more cost-effective studies.
Keywords: Genetic association study; Sample size estimation; Statistical power; Genetic effect; Genetic inheritance model; Linkage disequilibrium; Type I error inflation; Bonferroni type I error correction; Haplotype block; FDR.