Thesis (Ph.D., Bioinformatics & Computational Biology) -- University of Idaho, 2015 | DNA sequencing technologies make it possible to address problems that could not be solved before, such as whole-genome sequencing or characterizing microbial communities without prior cultivation. Current high-throughput sequencing (HTS) techniques make genomic studies feasible in small labs as well as in large genomic centers. Combined with modern computational software, HTS is a powerful tool that allows researchers to answer important biological questions in novel ways.
Despite the advantages of modern HTS technologies, the large volume of data and the accompanying noise in HTS libraries confound bioinformatic analysis. Data preprocessing, which includes noise removal as well as techniques such as data reduction, is needed to prepare the data for subsequent analysis.
In this dissertation I present a set of software tools that can be used in genomic studies to prepare HTS data for subsequent bioinformatic analysis. The first two chapters describe preprocessing tools developed for data denoising. In the last two chapters I explore the use of multiple genomic markers in 16S data analysis with a meta-amplicon analysis algorithm, which makes it possible to use all of the information obtainable from 16S amplicon sequencing. Meta-amplicon analysis improves on current methods used to characterize bacterial composition and community structure.