The IBEST Bioinformatics Core currently comprises several compute clusters, stand-alone application servers, data storage systems, software, and personnel. Our primary production cluster consists of Dell M1000E enclosures populated with M605 blades; it has a total of 512 AMD64 cores and 512GB of system memory (1GB per core). In addition to the primary cluster, we maintain a 96-processor Intel Xeon system with 48GB of total system memory (512MB per processor) and a 96-processor PowerPC G5 system with 192GB of total system memory (2GB per processor). We also maintain a cluster used primarily for testing and development, made up of 44 Intel Xeon processors and 22GB of system memory (512MB per processor). Beyond these research clusters, a small cluster (datarig) built from Dell PE2950 and PE1950 servers is dedicated to post-processing 454 sequencing data; it has 40 Intel Xeon cores and 96GB of total system memory. All clusters are currently networked with 1Gb/s TCP interconnects.
The stand-alone application servers include three Dell M905s, each with 16 cores and 32GB of system memory; two Dell PE6950s, each with 8 cores and 8GB of system memory; and two dual-processor Sun SPARC V440s.
We support over 85TB of total data storage and backup. Our LTO-4 tape backup system can back up 20TB of data. The main production cluster has 30TB of dedicated storage, the 454 datarig has 15TB, and the other production and development clusters share the remaining 20TB. All user data is backed up regularly.
The core systems are housed on the University of Idaho (UI) campus in a 1,400-square-foot room specifically designed and renovated by UI for this Core. All equipment is connected by 1Gb/s fiber and copper, and the UI backbone provides 4Gb/s transfer rates. The room has a dedicated 80kVA three-phase UPS and four forced-air handlers attached to redundant university chilled-water systems, and the facility has an emergency backup diesel generator.
The Bioinformatics Core is connected to the university backbone with 1Gb/s fiber and provides 1Gb/s networking to faculty offices and laboratories. In addition, the University of Idaho, funded in part through the $10M NIH Lariat infrastructure grant, has expanded off-campus data transmission capacity to 2.8Gb/s in the short term and will expand it to 10Gb/s within 3 years. This enables high-speed transfer of large datasets with institutions beyond the university, not just within it. That capacity matters for both collaboration and systems support: keeping our many large databases current requires continual transfer of large volumes of data from primary database providers such as NCBI.
A wide array of software is available for general sequence analysis, phylogenetic and population genetics analyses, protein structure modeling, expression array analysis, statistics, and mathematical modeling. The software available on these systems includes: General Sequence Analysis Packages (EMBOSS, etc.), Database Access (PDB, SCOP, GenBank, etc.), Phylogenetic Inference (PHYLIP, PAUP*, MrBayes, fastDNAml, GeneTree, MODELTEST, P4, PAML, Seq-Gen, TreeView), Population Genetics (Migrate, Fluctuate, Recombine, LAMARC, GeneConv), Sequence Alignment (HMMER, ClustalW, MAFFT, MUSCLE, etc.), Sequence Assembly (Phred/Phrap/Consed, RepeatMasker), Protein Structure Modeling and Visualization (Amber, CHARMM, Cn3D, RasMol, 3D Molecular Viewer), and Statistical/Mathematical Packages (Mathematica, MATLAB, R, S3 Stochastic Spatial).
Most of these programs are free for academic use; others are commercial packages or were developed by COBRE students and personnel. The locally developed software includes EVALYN, for multiple sequence alignment; ClearCut, a fast program for inferring phylogenetic trees based on a modified neighbor-joining method; HiTSA, for high-throughput analysis of ribosomal RNA gene sequences; StatGen, a companion program that summarizes and graphically displays HiTSA results; and Microbial Community Analysis (MiCA), for analyzing terminal restriction fragment length polymorphism (T-RFLP) data from bacterial communities. We have also developed tools to facilitate data analysis, including an all-against-all BLAST, a RepeatMasker-based tool for detecting transposable elements in genomes, and tools for distributed PAUP* and bootstrap analyses. All of these locally developed programs and tools are freely available to researchers anywhere.