Shumaker Endowed Associate Professor, Director
Bioinformatics, MU Informatics Institute
Dr. Shyu has concentrated his informatics research efforts in three major areas: correlation of visual phenotypes with genotypes, large-scale genomic data mining and retrieval, and computational structural biology. His research group has developed several unique bioinformatics tools: a protein tertiary structure retrieval system (ProteinDBS), a repetitive/highly conserved sequence retrieval system (ACMES), and a plant visual phenotype/genotype retrieval system (VPhenoDBS).
Real-Time Protein Tertiary Structure (3D) Retrievals and Classifications
A protein's fold is known to be an important clue for detecting its possible biological functions. The study of structure-to-function relationships usually relies on an effective protein structure retrieval and classification method. Protein structure retrieval compares a query structure against each known protein in a database and returns those with high similarity. Protein structure classification categorizes and annotates a newly discovered protein with possible folds, which may be relevant to its functional properties. Through the efforts of Structural Genomics (SG) projects, a large number of protein structures have been determined in recent years using high-throughput structure determination techniques such as X-ray crystallography and nuclear magnetic resonance (NMR), and more new structures are expected to be solved. To meet the needs of retrieving and classifying these high-throughput protein data, the research activities of this project are designed to address four central challenges.
1) To compare globally similar 3D tertiary structures using content-based information retrieval (CBIR) and high-dimensional indexing techniques in real time.
2) To efficiently classify newly discovered proteins into the fold hierarchy of the Structural Classification of Proteins (SCOP) database based on structural similarity.
3) To quickly retrieve locally similar protein substructures by identifying non-contiguous structural cores in a large-scale protein database.
4) To fuse the retrieval and classification results from different structure cores and provide suggestions to assist the functional predictions.
The proposed system will be the first in the research community that allows a life science researcher or an educator to submit an unknown protein tertiary structure and ask, "What proteins in the Protein Data Bank (PDB) have non-contiguous structure cores similar to the query protein?" or "Which fold in the SCOP database contains 3D structures similar to the query protein?"
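To make the idea of content-based retrieval of 3D structures concrete, here is a minimal sketch in Python. It is not the method used by ProteinDBS; it assumes each structure is reduced to a list of 3D backbone coordinates, uses a histogram of pairwise distances as a stand-in feature vector (chosen because it does not change under rotation or translation), and ranks database entries by Euclidean distance with a plain linear scan rather than a high-dimensional index. The function names, bin settings, and toy coordinates are all hypothetical.

```python
import math
from itertools import combinations


def distance_histogram(coords, bin_width=2.0, n_bins=10):
    """Normalized histogram of all pairwise distances between 3D points.

    Pairwise distances do not change when the structure is rotated or
    translated, so two copies of the same shape get the same feature vector.
    """
    hist = [0] * n_bins
    for p, q in combinations(coords, 2):
        d = math.dist(p, q)
        # Distances beyond the last bin edge are clamped into the last bin.
        idx = min(int(d / bin_width), n_bins - 1)
        hist[idx] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def retrieve(query_coords, database, k=3):
    """Return up to k (name, distance) pairs, most similar first.

    This is a linear scan; a real-time system would replace it with a
    high-dimensional index over precomputed feature vectors.
    """
    q = distance_histogram(query_coords)
    scored = [
        (math.dist(q, distance_histogram(coords)), name)
        for name, coords in database.items()
    ]
    scored.sort()
    return [(name, dist) for dist, name in scored[:k]]


# Toy data (made up for illustration, not real protein coordinates).
TOY_DATABASE = {
    "extended": [(3.8 * i, 0.0, 0.0) for i in range(6)],
    "compact": [(0.0, 0.0, 0.0), (1.5, 1.5, 0.0), (3.0, 0.0, 0.0),
                (1.5, -1.5, 0.0), (1.5, 0.0, 1.5), (1.5, 0.0, -1.5)],
}

# Query: the "extended" shape shifted in space.
TOY_QUERY = [(x + 10.0, y - 4.0, z + 2.5) for x, y, z in TOY_DATABASE["extended"]]
```

Calling `retrieve(TOY_QUERY, TOY_DATABASE)` ranks `"extended"` first, because shifting a structure leaves its pairwise distances unchanged. A global descriptor like this addresses only whole-structure similarity; retrieving locally similar, non-contiguous cores would require features computed over substructures.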
Visual Phenotype Database
Discoveries in biology often require extensive knowledge of the genetics of an organism, a keen eye for phenotypes, a deep understanding of related species, and efficient strategies for collecting, combining, analyzing, and comparing data. Currently, public database tools that retrieve phenotypic and genomic information allow only relatively simplistic queries, and viable software tools to capture, parse, and return information from digital images are lacking. We hope to enable biologists to simultaneously query phenotype data by image example, sequence, ontology, genetic and physical map information, and text annotations by developing the first web-based visual phenotypic information management system to allow such complex queries.
The database framework will consist of five modules:
(1) A system to extract and quantify low-level features from phenotypic images
(2) A high-dimensional database indexing system to manage and cluster images for real-time retrievals
(3) A linking hub to correlate visual features already attributed to a given locus with relevant genetic and physical maps
(4) A text mining and ontology utilization system for parsing annotations
(5) A results visualization system
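As a rough illustration of how modules (1), (2), and (4) might interact, the sketch below combines a query by image example with a text-annotation filter. It is an assumption-laden toy, not the design of VPhenoDBS: images are represented as plain lists of RGB tuples, the low-level feature is a coarse color histogram, annotation parsing is reduced to a case-insensitive keyword match, and retrieval is a linear scan instead of a high-dimensional index. All record IDs, annotations, and pixel values are invented.

```python
import math


def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram for an iterable of (r, g, b) in 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def query_by_example(query_pixels, records, keyword=None, k=2):
    """Rank records by color similarity to the query image.

    If keyword is given, only records whose annotation contains it
    (case-insensitive) are considered.
    """
    q = color_histogram(query_pixels)
    hits = []
    for rec in records:
        if keyword and keyword.lower() not in rec["annotation"].lower():
            continue
        hits.append((math.dist(q, color_histogram(rec["pixels"])), rec["id"]))
    hits.sort()
    return [rec_id for _, rec_id in hits[:k]]


# Invented pixel colors and records, for illustration only.
GREEN, BROWN, YELLOW = (40, 160, 40), (120, 70, 20), (230, 210, 50)
TOY_RECORDS = [
    {"id": "leaf_green", "annotation": "healthy leaf",
     "pixels": [GREEN] * 100},
    {"id": "leaf_lesion", "annotation": "leaf with necrotic lesions",
     "pixels": [GREEN] * 80 + [BROWN] * 20},
    {"id": "ear_yellow", "annotation": "yellow kernel color",
     "pixels": [YELLOW] * 100},
]

# Query image: mostly green with some brown patches.
TOY_QUERY_IMAGE = [GREEN] * 70 + [BROWN] * 30
```

With this toy data, `query_by_example(TOY_QUERY_IMAGE, TOY_RECORDS)` ranks `"leaf_lesion"` ahead of `"leaf_green"`, while passing `keyword="kernel"` restricts the result to `"ear_yellow"`. Linking hits to genetic and physical maps (module 3) and visualizing them (module 5) are outside the scope of this sketch.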