InterPro- InterPro integrates the signature databases PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs, together with protein annotation from Swiss-Prot and TrEMBL, into a single comprehensive signature database. An InterPro search reports the matches from each member database separately, so the user can compare the outputs while taking into account the algorithm each database uses.
Molecular structure database
Protein Data Bank (PDB)- The PDB is a collection of experimentally determined three-dimensional structures of biological macromolecules, obtained mostly by X-ray crystallography and also by NMR spectroscopy and electron microscopy. It is coordinated by a consortium (the Worldwide PDB, wwPDB) with member sites in Europe, Japan and the USA. As of August 2013, the database contained 93043 structures, including proteins, nucleic acids, and protein-nucleic acid or protein-small molecule complexes (http://www.rcsb.org/pdb/home/home.do). The database can be searched by PDB ID or by keyword. Each entry summarizes the information related to the structure, such as the crystallization conditions and the reference to the journal article in which the findings were published.
SCOP- SCOP (Structural Classification of Proteins) is based on the idea that proteins that are evolutionarily related and have similar biological functions also have similar structures. The database classifies proteins of known structure into families, superfamilies and folds. Proteins are placed in the same family when they share clear sequence similarity, generally a sequence identity of at least 30% over the length of the sequences. Proteins with low sequence identity but with structural and functional features that suggest a common evolutionary origin are placed in the same superfamily. Proteins whose major secondary structure elements have the same arrangement are placed in the same fold.
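The 30% identity rule for SCOP families can be illustrated with a short Python sketch. It is only an illustration of the threshold: SCOP assignments are curated by experts and also consider structure and function, so this is not how SCOP itself classifies proteins. The sketch assumes the two sequences are already aligned (gaps written as "-") and uses one common convention for percent identity (identical residue columns divided by alignment length, ignoring columns that are gaps in both); the function names are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical residue columns divided by the number of
    alignment columns, ignoring columns where both sequences have a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


def passes_family_identity_threshold(aln_a: str, aln_b: str, threshold: float = 30.0) -> bool:
    """Hypothetical helper: apply only the >= 30% identity criterion."""
    return percent_identity(aln_a, aln_b) >= threshold


# Usage: 6 of 8 aligned columns are identical -> 75% identity.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))
print(passes_family_identity_threshold("ACDEFGHIKL", "ACWWWWWWWW"))  # 20% -> False
```

Pairs that fail this test could still share a SCOP family or superfamily if their structures and functions point to a common ancestor, which is why the threshold alone is not sufficient.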
CATH- Like SCOP, CATH classifies protein structures hierarchically, here into four levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). At the Class level, a protein domain is assigned according to the proportion of secondary structure elements it contains rather than their arrangement. There are 4 classes: mainly helix (α class), mainly sheet (β class), mixed helix-sheet (α/β class) and proteins with few secondary structures. At the Architecture level, structures are classified by the overall spatial arrangement of their secondary structure elements. At the Topology level, they are classified by how those secondary structure elements are connected. The Homologous superfamily level groups domains that are similar enough in structure, and often in sequence or function, to suggest a common ancestor.
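The Class level can be sketched in Python to show how classification by secondary structure content (not arrangement) works. The sketch assumes a per-residue secondary structure string such as one derived from DSSP, with "H" for helix, "E" for strand and any other character treated as coil. The cut-offs `min_content` and `dominance`, and the function name `assign_cath_class`, are illustrative assumptions, not CATH's actual criteria.

```python
from collections import Counter

# CATH class numbers and names (1-4); the thresholds below are NOT CATH's.
CATH_CLASSES = {
    1: "Mainly alpha",
    2: "Mainly beta",
    3: "Alpha beta",
    4: "Few secondary structures",
}


def assign_cath_class(ss: str, min_content: float = 0.15, dominance: float = 0.75) -> int:
    """Hypothetical class assignment from secondary structure composition.

    ss: one character per residue, 'H' = helix, 'E' = strand, others = coil.
    Only the proportions of helix and strand are used, not their order.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss)
    helix = counts["H"] / len(ss)
    strand = counts["E"] / len(ss)
    total = helix + strand
    if total < min_content:        # too little regular secondary structure
        return 4
    if helix / total >= dominance:  # helices dominate
        return 1
    if strand / total >= dominance:  # strands dominate
        return 2
    return 3                         # substantial amounts of both


# Usage
for ss in ["HHHHHHHHHH--HHHHHHHH", "EEEEE--EEEEE----EEEE", "HHHHHEEEEE----"]:
    print(ss, CATH_CLASSES[assign_cath_class(ss)])
```

Because only composition is counted, two domains with the same helix and strand content fall into the same class even if their elements are arranged differently; distinguishing those is the job of the Architecture and Topology levels.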
Sequence Comparison
Homologous- Two sequences that share a common evolutionary ancestor are termed homologous. Homologs can be either orthologs or paralogs. Homologous proteins from two different organisms that have similar functions are termed orthologs, whereas homologous proteins within one organism that have different functions are called paralogs (orthologs diverge through speciation, paralogs through gene duplication).