Module 6 : Bioinformatics tools

Lecture 39 : Analysis of Protein and Nucleic acid sequences -II

Secondary Database- Analysis of the primary sequence data gives rise to secondary databases. These databases hold derived information such as secondary structures, hydrophobicity plots and domains.

Prosite- Prosite is a secondary biological database that contains motifs used to assign an unknown sequence to a protein family or enzyme class. It can be accessed at http://prosite.expasy.org/. The motifs in the database are derived from multiple sequence alignments. The query sequence is compared against a motif to determine whether the motif is present or absent. A Prosite expression describes a motif one position at a time. For example, the expression [EFTNA]-[HFDAS]-[HYT]-{ADS}-x(2)-P covers seven amino acid positions and can be read as follows-

1st position can be E, F, T, N or A

2nd position can be H, F, D, A or S

3rd position can be H, Y or T

4th position can be any amino acid except A, D or S

5th and 6th positions can be any amino acid, and the 7th position must be proline (P).
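
The reading rules above can be tried out with a short Python sketch that translates such an expression into a regular expression and scans a sequence with it. This is only an illustration, not the ScanProsite implementation: the function names prosite_to_regex and find_motif and the test sequence are hypothetical, and only the syntax elements used in the example above ([..], {..}, x, repeat counts and single residues) are handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a Prosite-style expression into a Python regular expression."""
    pieces = []
    # Drop stray spaces such as "X (2)" and split the positions on "-".
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat, e.g. x(2).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("empty pattern element")
        core, low, high = match.groups()
        if core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"   # {ADS}: any residue except A, D, S
        elif core.startswith("[") and core.endswith("]"):
            piece = core                      # [HYT]: one of H, Y, T
        elif core in ("x", "X"):
            piece = "."                       # x: any residue
        elif len(core) == 1 and core.isalpha():
            piece = core                      # P: exactly this residue
        else:
            raise ValueError("unsupported pattern element: " + element)
        if low and high:
            piece += "{" + low + "," + high + "}"
        elif low:
            piece += "{" + low + "}"
        pieces.append(piece)
    return "".join(pieces)

def find_motif(pattern, sequence):
    """Return (0-based start, matched segment) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

motif = "[EFTNA]-[HFDAS]-[HYT]-{ADS}-x(2)-P"
print(prosite_to_regex(motif))           # [EFTNA][HFDAS][HYT][^ADS].{2}P
print(find_motif(motif, "MKEHHGLLPQ"))   # [(2, 'EHHGLLP')]
print(find_motif(motif, "MKEHHALLPQ"))   # [] because A is excluded at the 4th position
```

The second test sequence differs from the first only at the 4th motif position, which shows how the exclusion {ADS} removes a hit.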

A query sequence can be analyzed with the ScanProsite tool. ScanProsite can also search the SwissProt, TrEMBL and PDB databases for sequences that contain a similar pattern.

PRINTS:

Pfam: The Pfam database contains profiles of protein sequences and classifies protein families according to their overall profile. A profile describes the pattern of amino acids in a protein sequence and gives the probability of each amino acid at each position. Pfam is based on sequence alignment. A high-quality sequence alignment contains evolutionarily related sequences and indicates the probability that an amino acid appears at a particular position. However, in a few cases a sequence alignment may include sequences with no evolutionary relationship to each other. Results from the Pfam database must therefore be analyzed critically before conclusions are drawn.
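
The idea of a profile as per-position amino acid probabilities can be illustrated with a small Python sketch that counts residues in each column of an alignment. This is a simplification for teaching purposes: Pfam itself builds profile hidden Markov models, which this sketch does not attempt to reproduce. The function name column_frequencies and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter

def column_frequencies(alignment):
    """Return, for each alignment column, the observed frequency of each residue.

    Gap characters ("-") are ignored when computing the frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile

# Hypothetical toy alignment of four sequences, one gap in the third column.
toy_alignment = ["ACDE", "ACDF", "GCDE", "AC-E"]
profile = column_frequencies(toy_alignment)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

In this toy alignment the second and third columns are fully conserved, while the first and fourth columns each show one residue at a frequency of 0.75 and another at 0.25. An unrelated sequence added to the alignment would distort these frequencies, which is one reason the quality of the alignment matters.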