Module 6 : Bioinformatics tools

Lecture 38 : Analysis of Protein and Nucleic acid sequences -I

Biological Databases- In the post-genomic era, nucleotide and protein sequences from many different organisms are available. This has paved the way for determining the secondary and 3-D structures of proteins as well. This vast amount of information is processed and arranged systematically in different biological databases. The information in these databases can be used to derive the common features of a sequence class and to classify an unknown sequence.

Primary Database- This is a collection of data obtained directly from experiments, such as the sequence of a DNA or protein molecule or the 3-D structure of a protein.

Database of nucleic acid sequences

GenBank- This is a public sequence database that can be accessed at http://www.ncbi.nlm.nih.gov/genbank/. A new entry is submitted by logging in to the database, with publication of the new sequence in a scientific journal as a prerequisite. Each entry in the database has a unique accession number that remains unchanged. A sample GenBank entry can be viewed at http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html. A typical GenBank entry contains the locus name, the length of the sequence, the type of molecule (DNA/RNA), and the nucleotide sequence itself.
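The fields listed above can be read from a GenBank flat-file record with a few lines of Python. The following is a minimal sketch, not an official parser: the function name parse_genbank_record and the SAMPLE record are hypothetical and made up for illustration, and the code assumes the usual layout in which the LOCUS line holds the name, length and molecule type and the sequence follows the ORIGIN keyword.

```python
def parse_genbank_record(text):
    """Extract locus name, length, molecule type and sequence from one record."""
    info = {"locus": None, "length": None, "molecule": None, "sequence": ""}
    in_origin = False
    seq_parts = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # Assumed field order: LOCUS <name> <length> bp <molecule> ...
            fields = line.split()
            info["locus"] = fields[1]
            info["length"] = int(fields[2])
            info["molecule"] = fields[4]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # '//' marks the end of the record
            in_origin = False
        elif in_origin:
            # Sequence lines start with a position number, then blocks of bases
            seq_parts.extend(line.split()[1:])
    info["sequence"] = "".join(seq_parts).upper()
    return info


# Hypothetical record for illustration only; not a real GenBank entry.
SAMPLE = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical record for illustration only.
ACCESSION   EXAMPLE01
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""

record = parse_genbank_record(SAMPLE)
print(record["locus"], record["length"], record["molecule"])
print(record["sequence"])
```

For real work, an established parser such as the one in Biopython is likely a safer choice, since actual records contain many more fields than this sketch handles.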

Entrez- The Entrez system is used to search all NCBI-associated databases. It is a powerful tool for performing simple or complex searches by combining keywords with logical operators (AND, OR, NOT). For example, human protein kinase sequences can be searched with the following syntax: Homo sapiens [ORGN] AND protein kinase.
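The same search syntax can be assembled programmatically. The sketch below builds the query string from the example above and turns it into a URL for the NCBI E-utilities ESearch service. The helper names build_entrez_term and build_esearch_url are hypothetical, and the choice of the "protein" database and of retmax=20 are assumptions made for illustration. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_term(organism, keyword, exclude=None):
    """Combine an organism filter and a keyword with logical operators."""
    term = f"{organism} [ORGN] AND {keyword}"
    if exclude:
        # Optional NOT clause to filter out unwanted hits
        term += f" NOT {exclude}"
    return term


def build_esearch_url(term, db="protein", retmax=20):
    """Return an ESearch URL for the given term (db and retmax are assumed defaults)."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_entrez_term("Homo sapiens", "protein kinase")
url = build_esearch_url(term)
print(term)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return an XML list of matching record identifiers.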

EMBL and DDBJ- EMBL is the nucleotide sequence database maintained at the European Bioinformatics Institute, whereas DDBJ is the DNA sequence database maintained at the Center for Information Biology, Japan. EMBL can be accessed at http://www.embl.de/, whereas DDBJ can be accessed at http://www.ddbj.nig.ac.jp/. GenBank, EMBL and DDBJ synchronize their nucleotide sequence data every day; as a result, searching for a nucleotide sequence in any one of these databases is sufficient.

Database of protein sequences

SWISSPROT- This is the collection of annotated protein sequences maintained by the Swiss Institute of Bioinformatics (SIB). SWISSPROT can be accessed at http://web.expasy.org/groups/swissprot/. Each protein sequence entry in SWISSPROT is manually curated and, where required, checked against the available literature. SWISSPROT is part of the UniProt database, collectively known as the UniProt Knowledgebase. The 'niceprot' view presents a SWISSPROT entry graphically for better readability and provides hyperlinks to other databases as well.