RSDB: representative sequence databases with high information content.

Cited 32 times in Web of Science; cited 35 times in Scopus
Motivation: Biological sequence databases are highly redundant for two main reasons: (1) various databanks keep redundant sequences, with many identical and nearly identical entries; (2) natural sequences often have high sequence identities due to gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequences with mutual sequence identity of 50% or less provide the same amount of biological information as the original full database?
Results: Comparisons of nine representative sequence databases (RSDBs) derived from full protein databanks showed that the information content of a sequence database is not linearly proportional to its size. An RSDB reduced to mutual sequence identity of around 50% (RSDB50) was equivalent to the original full database in terms of the effectiveness of homology searching. It was a third of the size of the full database, which made iterative profile searching six times faster. The RSDBs are produced at different granularities for efficient homology searching.
Availability: All the RSDB files generated and the full analysis results are available through the internet: ftp://ftp.ebi.ac.uk/pub/contrib/jong/RSDB/ and http://cyrah.ebi.ac.uk:1111/Proj/Bio/RSDB
Contact: jong@biosophy.org
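The idea of reducing a database to a set of sequences with mutual identity at or below a threshold can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not describe. It is a generic greedy filter under stated assumptions: the function names (lcs_length, approx_identity, representative_set) are hypothetical, and identity is approximated by the longest common subsequence length divided by the shorter sequence length, a crude stand-in for alignment-based identity.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def approx_identity(a: str, b: str) -> float:
    """Assumed identity proxy: LCS length over the shorter sequence length."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0


def representative_set(seqs: Dict[str, str], max_identity: float = 0.5) -> List[str]:
    """Greedily keep names of sequences whose identity to every
    already-kept sequence is at most max_identity."""
    kept: List[str] = []
    # Visit longest sequences first (ties broken by name), so the longest
    # member of each redundant group becomes its representative.
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(approx_identity(seqs[name], seqs[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {
        "a": "MKTAYIAKQR",
        "b": "MKTAYIAKQK",  # differs from "a" at one position
        "c": "GGGGGGGGGG",  # shares no residues with "a" or "b"
    }
    print(representative_set(toy, max_identity=0.5))
```

With the 0.5 threshold, the near-duplicate "b" is dropped in favour of "a", while the unrelated "c" is retained. The all-against-kept comparison is quadratic in the number of sequences, so this sketch only suits toy inputs.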
Publisher
Oxford Univ Press
Issue Date
2000-05
Language
English
Article Type
Article
Keywords

HIDDEN MARKOV-MODELS; ALIGNMENTS; SEARCH; SENSITIVITY; FAMILIES; PFAM

Citation

BIOINFORMATICS, v.16, no.5, pp.458 - 464

ISSN
1367-4803
DOI
10.1093/bioinformatics/16.5.458
URI
http://hdl.handle.net/10203/72657
Appears in Collection
RIMS Journal Papers