NotI linking clones contain sequences flanking NotI recognition sites and were previously shown to be tightly associated with CpG islands and genes. To directly assess the value of NotI clones in genome research, high-density grids with 50,000 NotI linking clones originating from six representative NotI linking libraries were constructed. Altogether, these libraries contained nearly 100 times the total number of NotI sites in the human genome. A total of 3437 sequences flanking NotI sites were generated. Analysis of 3265 unique sequences demonstrated that 51% of the clones displayed significant protein similarity to SWISSPROT and TREMBL database proteins based on MSPcrunch filtering with stringent parameters. Of the 3265 sequences, 1868 (57.2%) were new sequences, not present in the EMBL and EST databases (similarity ≤ 90%). Among these new sequences, 795 (24.3%) showed similarity to known proteins and 712 (21.8%) displayed an identity of > 75% at the nucleotide level to sequences from EMBL or EST databases. The remaining 361 (11.1%) sequences were completely new, i.e. < 75% identical. The work also showed a tight, specific association of NotI sites with the first exon and suggests that the so-called 3' ESTs can actually be generated from the 5'-ends of genes that contain NotI sites in their first exon.
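The sequence counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and category names are hypothetical labels chosen here, and the one assumption is that every percentage is taken relative to the 3265 unique sequences, which is consistent with the figures as reported.

```python
# Counts taken from the abstract; the names are illustrative labels, not the authors' terms.
total_unique = 3265

# Breakdown of the sequences reported as new (not present in EMBL/EST at > 90% similarity).
counts = {
    "protein_similarity": 795,         # similar to known proteins
    "nucleotide_identity_gt_75": 712,  # > 75% nucleotide identity to EMBL/EST entries
    "completely_new": 361,             # < 75% identical
}

# The three categories should add up to the reported number of new sequences.
new_total = sum(counts.values())


def pct(n, total=total_unique):
    """Percentage of the unique sequences, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Assumption: each percentage is relative to all 3265 unique sequences.
shares = {name: pct(n) for name, n in counts.items()}
new_share = pct(new_total)

print(new_total, new_share)
print(shares)
```

Under that assumption the three categories sum to the 1868 new sequences, and each recomputed share matches the percentage given in the text.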
|Original language||English (US)|
|Number of pages||5|
|Journal||Nucleic acids research|
|State||Published - Apr 1 2000|