Algorithm for Coding DNA Sequences into “Spectrum-like” and “Zigzag” Representations

Jure Zupan* and Milan Randić
National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia 61115
J. Chem. Inf. Model., 2005, 45 (2), pp 309–313
DOI: 10.1021/ci040104j
Publication Date (Web): February 18, 2005
Copyright © 2005 American Chemical Society
* Corresponding author phone: 386-1-4760-279; fax: 386-1-4760-300; e-mail: jure.zupan@ki.si.

Abstract

An algorithm is proposed for encoding long strings of building blocks, such as the 4 DNA bases (adenine - A, cytosine - C, thymine - T, and guanine - G), the 20 natural amino acids (from alanine - Ala to valine - Val, plus the stop triplet), or all 64 possible base triplets (from AAA to TTT), into “zigzag” or “spectrum-like” representations. The new encoding scheme can be derived in 3-, 2-, or 1-dimensional form, as the user chooses. The only information required, besides the string for which the “spectrum-like” representation is sought, is the initial positioning of the complete set of units from which the string is composed, i.e., four positions for A, C, G, and T, or 20 positions for natural amino acids plus stop, etc. This initial positioning can itself be specified in 3-, 2-, or 1-D form. As an illustration of the suggested encoding scheme, a visual and chemometric comparison of the first 10 exon strings of the beta globin gene of 10 different species, each string about 100 basic amino acids long, is shown.
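The abstract names the two inputs of the scheme (the string and an initial positioning of its units) but does not state the rule that turns them into a curve. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: it assumes a 1-D initial positioning and the simplest conceivable rule, in which the i-th unit of the string is plotted at the point (i, position of that unit). The names `BASE_POSITIONS` and `zigzag_1d` and the position values 1.0 to 4.0 are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical 1-D initial positioning of the four DNA bases.
# These values are an assumption for illustration, not taken from the paper.
BASE_POSITIONS: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def zigzag_1d(sequence: str, positions: Dict[str, float]) -> List[Tuple[int, float]]:
    """Map a string of units to a list of (index, position) points.

    Assumed rule (not from the source): the i-th unit, counting from 1,
    is placed at height positions[unit]. Joining consecutive points with
    straight lines gives a zigzag curve.
    """
    points: List[Tuple[int, float]] = []
    for i, unit in enumerate(sequence, start=1):
        if unit not in positions:
            raise ValueError(f"no initial position defined for unit {unit!r}")
        points.append((i, positions[unit]))
    return points


# Example: encode a short, made-up base string.
points = zigzag_1d("ATGGTG", BASE_POSITIONS)
```

Under the same assumption, the other unit sets mentioned in the abstract (20 amino acids plus stop, or 64 triplets) would only require a different `positions` dictionary, and a 2-D or 3-D variant would store a coordinate tuple per unit instead of a single number.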

History

  • Published In Issue March 28, 2005
  • Received November 9, 2004
