Sequence repeats are the simplest form of regularity, and their detection is important in biology and medicine because it can be used for phylogenetic studies and disease diagnosis. A major difficulty in identifying repeats is that the repeat units can be of unknown length, either exact or imperfect, and either in tandem or dispersed. Many of the methods for detecting repeated sequences belong to the field of digital signal processing (DSP). These methods involve a transformation whose main goal is to map the symbolic domain into the numeric domain without adding structure to the symbolic sequence beyond what is inherent to it. The choice of numerical representation for genomic signals is therefore very important. This paper presents the results obtained by using different numerical representations (including two novel ones) and spectral analysis to isolate the position and length of DNA repeats in short sequences containing microsatellites and in long sequences containing alpha DNA repeats.
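As a rough illustration of the symbolic-to-numeric mapping followed by spectral analysis described above, the sketch below uses a binary indicator mapping (one 0/1 sequence per base, often attributed to Voss) and a naive discrete Fourier transform. This is an assumption chosen for simplicity: it is not one of the representations evaluated in the paper, the function names are hypothetical, and it only estimates the repeat length over a whole sequence rather than localizing the repeat position with a spectrogram.

```python
import cmath

BASES = "ACGT"


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base."""
    dna = dna.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in dna] for b in BASES}


def power_spectrum(signal):
    """Naive O(N^2) DFT power spectrum for k = 0 .. N//2, with the mean removed."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def dominant_period(dna):
    """Estimate the repeat length as N / k at the strongest non-zero frequency."""
    n = len(dna)
    total = [0.0] * (n // 2 + 1)
    # Sum the power spectra of the four indicator sequences.
    for seq in indicator_sequences(dna).values():
        for k, p in enumerate(power_spectrum(seq)):
            total[k] += p
    # Skip k = 0, which carries no periodicity information.
    k_peak = max(range(1, len(total)), key=lambda k: total[k])
    return n / k_peak


if __name__ == "__main__":
    # A synthetic exact tandem repeat with a 3-base unit.
    print(dominant_period("ACG" * 20))  # prints 3.0
```

For an exact tandem repeat, the spectral peak falls at the frequency index N divided by the unit length, so the period can be read off directly. Imperfect or dispersed repeats would produce broader or weaker peaks and would need windowed analysis to be located.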
genomic signal processing, sequence repeats, DNA representations, Fourier analysis, spectrograms.
Petre G. Pop, Alin Voina, "Numerical Representations Involved in DNA Repeats Detection Using Spectral Analysis", Studies in Informatics and Control, ISSN 1220-1766, vol. 20(2), pp. 163-180, 2011. https://doi.org/10.24846/v20i2y201109