Recent studies show that DNA sequences of the letters A, C, G and T
exhibit an inverse power-law 1/f^α frequency spectrum, where f is the
frequency and α the exponent [1-5]. The inverse power-law form of the power
spectra of fractal space-time fluctuations is generic to dynamical systems
in nature and is identified as self-organized criticality [6-9].
In this study it is shown that the power spectra of the frequency distributions
of bases A, C, G, T in the Human chromosome 1 DNA exhibit self-organized
criticality. DNA is a quasicrystal possessing maximum packing efficiency [10]
in a hierarchy of spirals or loops. Self-organized criticality implies
that non-coding introns may not be redundant, but serve to organize
the effective functioning of the coding exons in the DNA molecule
as a complete unit.
Introduction
DNA topology is of fundamental
importance for a wide range of biological processes [11]. Since
the topological state of genomic DNA is important for its replication,
recombination and transcription, there is immediate interest in obtaining
information about the supercoiled state from sequence periodicities [12, 13].
Identifying dominant periodicities in DNA sequences will
help in understanding the role of coherent structures in genome sequence
organization [14, 15]. Li [16] has discussed meaningful
applications of spectral analyses in DNA sequence studies. Recent studies
indicate that DNA sequences of the letters A, C, G and T exhibit an inverse
power-law 1/f^α frequency spectrum, where f is the frequency and α
the exponent. It is possible, therefore, that the sequences have long-range
order [1-3, 17-19]. Power spectra of fractal space-time
fluctuations of dynamical systems such as fluid flows, stock market price
fluctuations, heart beat patterns, etc., exhibit inverse power-law form
identified as self-organized criticality [6] and represent
a self-similar eddy continuum. A general systems theory [7-9] developed
by the author shows that such an eddy continuum can be visualised as a
hierarchy of successively larger scale eddies enclosing smaller scale eddies.
Since the large eddy is the integrated mean of the enclosed smaller eddies,
the eddy energy (variance) spectrum follows the statistical normal distribution
according to the Central Limit Theorem [20]. Hence the additive
amplitudes of eddies, when squared, represent the probabilities, which
is also an observed feature of the subatomic dynamics of quantum systems
such as the electron or photon [21-23]. The long-range correlations
intrinsic to self-organized criticality in dynamical systems are
signatures of quantumlike chaos associated with the following characteristics:
(a) The fractal fluctuations result from an overall logarithmic
spiral trajectory with the quasiperiodic Penrose tiling pattern [7-9]
for the internal structure. (b) Conventional continuous periodogram power
spectral analyses of such spiral trajectories will reveal a continuum of
wavelengths with progressive increase in phase. (c) The broadband power
spectrum will have embedded dominant wavebands, the bandwidth increasing
with wavelength, and the wavelengths being functions of the golden mean.
The first 13 values of the model-predicted [7-9] dominant
peak wavelengths are 2.2, 3.6, 5.8, 9.5, 15.3, 24.8, 40.1, 64.9, 105.0,
167.0, 275, 445.0 and 720 in units of the block length of 10 bp
(base pairs). Wavelengths (or periodicities) close to the model-predicted
values have been reported in weather and climate variability [8],
prime number distribution [24], the distribution of non-trivial Riemann zeta zeros [25],
and stock market economics [26]. (d) The
conventional power spectrum plotted as the variance versus the frequency
in log-log scale will now represent the eddy probability density on logarithmic
scale versus the standard deviation of the eddy fluctuations on linear
scale since the logarithm of the eddy wavelength represents the standard
deviation, i.e., the r.m.s. (root mean square) value of the eddy fluctuations.
The r.m.s. value of the eddy fluctuations can be represented in terms of
the statistical normal distribution as follows. A normalized standard deviation
t = 0 corresponds to a cumulative percentage probability density equal to 50
for the mean value of the distribution. For the overall logarithmic spiral
circulation, the logarithm of the wavelength represents the r.m.s. value
of the eddy fluctuations, and the normalized standard deviation t is
defined for the eddy energy as

$$t = \frac{\log L}{\log T_{50}} - 1 \tag{1}$$

The parameter L in Eq. 1 is the wavelength and T50 is the wavelength
up to which the cumulative percentage contribution to total variance is
equal to 50 and t = 0. The variable log T50
also represents the mean value for the r.m.s. eddy fluctuations and is
consistent with the concept of the mean level represented by r.m.s. eddy
fluctuations. Spectra of time series of fluctuations of dynamical systems,
for example, meteorological parameters, when plotted as cumulative percentage
contribution to total variance versus t follow the model predicted
universal spectrum [8].
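The statement in (c) that the dominant wavelengths are functions of the golden mean can be checked directly from the 13 values quoted above. The short Python sketch below only takes ratios of successive listed values and compares them with the golden mean; it does not reproduce the model derivation, and the function and variable names are illustrative assumptions.

```python
import math

# Model-predicted dominant peak wavelengths exactly as quoted in the text
# (units of 10 bp blocks).
PREDICTED_WAVELENGTHS = [2.2, 3.6, 5.8, 9.5, 15.3, 24.8, 40.1, 64.9,
                         105.0, 167.0, 275.0, 445.0, 720.0]

GOLDEN_MEAN = (1 + math.sqrt(5)) / 2  # approximately 1.618


def successive_ratios(values):
    """Return the ratio of each value to the one before it."""
    return [b / a for a, b in zip(values, values[1:])]


ratios = successive_ratios(PREDICTED_WAVELENGTHS)
max_deviation = max(abs(r - GOLDEN_MEAN) for r in ratios)

for wavelength, ratio in zip(PREDICTED_WAVELENGTHS[1:], ratios):
    print(f"{wavelength:7.1f}  ratio to previous = {ratio:.3f}")
print(f"largest deviation from the golden mean: {max_deviation:.3f}")
```

Every successive ratio stays within about 0.03 of the golden mean, which is consistent with the quoted values forming a geometric progression in the golden mean, given that they are rounded.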
Data and Analysis
The Human chromosome 1
DNA base sequence was obtained from the Entrez databases, Homo sapiens
Genome (build 30), at http://www.ncbi.nlm.nih.gov/entrez.
The first 10 contiguous data sets, comprising a total of 9931745
bases, were scanned to give 280 unbroken data sets
of 35000 bases each for the study. The number of times that
each of the four bases A, C, G and T occurs in successive blocks of 10
bases was determined, giving 4 groups of 3500 frequency
values for each data set.
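The block-counting step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the handling of a trailing partial block (dropped) and the toy sequence are hypothetical and are not taken from the original analysis.

```python
def block_base_counts(sequence, block_size=10, bases="ACGT"):
    """Count each base in successive non-overlapping blocks of the sequence.

    Returns a dict mapping each base to its list of per-block counts.
    A trailing partial block is dropped (an assumption of this sketch).
    """
    sequence = sequence.upper()
    n_blocks = len(sequence) // block_size
    counts = {base: [] for base in bases}
    for i in range(n_blocks):
        block = sequence[i * block_size:(i + 1) * block_size]
        for base in bases:
            counts[base].append(block.count(base))
    return counts


# Toy sequence of three 10-base blocks (made-up data, not chromosome 1).
example = "ACGTACGTAC" + "GGGGGCCCCC" + "ATATATATAT"
counts = block_base_counts(example)
for base, series in counts.items():
    print(base, series)
```

Applied to one data set of 35000 bases, this yields four series of 3500 values each, matching the counts given in the text.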
The power spectra of the
frequency distributions of the bases were computed accurately by an elementary
but very powerful method of analysis developed by Jenkinson (1977) [27],
which provides a quasi-continuous form of the classical periodogram, allowing
systematic allocation of the total variance and degrees of freedom of the
data series to logarithmically spaced elements of the frequency range (0.5,
0). The cumulative percentage contribution to total variance was computed
starting from the high frequency side of the spectrum. The power spectra
were plotted as cumulative percentage contribution to total variance versus
the normalized standard deviation t. The average variance spectra
for the 280 data sets and the statistical normal distribution are
shown in Fig. 1 for the four bases. The 'goodness of fit' (statistical
chi-square test) between the variance spectra and the statistical normal distribution
is significant at less than or equal to the 5% level for 98.6, 99.3, 98.9 and 97.9
percent of the 280 data sets for the four bases A, C, G and T, respectively.
The average and standard deviation of the wavelength T50
up to which the cumulative percentage contribution to total variance is
equal to 50 are also shown in Fig. 1. The power spectra exhibit
dominant wavebands where the normalized variance is equal to or greater
than 1. The dominant peak wavelengths were grouped into 13
class intervals: 2-3, 3-4, 4-6, 6-12, 12-20, 20-30, 30-50, 50-80, 80-120,
120-200, 200-300, 300-600 and 600-1000 (in units of 10 bp block lengths),
chosen to include the model-predicted
dominant peak length scales mentioned earlier. Average class interval-wise
percentage frequencies of occurrence of dominant wavelengths are shown
in Fig. 2 along with the percentage contribution to total variance, i.e.,
the statistical (normal) percentage probability of occurrence, in
each class interval corresponding to the normalized standard deviation t
(Eq. 1) computed from the average T50 (Fig. 1) for each
of the four bases.
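The chain from a frequency series to T50 and t can be sketched in Python as follows. This is a minimal illustration, not the Jenkinson (1977) procedure: it swaps in a plain discrete Fourier periodogram without the quasi-continuous, logarithmically spaced elements. The function names, the toy input series and the form of Eq. 1 used here, t = log L / log T50 - 1, are assumptions made for illustration.

```python
import cmath
import math


def periodogram(series):
    """Return (wavelengths, variances) for the harmonics k = 1 .. n//2.

    Wavelengths are in units of the sampling step (here, one 10 bp block).
    The variances sum to the population variance of the series.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    wavelengths, variances = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(centred))
        power = abs(coeff) ** 2 / n ** 2
        if 2 * k != n:
            power *= 2  # harmonics below Nyquist appear twice in the DFT
        wavelengths.append(n / k)
        variances.append(power)
    return wavelengths, variances


def cumulative_percent_variance(wavelengths, variances):
    """Accumulate variance from the high-frequency (short-wavelength) side."""
    pairs = sorted(zip(wavelengths, variances))  # shortest wavelength first
    total = sum(v for _, v in pairs)
    running, result = 0.0, []
    for wavelength, variance in pairs:
        running += variance
        result.append((wavelength, 100.0 * running / total))
    return result


def t50(cumulative):
    """Wavelength at which the cumulative contribution first reaches 50%."""
    for wavelength, percent in cumulative:
        if percent >= 50.0:
            return wavelength
    raise ValueError("cumulative spectrum never reaches 50%")


def normalized_std(wavelength, t50_value):
    """Assumed form of Eq. 1: t = log(L) / log(T50) - 1."""
    return math.log(wavelength) / math.log(t50_value) - 1.0


# Toy series (made-up data): periods of 8 and 4 blocks, 64 samples.
series = [math.sin(2 * math.pi * j / 8) + 0.5 * math.sin(2 * math.pi * j / 4)
          for j in range(64)]
wavelengths, variances = periodogram(series)
cumulative = cumulative_percent_variance(wavelengths, variances)
t50_value = t50(cumulative)
print("T50 =", t50_value, " t at T50 =", normalized_std(t50_value, t50_value))
```

In the toy series the period-4 component carries 20% of the variance and the period-8 component 80%, so the cumulative sum from the short-wavelength side crosses 50% at a wavelength of 8 blocks, where t = 0 by construction.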
Figure 1: Average variance spectra for the
four bases in Human chromosome 1 DNA. Continuous lines are for the variance
spectra and open circles give the statistical normal distribution. The
mean and standard deviation of the wavelengths T50 up
to which the cumulative percentage contribution to total variance is equal
to 50 are also given in the figure.
Figure 2: Average wavelength class interval-wise
percentage distribution of dominant (normalized variance greater than 1)
wavelengths. The line with open circles gives the average and the dotted
lines denote one standard deviation on either side of the mean. The computed
percentage contribution to the total variance, i.e., the statistical (normal)
percentage probability of occurrence for each class interval, is given by
the line with stars.
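The two curves in Fig. 2 can be sketched in Python as follows: the percentage of dominant peaks falling in each class interval, and the normal percentage probability for the same interval obtained from t at the interval edges. This is an illustration under stated assumptions: Eq. 1 is taken as t = log L / log T50 - 1, intervals are treated as including their lower edge only, and the value T50 = 5.5 and the list of peak wavelengths are made-up inputs rather than results from the study.

```python
import math

# Class-interval edges quoted in the text, in units of 10 bp blocks.
CLASS_EDGES = [2, 3, 4, 6, 12, 20, 30, 50, 80, 120, 200, 300, 600, 1000]


def normal_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normalized_std(wavelength, t50):
    """Assumed form of Eq. 1: t = log(L) / log(T50) - 1."""
    return math.log(wavelength) / math.log(t50) - 1.0


def interval_normal_percent(t50, edges=CLASS_EDGES):
    """Normal percentage probability falling inside each class interval."""
    cdf = [normal_cdf(normalized_std(edge, t50)) for edge in edges]
    return {(lo, hi): 100.0 * (c_hi - c_lo)
            for lo, hi, c_lo, c_hi in zip(edges, edges[1:], cdf, cdf[1:])}


def interval_peak_percent(peak_wavelengths, edges=CLASS_EDGES):
    """Percentage of dominant peaks in each interval (lower edge inclusive)."""
    counts = {(lo, hi): 0 for lo, hi in zip(edges, edges[1:])}
    for w in peak_wavelengths:
        for lo, hi in counts:
            if lo <= w < hi:
                counts[(lo, hi)] += 1
                break
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0)
            for k, v in counts.items()}


expected = interval_normal_percent(t50=5.5)  # illustrative T50 only
observed = interval_peak_percent([2.5, 7.0, 8.0, 11.9, 700.0])  # made-up peaks

for interval in expected:
    print(interval, f"normal {expected[interval]:5.1f}%",
          f"peaks {observed[interval]:5.1f}%")
```

With the illustrative T50 of 5.5 blocks, the largest normal probability falls in the 6-12 interval.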
Results and Conclusions
The variance spectra for almost all the 280
data sets exhibit the universal inverse power-law form 1/f^α
of the statistical normal distribution (Fig. 1), where f is the frequency
and the spectral slope α decreases with increasing wavelength and approaches 1 for long
wavelengths. The above result is also seen in Fig. 2, where the wavelength
class interval-wise percentage frequency distribution of dominant wavelengths
follows closely the corresponding computed variation of percentage contribution
to the total variance, i.e., the percentage probability of occurrence,
as given by the statistical normal distribution. An inverse power-law form
for power spectra implies long-range spatial correlations in the frequency
distributions of the bases in DNA. Microscopic-scale quantum systems such
as the electron or photon exhibit non-local connections or long-range correlations
and are visualized to result from the superimposition of a continuum of
eddies. Therefore, by analogy, the observed fractal fluctuations of the
frequency distributions of the bases exhibit quantumlike chaos in the Human
chromosome 1 DNA. The eddy continuum acts as a robust, unified, whole fuzzy
logic network with a global response to local perturbations. Therefore, artificial
modification of the base sequence structure at any location may have a significant,
noticeable effect on the function of the DNA molecule as a whole. Further,
the presence of introns, which do not have meaningful code, may
not be redundant, but may serve to organize the effective functioning of
the coding exons in the DNA molecule as a complete unit [2].
The results imply that the DNA base sequence
self-organizes spontaneously to generate the robust geometry of logarithmic
spiral with the quasiperiodic Penrose tiling pattern for the internal
structure. The space-filling geometric figure of the Penrose tiling
pattern has intrinsic local five-fold symmetry [28] and ten-fold
symmetry. One of the three basic components of DNA, deoxyribose, is
a five-carbon sugar and may represent the local five-fold symmetry of the
quasicrystalline structure of the quasiperiodic Penrose tiling pattern
of the DNA molecule as a whole. The DNA molecule shows ten-fold symmetry
in the arrangement of 10 bases per turn of the double helix. The
study of plant phyllotaxis in botany shows that quasicrystalline
structure provides maximum packing efficiency for seeds, florets, leaves,
etc. [29, 10, 30]. The quasicrystalline structure of the quasiperiodic Penrose
tiling pattern may be the geometrical structure underlying the packing
of 10^3 to 10^5 micrometres of DNA in
a eukaryotic (higher organism) chromosome into a metaphase structure a
few microns long. The spatial geometry of the DNA is therefore organized
into a hierarchy of helical structures. Such a concept may explain the
observed loops of DNA in the metaphase chromosome [31]. For example,
the average class interval-wise percentage distribution of dominant periodicities
shows a peak in the wavelength interval 6-12 in units of 10 bp,
i.e., 60 to 120 bp, for all four bases (Fig. 2). This predominant
wavelength interval of 60 to 120 bp may correspond to the coil length
of each of the two DNA coils on the basic nucleosome unit of the chromatin
fibre. Also, the value of T50 ranges from 5 to 6
in units of 10 bp, i.e., from 50 to 60 bp (Fig. 1), indicating
again the predominance of the fundamental coil length in the double coil
of DNA in nucleosomes. The packing efficiency with respect to length
scale for a circular loop of radius R is equal to the circumference
2πR divided by the diameter 2R, and is equal to π.
Considering successive stages of coiling, the packing efficiency at the
nth stage of coiling is equal to π^n.
A packing efficiency of about 5 orders of magnitude (10^5)
is obtained at the 10th stage of coiling.
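The packing-efficiency arithmetic can be verified with a few lines of Python. The sketch below simply evaluates π^n for successive coiling stages, following the circumference-to-diameter argument in the text; the function name is an illustrative assumption.

```python
import math


def packing_efficiency(stages):
    """Length-scale packing efficiency after a number of coiling stages.

    Each stage packs a circumference 2*pi*R into a diameter 2*R,
    contributing a factor of pi.
    """
    return math.pi ** stages


for n in range(1, 11):
    print(f"stage {n:2d}: packing efficiency = {packing_efficiency(n):10.1f}")

tenth = packing_efficiency(10)
orders_of_magnitude = math.log10(tenth)
print(f"log10 of the 10th-stage efficiency: {orders_of_magnitude:.2f}")
```

The tenth stage gives a factor of roughly 9.4 x 10^4, i.e. just under 10^5, in line with the "about 5 orders of magnitude" quoted above.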
References
3. Li, W., Marr, T. G., Kaneko, K. Understanding
long-range correlations in DNA sequences. Physica D 75(1-3),
392-416 (1994); erratum: 82, 217 (1995). http://arxiv.org/chao-dyn/9403002
5. Stanley, H. E., Amaral, L. A. N., Gopikrishnan,
P., and Plerou, V. Scale invariance and universality of economic fluctuations.
Physica A 283, 31-41 (2000).
6. Bak, P., Tang, C., Wiesenfeld, K. Self-organized
criticality: an explanation of 1/f noise. Phys. Rev. Lett. 59,
381-384 (1987).
7. Mary Selvam, A. Deterministic chaos, fractals
and quantumlike mechanics in atmospheric flows. Can. J. Phys. 68, 831-841
(1990). http://xxx.lanl.gov/html/physics/0010046
8. Selvam, A. M., and Fadnavis, S. Signatures
of a universal spectrum for atmospheric interannual variability in some
disparate climatic regimes. Meteorol. & Atmos. Phys. 66,
87-112 (1998). http://xxx.lanl.gov/abs/chao-dyn/9805028
9. Selvam, A. M., and Fadnavis, S. Superstrings,
cantorian-fractal space-time and quantum-like chaos in atmospheric flows.
Chaos, Solitons and Fractals 10(8), 1321-1334 (1999). http://xxx.lanl.gov/abs/chao-dyn/9806002
10. Stewart, I. Daisy, daisy, give your answer
do. Sci. Amer. 272, 76-79 (1995).
11. Bates, A. D. & Maxwell, A. DNA
Topology. Oxford University Press, Oxford, pp.111 (1993).
12. Herzel, H., Weiss, O., & Trifonov,
E. N. Sequence periodicity in complete genomes of Archaea suggests positive
supercoiling. Journal of Biomolecular Structure & Dynamics 16(2),
341-345 (1998). http://linkage.rockefeller.edu/wli/dna_corr/1998.html
13. Herzel, H., Weiss, O., & Trifonov,
E. N. 10-11 bp periodicities in complete genomes reflect protein structure
and DNA folding. Bioinformatics 15(3), 187-193 (1999). http://linkage.rockefeller.edu/wli/dna_corr/1999.html
16. Li, W. Are spectral analyses useful for
DNA sequence analysis? Proc. DNA in Chromatin, At the Frontiers of Biology,
Biophysics, and Genomics, March 23-29, (2002). Arcachon, France. http://linkage.rockefeller.edu/wli/pub/arcachon02.pdf
18. Audit, B., Vaillant, C., Arneodo, A.,
d'Aubenton-Carafa, Y., Thermes, C. Long-range correlations between DNA
bending sites: relation to the structure and dynamics of nucleosomes. Journal
of Molecular Biology 316(4), 903-918 (2002).
27. Jenkinson, A. F. A Powerful
Elementary Method of Spectral Analysis for use with Monthly, Seasonal or
Annual Meteorological Time Series. Meteorological Office, London, Branch
Memorandum No. 57, pp. 1-23 (1977).
28. Devlin, K. Mathematics: The Science
of Patterns. Scientific American Library, NY, p.101 (1997).
29. Jean, R. V. Phyllotaxis: A Systemic
Study in Plant Morphogenesis. Cambridge University Press, NY, USA (1994).
30. Mary Selvam, A. Quasicrystalline pattern
formation in fluid substrates and phyllotaxis. In Symmetry in Plants,
D. Barabe and R. V. Jean (Editors), World Scientific Series in Mathematical
Biology and Medicine, Volume 4., Singapore, pp.795-809 (1998). http://xxx.lanl.gov/abs/chao-dyn/9806001
31. Grosveld, F. and Fraser, P. Locus control
of regions. In Nuclear Organization, Chromatin Structure, and Gene Expression.
pp. 129-144. (eds.) Roel Van Driel and Arie P Otte, Oxford University Press
(1997).