
Pub Medic:
Impress your mates at the pub
with your startling repertoire
of esoteric medical knowledge.
Junk DNA
Sequencing the human genome has revolutionised the way
molecular biologists look at our DNA. And junk DNA, material once thought
of as irrelevant genetic waste, is becoming a focus of scientific interest,
as Raghav Chawla explains
"Today, we are learning the
language in which God created life," US President Bill Clinton said
on 26 June 2000. The initial sequencing of the human genome, a historic
milestone, had been completed.w1
Eight months on, two research teams published their
draft versions of the sequence.w2 w3 In an article accompanying the
publication in Nature, David Baltimore remarked, "For conceptual impact, it does
not hold a candle to Watson and Crick's 1953 paper describing the
structure of DNA. Nonetheless, it is a seminal paper, launching the era of
post-genomic science." w4 w5
Our genome
The human genome consists of about three billion base
pairs, packaged into 23 pairs of chromosomes. Only 1% codes for
protein - "coding DNA."w6 The remaining 99% is non-coding
DNA, of which half is repetitive and half non-repetitive sequence.w7
Most repetitive sequences are scattered randomly
throughout the genome. They arise through segmental duplication or, more
commonly, through transposition (box 1).w8 Some may carry pseudogenes,
degenerated non-functional genes. Conversely, a limited number of repeats
appear to be clustered in tandem. Many of these "DNA
satellites" are in centromeric and telomeric chromosomal regions.w6
Box 1: Transpositionw6
Transposition is a process in which DNA sequences, referred to as
transposons or "jumping" DNA, are excised or copied from one
location of a chromosome and reinserted into another location of the same
or a different chromosome. Depending on the type of transposition
intermediate, these sequences are further subdivided into DNA or RNA
transposable elements. The latter, also called retrotransposons, are common
in the human genome and include long interspersed elements, short
interspersed elements, and long terminal repeats.
Non-repetitive sequences are thought to have
originated from transposable elements. Their derivation is said to have
become unrecognisable through the occurrence of numerous mutations.w6
The concept and number of genes
Until the 1970s, a gene was thought to consist of a
single continuous chain of nucleotides that was transcribed into a single
mRNA strand that was translated into a single polypeptide.w9 This concept
of colinearity had to be abandoned with, for example, the discovery of
split genes in 1977 (box 2).w10-w12
Box 2: Split genes
Most eukaryotic genes are made up of exons and
introns, both of which are transcribed into a single RNA precursor. Only
the exons, however, are present in the mature form of the RNA, the mRNA,
and are subsequently translated into protein. RNA derived from introns is
removed from the RNA precursor during a process called splicing.
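As a rough illustration of the splicing described in box 2, the short Python sketch below joins the exons of a made-up RNA precursor and drops the introns. The sequence, the exon coordinates, and the function name `splice` are all hypothetical, chosen only to show the idea; real splicing depends on sequence signals and cellular machinery that are not modelled here.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of an RNA precursor, dropping the introns.

    exons is an ordered list of (start, end) positions, end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical precursor: upper case marks exons, lower case marks introns.
precursor = "AUGGCC" + "guaaguuucag" + "UUUGGA" + "guacgcag" + "UAA"
exons = [(0, 6), (17, 23), (31, 34)]

mrna = splice(precursor, exons)
print(mrna)  # AUGGCCUUUGGAUAA
```

Only the upper case exon segments survive in the mature mRNA; the lower case intron segments are discarded.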
This and other insights led to a change in the
definition of a gene. At present, a (protein coding) gene may be described
as the entire DNA sequence necessary for the production of a functional
protein, including the transcription unit (exons and introns) as well as
the non-transcribed regulatory sequences (see below). A small number of
genes, however, do not code for proteins at all, but rather for various
types of non-messenger RNA (called non-coding RNA). These RNAs include
ribosomal, transfer, and other types of RNA that are not translated into
protein but have roles in the expression of protein coding genes.w2
In terms of gene number, many living beings are
equivalent to humans. The human genome contains only 20000-25000
protein coding genes, no more than the mouse or pufferfish.w6 w13 w14 This
was a disappointment; many people had believed that the human being's
superiority could be explained by having more genes.
Not all non-coding DNA is non-functional
For a long time, scientists believed that non-coding
DNA was useless junk. Coined by Susumu Ohno,w15w16 the phrase "junk
DNA" initially related only to DNA satellites but was soon used to
describe most categories of non-coding DNA.w17 w18
Junk DNA has since been shown to be much more
important, however. It is even possible that "the amount of non-coding DNA per
genome is a more valid measure of the complexity of an organism than the
number of protein coding genes."w19 Most scientists agree that the
phrase was badly chosen and may have deterred researchers from studying
non-coding DNA.w17
The availability of the human genome sequence has
obviously made the task of studying non-coding regions much easier. Recent
comparisons between the mouse and the human genomes indicate that about 5%
of DNA sequences are more conserved between the two species than
statistically expected from neutral evolution theories. This proportion is
surprisingly higher than the 1% of protein coding sequences, implying that
a remarkable proportion of non-coding DNA is conserved.w13
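The arithmetic behind this comparison can be sketched in a few lines of Python, using the round figures quoted in this article (about three billion base pairs, 1% coding, about 5% conserved). The variable names are illustrative, and the calculation assumes, for simplicity, that all coding DNA falls within the conserved fraction, which makes the result a lower bound for conserved non-coding DNA.

```python
GENOME_BP = 3_000_000_000   # about three billion base pairs
CODING_PERCENT = 1          # share of the genome that codes for protein
CONSERVED_PERCENT = 5       # share conserved between mouse and human

coding_bp = GENOME_BP * CODING_PERCENT // 100
conserved_bp = GENOME_BP * CONSERVED_PERCENT // 100

# Assumption: every coding base pair is counted as conserved, so whatever
# remains of the conserved fraction must be non-coding (a lower bound).
min_conserved_noncoding_bp = conserved_bp - coding_bp

print(coding_bp)                   # 30000000
print(conserved_bp)                # 150000000
print(min_conserved_noncoding_bp)  # 120000000
```

On these figures, at least four fifths of the conserved sequence, roughly 120 million base pairs, would lie outside protein coding regions.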
A total of 481 sequences (longer than 200 base pairs)
are 100% identical between homologous regions in the human, rat, and mouse
genomes (excluding sequences for ribosomal RNA). More than half of these
"ultraconserved elements" may be exclusively located in
non-coding regions of DNA.w20
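To make the idea of an ultraconserved element concrete, the sketch below scans three aligned sequences for stretches that are identical in all of them. It is only a toy: the sequences are invented, the threshold is 4 bases rather than 200, and the sequences are assumed to be already aligned position by position without gaps, which real genome comparisons cannot take for granted. The function name `identical_runs` is hypothetical.

```python
def identical_runs(seqs, min_len):
    """Return (start, end) spans, end exclusive, where every aligned
    sequence carries the same base for at least min_len positions."""
    length = min(len(s) for s in seqs)
    runs = []
    start = None
    for i in range(length):
        same = all(s[i] == seqs[0][i] for s in seqs)
        if same and start is None:
            start = i                      # a run of identity begins
        elif not same and start is not None:
            if i - start >= min_len:       # keep only long enough runs
                runs.append((start, i))
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))       # run reaching the end
    return runs


# Invented, pre-aligned toy sequences (not real genomic data).
human = "ACGTACGTTTGACCA"
mouse = "ACGTACGTATGACCT"
rat = "ACGAACGTTTGACCA"

runs = identical_runs([human, mouse, rat], min_len=4)
print(runs)  # [(4, 8), (9, 14)]
```

Raising `min_len` to 200 and supplying real aligned sequences would correspond to the length criterion quoted above.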
These findings indicate that certain non-coding
regions undergo negative selection, an evolutionary process previously
attributed only to coding DNA. This means that mutations within these
regions do not spread through the population because they would have an
unfavourable effect on the fitness of the species, strongly suggesting that
the conserved non-coding sequences are biologically at least as functional
as the coding sequences.
Regulatory, structural, and evolutionary roles
As early as 1975, scientists hypothesised that the key
difference between humans and chimpanzees was in the regulation of gene
expression rather than in the coding DNA sequences. Their theory has been
confirmed.w21 w22
Many essential regulatory functions have been
attributed to non-coding DNA sequences, including promoters, enhancers, and
silencers. Whereas promoters are located just in front of transcription
units and bind the transcription machinery, enhancers and silencers may be
located in front of, behind, or even within transcription units. They bind
transcription factors, proteins that are capable of influencing the rate of
transcription by interacting with the transcription machinery. Many
regulatory sequences have already been identified and characterised; many
more, however, remain unknown.w23
Non-coding RNA transcripts are much more prevalent
than previously thought. Some have an anti-sense orientation compared with
well characterised coding transcripts, which potentially enables specific
interactions by hybridisation and the formation of RNA double strands.
These are known to trigger, at least in some organisms, a process called
RNA interference, resulting in the silencing of gene expression by
disruption of the coding RNA. Current research is trying to elucidate the
mechanisms by which RNA interference acts.w24 w25
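The hybridisation between a coding transcript and its anti-sense counterpart follows from base pairing, which a few lines of Python can illustrate. The sketch assumes perfect Watson-Crick pairing only (A with U, G with C) and uses an invented sequence; the names `antisense` and `can_pair` are hypothetical, and nothing here models the RNA interference machinery itself.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def can_pair(sense, candidate):
    """True if candidate could form a perfect double strand with sense.

    Simplification: only exact Watson-Crick pairing over the full
    length is accepted.
    """
    return candidate == antisense(sense)


sense = "AUGGCCUUU"        # invented coding transcript fragment
anti = antisense(sense)

print(anti)                        # AAAGGCCAU
print(can_pair(sense, anti))       # True
print(can_pair(sense, "AAAGGCCAA"))  # False
```

Because the two strands run in opposite directions, the anti-sense sequence is read in reverse as well as complemented.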
Another novel mechanism of gene regulation has been
shown in yeasts. It is not the presence of a specific non-coding RNA that
represses the expression of a gene in yeasts, but the act of transcribing
the non-coding RNA in the vicinity of the same gene. The presence of the
transcription machinery may impede the binding of activators of
transcription.w26
Perhaps more surprisingly, even the non-coding repeats
within centromeres and telomeres seem to mediate gene silencing.w27 These
regions also play a structural role in the maintenance and partitioning of
chromosomes.
Finally, non-coding DNA (in the form of transposable
elements) may also be a source of genetic diversity, influencing the
evolution of (coding and non-coding) DNA sequences and thus of life in
general.w17 w28 w29
The future
Many hidden treasures have been found in "junk
DNA."w30 But we are still far from fully understanding the
language of DNA.
An international consortium of scientists has launched
the ENCODE (ENCyclopedia Of DNA Elements) project, aimed at identifying all
functional elements in the human genome sequence, regardless of whether
they are coding or not.w31
Raghav Chawla, fifth year medical student, University of Lausanne, Switzerland
Email: raghav.chawla@unil.ch
I thank Nicolas Mermod, Nicolas Fasel, and Richard Iggo, University of Lausanne
studentBMJ 2005;13:177-220 May ISSN 0966-6494
- Johnson PE. Bill Clinton was right! (about the human genome). Touchstone May 2001.
- Lander ES et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860-921.
- Venter JC et al. The sequence of the human genome. Science 2001;291:1304-51.
- Watson JD, Crick FHC. Molecular structure of nucleic acids. Nature 1953;171:737-8.
- Baltimore D. Our genome unveiled. Nature 2001;409:814-6.
- International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004;431:931-45.
- Makalowski W. The human genome structure and organization. Acta Biochim Pol 2001;48:587-98.
- Koszul R, Caburet S, Dujon B, Fischer G. Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. EMBO J 2004;23:234-43.
- Gamow G. Possible relation between deoxyribonucleic acid and protein synthesis. Nature 1954;173:318.
- Portin P. The concept of the gene: short history and present status. Q Rev Biol 1993;68:173-223.
- Berget SM, Moore C, Sharp PA. Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 1977;74:3171-5.
- Chow LT, Gelinas RE, Broker TR, Roberts RJ. An amazing sequence arrangement at the 5’ ends of adenovirus 2 messenger RNA. Cell 1977;12:1-8.
- Waterston RH et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002;420:520-62.
- Aparicio S et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 2002;297:1301-10.
- Ohno S. Evolution by gene duplication. New York: Springer-Verlag, 1970.
- Ohno S. So much “junk” DNA in our genome. Brookhaven Symp Biol 1972;23:366-70.
- Makalowski W. Not junk after all. Science 2003;300:1246-7.
- Kuska B. Should scientists scrap the notion of junk DNA? J Natl Cancer Inst 1998;90:1032-3.
- Taft RJ, Mattick JS. Increasing biological complexity is positively correlated with the relative genome-wide expansion of non-protein-coding DNA sequences. 1 Dec 2003. http://genomebiology.com/2003/5/1/P1 [accessed on 29 March 2005].
- Bejerano G et al. Ultraconserved elements in the human genome. Science 2004;304:1321-5.
- King MC, Wilson AC. Evolution at two levels in humans and chimpanzees. Science 1975;188:107-16.
- Enard W et al. Intra- and interspecific variation in primate gene expression patterns. Science 2002;296:233-5.
- Pennisi E. Searching for the genome’s second code. Science 2004;306:632-5.
- Cawley S et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004;116:499-509.
- Mello CC, Conte D Jr. Revealing the world of RNA interference. Nature 2004;431:338-42.
- Martens JA, Laprade L, Winston F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 2004;429:571-4.
- Baur JA, Zou Y, Shaw JW, Wright WE. Telomere position effect in human cells. Science 2001;292:2075-7.
- Kazazian HH Jr. Mobile elements: drivers of genome evolution. Science 2004;303:1626-32.
- Balakirev ES, Ayala FJ. Pseudogenes: Are they “junk” or functional DNA? Annu Rev Genet 2003;37:123-51.
- The news staff. Breakthrough of the year: the runners-up. Science 2004;306:2013-7.
- ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004;306:636-40.