
Junk DNA




Sequencing the human genome has revolutionised the way molecular biologists look at our DNA. And junk DNA, material once thought of as irrelevant genetic waste, is becoming a focus of scientific interest, as Raghav Chawla explains

"Today, we are learning the language in which God created life," US President Bill Clinton said on 26 June 2000. The initial sequencing of the human genome had just been completed, a historic milestone.w1

Eight months on, two research teams published their draft versions of the sequence.w2 w3 In an article accompanying the publication in Nature, David Baltimore remarked, "For conceptual impact, it does not hold a candle to Watson and Crick's 1953 paper describing the structure of DNA. Nonetheless, it is a seminal paper, launching the era of post-genomic science." w4 w5

Our genome

The human genome consists of about three billion base pairs, packaged into 23 pairs of chromosomes. Only 1% codes for protein - "coding DNA."w6 The remaining 99% is non-coding DNA, about half of which consists of repetitive sequences and half of non-repetitive sequences.w7

Most repetitive sequences are scattered randomly throughout the genome. They arise through segmental duplication or, more commonly, through transposition (box 1).w8 Some may carry pseudogenes, degenerated non-functional genes. Conversely, a limited number of repeats appear to be clustered in tandem. Many of these "DNA satellites" are in centromeric and telomeric chromosomal regions.w6

Box 1: Transpositionw6

Transposition is a process in which DNA sequences, referred to as transposons or "jumping" DNA, are excised or copied from one location on a chromosome and reinserted into another location on the same or a different chromosome. Depending on the type of transposition intermediate, these sequences are further subdivided into DNA or RNA transposable elements. The latter, also called retrotransposons, are common in the human genome and include long interspersed elements, short interspersed elements, and long terminal repeats.


Non-repetitive sequences are thought to have originated from transposable elements. Their derivation is said to have become unrecognisable through the occurrence of numerous mutations.w6

The concept and number of genes

Until the 1970s, a gene was thought to consist of a single continuous chain of nucleotides that was transcribed into a single mRNA strand, which in turn was translated into a single polypeptide.w9 This concept of colinearity had to be abandoned with, for example, the discovery of split genes in 1977 (box 2).w10-w12

Box 2: Split genes

Most eukaryotic genes are made up of exons and introns, both of which are transcribed into a single RNA precursor. Only the exons, however, are present in the mature form of the RNA, the mRNA, and are subsequently translated into protein. RNA derived from introns is removed from the RNA precursor during a process called splicing.
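The splicing step described in this box can be illustrated with a minimal Python sketch. The toy transcript, its exon coordinates, and the function name `splice` are invented for illustration only; real splicing depends on sequence signals and cellular machinery that this sketch does not model.

```python
def splice(precursor: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of an RNA precursor and discard the introns.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    return "".join(precursor[start:end] for start, end in sorted(exons))


# Hypothetical toy precursor: exons in upper case, introns in lower case.
precursor = "AUGGCC" + "guaaguuuucag" + "UUUGGA" + "guacgucag" + "UAA"
exon_coords = [(0, 6), (18, 24), (33, 36)]

mature_mrna = splice(precursor, exon_coords)
print(mature_mrna)  # AUGGCCUUUGGAUAA
```

Only the upper-case exon segments survive in the printed result, mirroring the way intron-derived RNA is absent from the mature mRNA.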


This and other insights led to a change in the definition of a gene. At present, a (protein coding) gene may be described as the entire DNA sequence necessary for the production of a functional protein, including the transcription unit (exons and introns) as well as the non-transcribed regulatory sequences (see below). A small number of genes, however, do not code for proteins at all, but rather for various types of non-messenger RNA (called non-coding RNA). These RNAs include ribosomal, transfer, and other types of RNA that are not translated into protein but have roles in the expression of protein coding genes.w2


In terms of gene number, many living beings are equivalent to humans. The human genome contains only 20 000-25 000 protein coding genes, no more than the mouse or pufferfish.w6 w13 w14 This was a disappointment; many people had believed that the human being's superiority could be explained by having more genes.

Not all non-coding DNA is non-functional

For a long time, scientists believed that non-coding DNA was useless junk. Coined by Susumu Ohno,w15 w16 the phrase "junk DNA" initially related only to DNA satellites but was soon used to describe most categories of non-coding DNA.w17 w18

Junk DNA has since been shown to be much more important, however. It is even possible that "the amount of non-coding DNA per genome is a more valid measure of the complexity of an organism than the number of protein coding genes."w19 Most scientists agree that the phrase was badly chosen and may have deterred researchers from studying non-coding DNA.w17

The availability of the human genome sequence has made the task of studying non-coding regions much easier. Recent comparisons between the mouse and the human genomes indicate that about 5% of DNA sequences are more conserved between the two species than would be expected statistically under neutral evolution. This proportion is considerably higher than the 1% of protein coding sequences, implying that a remarkable proportion of non-coding DNA is conserved.w13

A total of 481 sequences (longer than 200 base pairs) are 100% identical between homologous regions in the human, rat, and mouse genomes (excluding sequences for ribosomal RNA). More than half of these "ultraconserved elements" may be exclusively located in non-coding regions of DNA.w20
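The idea of an element that is 100% identical across species can be sketched in Python as a scan over aligned sequences for long runs of identical, gap-free columns. This is a simplified illustration, not the method used in the cited study: the function name `conserved_runs`, the toy sequences, and the choice of a minimum-length parameter are all assumptions made here.

```python
def conserved_runs(aligned: list[str], min_len: int = 200) -> list[tuple[int, int]]:
    """Return (start, end) spans, end-exclusive, where every aligned
    sequence carries the same base and no gap ("-") for at least min_len columns."""
    length = min(len(seq) for seq in aligned)
    runs = []
    start = None
    for i in range(length):
        column = {seq[i] for seq in aligned}
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = i                      # a new identical run begins
        elif not identical and start is not None:
            if i - start >= min_len:       # keep the run only if long enough
                runs.append((start, i))
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start, length))       # run reaching the end of the alignment
    return runs


# Hypothetical toy alignment; a short threshold is used so the example fits on a line.
human = "ACGTACGTTTGACCA"
mouse = "ACGTACGATTGACCA"
rat = "ACGTACGTTTGACCA"
print(conserved_runs([human, mouse, rat], min_len=5))  # [(0, 7), (8, 15)]
```

A single mismatch at position 7 splits the toy alignment into two identical stretches. With a threshold in the hundreds of bases, as in the text, only exceptionally well conserved regions would pass.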

These findings indicate that certain non-coding regions undergo negative selection, an evolutionary process previously attributed only to coding DNA. This means that mutations within these regions do not spread through the population because they would have an unfavourable effect on the fitness of the species, strongly suggesting that the conserved non-coding sequences are biologically at least as functional as the coding sequences.

Regulatory, structural, and evolutionary roles

As early as 1975, scientists hypothesised that the key difference between humans and chimpanzees was in the regulation of gene expression rather than in the coding DNA sequences. Their theory has been confirmed.w21 w22

Many essential regulatory functions have been attributed to non-coding DNA sequences, including promoters, enhancers, and silencers. Whereas promoters are located just in front of transcription units and bind the transcription machinery, enhancers and silencers may be located in front of, behind, or even within transcription units. They bind transcription factors, proteins that are capable of influencing the rate of transcription by interacting with the transcription machinery. Many regulatory sequences have already been identified and characterised; many more, however, remain unknown.w23

Non-coding RNA transcripts are much more prevalent than previously thought. Some have an anti-sense orientation compared with well characterised coding transcripts, which potentially enables specific interactions by hybridisation and the formation of RNA double strands. These are known to trigger, at least in some organisms, a process called RNA interference, resulting in the silencing of gene expression by disruption of the coding RNA. Current research is trying to elucidate the mechanisms by which RNA interference acts.w24 w25
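Why an anti-sense transcript can hybridise with a coding transcript follows from base pairing: the anti-sense strand is the reverse complement of the sense strand. The Python sketch below illustrates this with an invented sequence; the function names are hypothetical, and requiring perfect complementarity along the whole length is a simplifying assumption.

```python
# Watson-Crick pairing for RNA: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the strand that would pair with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def can_hybridise(sense: str, antisense: str) -> bool:
    """True if the two strands are perfectly complementary (simplified model)."""
    return antisense == reverse_complement(sense)


sense = "AUGGCCUUU"                      # hypothetical coding fragment
antisense = reverse_complement(sense)
print(antisense)                         # AAAGGCCAU
print(can_hybridise(sense, antisense))   # True
```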

Another novel mechanism of gene regulation has been shown in yeasts. It is not the presence of a specific non-coding RNA that represses the expression of a gene in yeasts, but the act of transcribing the non-coding RNA in the vicinity of the same gene. The presence of the transcription machinery may impede the binding of activators of transcription.w26

Perhaps more surprisingly, even the non-coding repeats within centromeres and telomeres seem to mediate gene silencing.w27 These regions also play a structural role in the maintenance and partitioning of chromosomes.


Finally, non-coding DNA (in the form of transposable elements) may also be a source of genetic diversity, influencing the evolution of (coding and non-coding) DNA sequences and thus of life in general.w17 w28 w29

The future

Many hidden treasures have been found in "junk DNA."w30 But we are still far from fully understanding the language of DNA.

An international consortium of scientists has launched the ENCODE (ENCyclopedia Of DNA Elements) project, aimed at identifying all functional elements in the human genome sequence, regardless of whether they are coding or not.w31

Raghav Chawla, fifth year medical student, University of Lausanne, Switzerland
Email: raghav.chawla@unil.ch

I thank Nicolas Mermod, Nicolas Fasel, and Richard Iggo, University of Lausanne

studentBMJ 2005;13:177-220 May ISSN 0966-6494

  1. Johnson PE. Bill Clinton was right! (about the human genome). Touchstone May 2001.
  2. Lander ES et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860-921.
  3. Venter JC et al. The sequence of the human genome. Science 2001;291:1304-51.
  4. Watson JD, Crick FHC. Molecular structure of nucleic acids. Nature 1953;171:737-8.
  5. Baltimore D. Our genome unveiled. Nature 2001;409:814-6.
  6. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004;431:931-45.
  7. Makalowski W. The human genome structure and organization. Acta Biochim Pol 2001;48:587-98.
  8. Koszul R, Caburet S, Dujon B, Fischer G. Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. EMBO J 2004;23:234-43.
  9. Gamow G. Possible relation between deoxyribonucleic acid and protein synthesis. Nature 1954;173:318.
  10. Portin P. The concept of the gene: short history and present status. Q Rev Biol 1993;68:173-223.
  11. Berget SM, Moore C, Sharp PA. Spliced segments at the 5’ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci USA 1977;74:3171-5.
  12. Chow LT, Gelinas RE, Broker TR, Roberts RJ. An amazing sequence arrangement at the 5’ ends of adenovirus 2 messenger RNA. Cell 1977;12:1-8.
  13. Waterston RH et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002;420:520-62.
  14. Aparicio S et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 2002;297:1301-10.
  15. Ohno S. Evolution by gene duplication (Springer-Verlag, New York, 1970).
  16. Ohno S. So much “junk” DNA in our genome. Brookhaven Symp Biol 1972;23:366-70.
  17. Makalowski W. Not junk after all. Science 2003;300:1246-7.
  18. Kuska B. Should scientists scrap the notion of junk DNA? J Natl Cancer Inst 1998;90:1032-3.
  19. Taft RJ, Mattick JS. Increasing biological complexity is positively correlated with the relative genome-wide expansion of non-protein-coding DNA sequences. 1 Dec 2003. http://genomebiology.com/2003/5/1/P1 [accessed on 29 March 2005].
  20. Bejerano G et al. Ultraconserved elements in the human genome. Science 2004;304:1321-5.
  21. King MC, Wilson AC. Evolution at two levels in humans and chimpanzees. Science 1975;188:107-16.
  22. Enard W et al. Intra- and interspecific variation in primate gene expression patterns. Science 2002;296:233-5.
  23. Pennisi E. Searching for the genome’s second code. Science 2004;306:632-5.
  24. Cawley S et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004;116:499-509.
  25. Mello CC, Conte D Jr. Revealing the world of RNA interference. Nature 2004;431:338-42.
  26. Martens JA, Laprade L, Winston F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 2004;429:571-4.
  27. Baur JA, Zou Y, Shay JW, Wright WE. Telomere position effect in human cells. Science 2001;292:2075-7.
  28. Kazazian HH Jr. Mobile elements: drivers of genome evolution. Science 2004;303:1626-32.
  29. Balakirev ES, Ayala FJ. Pseudogenes: are they “junk” or functional DNA? Annu Rev Genet 2003;37:123-51.
  30. The news staff. Breakthrough of the year: the runners-up. Science 2004;306:2013-7.
  31. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004;306:636-40.

