In the last lecture I described the evidence that led to the identification of DNA as the genetic material, and the principles of DNA structure. The mechanisms by which this information is translated into form and function, from DNA to RNA to protein, will be the subject of future lectures. Today I will discuss how DNA is organized and packaged in different organisms, and how individual genes can be isolated.
The DNA content of an organism is referred to as its genome. You may have heard of the Human Genome Project, a large international research project whose goal is to determine the precise structure of the human genome: the entire sequence of bases (A, C, G and T) that it contains. The goal is to finish this task by 2005, and it is likely this will be achieved. The genomes of several bacteria have now been completely sequenced, as has the genome of yeast, the first eukaryote genome to be sequenced. Similar genome projects are underway in other organisms, including plants such as Arabidopsis (due to be completed in 2000), rice (the first major crop to be targeted, being worked on mainly in Japan) and maize (just getting underway).
How are genomes organised and packaged?
In eukaryotes (plants, animals, fungi) the genome is packaged into chromosomes and these chromosomes are localised in the nucleus. A nucleus is one of the defining features of eukaryotes. Chromosomes are made up of both DNA and protein. The proteins in chromosomes perform structural functions and may be important for gene regulation. However, the genetic information is carried in the DNA. In any one species the chromosome number is constant, but the number of chromosomes varies from species to species.
Some species have only 2 chromosomes in their haploid set. In plants and animals most cells are diploid and carry two copies of each chromosome. Closely related species tend to have similar numbers of chromosomes, e.g. humans have 46 and chimpanzees 48 (diploid numbers), while corn and sorghum both have 20.
See the electron microscope picture of human chromosomes. If you want to learn a lot more about how DNA is packaged in eukaryote chromosomes, go to the DNA Learning Center.
Bacteria (prokaryotes) organize their genomes quite differently from eukaryotes: their DNA is not enclosed in a nucleus, although it is still called a chromosome.
See the picture of DNA released from the bacterium E. coli.
The DNA molecule that makes up the genome of E. coli is approximately 1,000 times longer than an E. coli cell, so the DNA must be tightly packaged within the cell. Eukaryote chromosomes face the same requirement for some way of packaging DNA into a compact conformation.
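The ~1,000-fold figure can be checked with a rough calculation. This is a minimal sketch, assuming the textbook rise of about 0.34 nm per base pair for B-form DNA and a typical E. coli cell length of about 2 µm (the cell length is an assumption, not a figure from this lecture); the genome size is the 4,700 kb value from the genome-size table.

```python
# Rough check: how many times longer is the E. coli genome than the cell?

RISE_PER_BP_M = 0.34e-9  # rise per base pair in B-form DNA: about 0.34 nm, in metres


def dna_length_m(base_pairs: int) -> float:
    """Contour length of a double-stranded DNA molecule, in metres."""
    return base_pairs * RISE_PER_BP_M


ECOLI_GENOME_BP = 4_700_000   # 4,700 kb, the value in the genome-size table
ECOLI_CELL_LENGTH_M = 2e-6    # ASSUMPTION: a typical E. coli cell is about 2 micrometres long

genome_length = dna_length_m(ECOLI_GENOME_BP)
ratio = genome_length / ECOLI_CELL_LENGTH_M
print(f"E. coli DNA is about {genome_length * 1000:.1f} mm long, "
      f"roughly {ratio:.0f} times the length of the cell")
```

This gives about 1.6 mm of DNA, around 800 times a 2 µm cell; a shorter assumed cell length pushes the ratio towards 1,000, so treat the result as an order-of-magnitude check rather than an exact figure.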
The chromosomes of eukaryotes are located in the nucleus of each cell, but almost all eukaryotes also possess at least one other genome that is required for normal function: the genome found in mitochondria. Plants also possess a third genome, in their chloroplasts.
Both of these additional, extra-nuclear genomes, in mitochondria and chloroplasts, share many similarities with prokaryote genomes. These organelles and their genomes are derived from bacteria that were once symbionts of eukaryote cells; that is, both the bacteria and the eukaryote benefited from this "lifestyle". Over the course of evolution the organelle genomes have changed, but they are clearly related to those of bacteria. Many genes that are required for mitochondrial and chloroplast function are now found in the nuclear genome.
Now I want to consider the sizes of different genomes.
You can report the size of genomes in a variety of different ways:
| Measure | Unit |
|---|---|
| molecular weight | daltons |
| mass | picograms (pg; 1 pg = 10^-12 g) |
| length | metres |
| base pairs | bp, or frequently kilobases (kb; 1 kb = 1,000 bp) |
| genes | number of genes (but some genomes contain lots of DNA that doesn't encode genes) |
In the same way that computer memory is reported in bytes, the functional unit of computer memory, genomes are normally described in base pairs because this tells you something about the potential capacity of that genome and its ability to encode genes.
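Because molecular weight and mass are roughly proportional to the number of base pairs, you can convert between these units. The sketch below converts base pairs to daltons and picograms and back. It assumes an average of about 650 daltons per base pair, a commonly used approximation for double-stranded DNA (published conversion factors differ slightly depending on the assumptions made); the function names are illustrative only.

```python
DALTON_IN_GRAMS = 1.66054e-24  # mass of one dalton (atomic mass unit), in grams
BP_MASS_DA = 650               # ASSUMPTION: average mass of one base pair, in daltons
PICOGRAM_IN_GRAMS = 1e-12


def bp_to_daltons(base_pairs: float) -> float:
    """Approximate molecular weight of double-stranded DNA, in daltons."""
    return base_pairs * BP_MASS_DA


def bp_to_picograms(base_pairs: float) -> float:
    """Approximate mass of double-stranded DNA, in picograms."""
    return bp_to_daltons(base_pairs) * DALTON_IN_GRAMS / PICOGRAM_IN_GRAMS


def picograms_to_bp(picograms: float) -> float:
    """Inverse of bp_to_picograms: base pairs contained in a given mass of DNA."""
    return picograms * PICOGRAM_IN_GRAMS / (BP_MASS_DA * DALTON_IN_GRAMS)


# Human genome from the genome-size table: 3,000,000 kb = 3 x 10^9 bp
human_bp = 3_000_000 * 1_000
print(f"{bp_to_daltons(human_bp):.2e} Da, {bp_to_picograms(human_bp):.1f} pg")
```

With these assumptions the human genome in the table works out at roughly 3 pg, and 1 pg corresponds to a little under 10^9 bp.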
See the table below on genome size.

| Group | Organism | Genome size (kb) | No. of genes |
|---|---|---|---|
| Viruses (bacterial) | MS2 | 4 | |
| | lambda | 50 | ~30 |
| Viruses (mammalian) | SV40 | 5 | ~8 |
| | smallpox | 267 | ~200 |
| Prokaryotes | Mycoplasma genitalium | 580 | 470 |
| | Escherichia coli | 4,700 | 4,000 |
| Eukaryotes | S. cerevisiae (yeast) | 12,068 | 5,885 |
| | Arabidopsis | 100,000 | 20,000-30,000 |
| | Human | 3,000,000 | ~100,000 |
| | Maize | 4,500,000 | ~30,000 |
| | Lily | 30,000,000 | |
Let's try to put these numbers on genome size into a format that is more comprehensible. If we put 25,000 bases/letters on a page, then the genome of lambda virus, which infects E. coli, could be written on two very dense typewritten pages. The E. coli genome would fit in a slim 200 page book. By contrast, the human genome would require a large encyclopedia consisting of 80 volumes, 1,500 pages per volume.
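The arithmetic behind this analogy is simple division. A small sketch using the figures from the text (25,000 letters per page, 1,500 pages per volume) and the genome sizes from the table; the function names are illustrative only:

```python
import math

LETTERS_PER_PAGE = 25_000   # bases per very dense typewritten page, as in the text
PAGES_PER_VOLUME = 1_500    # pages per encyclopedia volume, as in the text


def pages_needed(genome_kb: int) -> int:
    """Pages required to print a genome of the given size (in kb), one letter per base."""
    return math.ceil(genome_kb * 1_000 / LETTERS_PER_PAGE)


def volumes_needed(pages: int) -> int:
    """Encyclopedia volumes required for the given number of pages."""
    return math.ceil(pages / PAGES_PER_VOLUME)


for name, kb in [("lambda", 50), ("E. coli", 4_700), ("human", 3_000_000)]:
    pages = pages_needed(kb)
    print(f"{name}: {pages:,} pages, {volumes_needed(pages)} volume(s)")
```

It reproduces the figures above: 2 pages for lambda, 188 pages (a slim book of about 200 pages) for E. coli, and 120,000 pages, or 80 volumes, for the human genome.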
One point to remember about all of these genomes is that they must contain information that directs their replication, or they will be lost during cell division. They all contain at least one origin of replication which allows these genomes to be replicated before cell division occurs.
The number of genes carried in these genomes varies from fewer than ten in some viruses to an estimated 100,000 in the human genome. For genomes that have been completely sequenced, the number of genes is known with some accuracy. From the table above you can see that the amount of DNA per gene tends to be much higher in eukaryotes. In fact, the genomes of higher eukaryotes contain more DNA that does not code for genes than DNA that does. The function of this "non-gene" DNA is not clear: it may perform important functions in gene regulation, or it may be largely "junk" that has accumulated over the course of evolution.
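The "DNA per gene" comparison is just genome size divided by gene number. A short sketch using values copied from the genome-size table (the dictionary and function names are illustrative):

```python
# Genome size (kb) and approximate gene number, copied from the genome-size table above
GENOMES = {
    "Escherichia coli": (4_700, 4_000),
    "S. cerevisiae (yeast)": (12_068, 5_885),
    "Human": (3_000_000, 100_000),
    "Maize": (4_500_000, 30_000),
}


def kb_per_gene(genome_kb: float, genes: float) -> float:
    """Average amount of DNA per gene, in kb (genome size divided by gene number)."""
    return genome_kb / genes


for name, (kb, genes) in GENOMES.items():
    print(f"{name}: {kb_per_gene(kb, genes):.1f} kb per gene")
```

This gives roughly 1.2 kb per gene for E. coli and about 2 kb for yeast, compared with about 30 kb for human and 150 kb for maize. These are whole-genome averages, and they inherit the uncertainty of the gene-number estimates in the table.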
How can you study the function of these genes? A genetic approach is to look for mutations in a specific gene and study the effects of these mutations on the function of the organism.
But in order to understand the structure of the gene (the sequence of bases, A,C,G,T), to manipulate and modify that gene, and to be able to transfer that gene into another organism, you need to be able to isolate genes, specific segments of DNA, from all the other DNA in the genome. This is gene cloning. If you look up 'clone' in an older dictionary (pre-1975) you will find the following definitions:
clone (noun): the stock of individuals derived asexually from one individual.
clone (verb): to propagate asexually from an individual
This is an older definition, referring to the propagation of identical copies of one individual (typically a plant variety). The word is widely used in horticulture, where many selected varieties cannot be propagated by seed because they do not breed true, i.e. plants grown from seed will differ from the maternal plant. For example, all apple trees of the Golden Delicious variety are clones propagated from one initial selection. Another example is Dolly the sheep, cloned from a mammary gland cell of an adult sheep: the first mammal to be cloned from an adult body cell. Other biological materials can also be propagated by cloning, e.g. cell lines with specific properties.
However, in the era of molecular biology and genetic engineering, cloning also refers to the isolation and propagation of a specific piece of DNA. For much of this class, I will refer to the isolation of individual genes as "cloning".
- gene cloning: to incorporate a DNA sequence (a piece of DNA) into a vector which can replicate in another organism.
How difficult is it to clone a particular gene? Let's consider cloning one gene from the human genome. There are approximately 100,000 genes in the human genome, so at its simplest, if we have a collection of 100,000 cloned pieces of human DNA, we would have a 1 in 100,000 chance of picking the clone with the right piece of DNA. Not very good odds. In reality the odds are even worse than this, because the genome has the capacity to encode more than a million genes. So cloning a specific human gene is akin to looking for the proverbial needle in a haystack. While this was once a complicated task, perhaps involving several scientist-years of effort or the graduate careers of many students, times have changed. New technologies such as the polymerase chain reaction (PCR), and the development of large-scale DNA sequencing programs, have now made gene cloning relatively routine. A more difficult task nowadays is determining the function of all the genes that have been cloned, so-called "functional genomics".
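The "needle in a haystack" problem can be made more quantitative. A standard estimate, the Clarke-Carbon formula, gives the number of random clones N needed so that any particular sequence is present in a collection with probability P: N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size. The sketch below assumes randomly generated, equally represented fragments of about 10 kb (roughly the standard plasmid insert size mentioned later in this lecture); the function name is illustrative only.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of random clones needed so that a given
    sequence is represented with the stated probability: N = ln(1 - P) / ln(1 - f)."""
    f = insert_bp / genome_bp            # fraction of the genome carried by one clone
    # log1p(-f) computes ln(1 - f) accurately when f is very small
    return math.ceil(math.log(1 - probability) / math.log1p(-f))


# ASSUMPTION: 10 kb inserts; genome sizes from the genome-size table
print(clones_needed(3_000_000_000, 10_000))   # human
print(clones_needed(4_700_000, 10_000))       # E. coli
```

Under these assumptions a human library needs on the order of 1.4 million clones for a 99% chance of containing a given sequence, compared with a couple of thousand for E. coli.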
The genome is analogous to a library where the individual books are equivalent to genes in the genome. In the nucleus, all of these books are mixed together, like having all the books piled up in a jumble on a table. It is virtually impossible to find the book you want among the mess on the table. However, when all the books are suitably filed on the shelves, it is possible not only to find the specific book that interests you, but to take the book from the shelf and read its contents.
The first step in cloning genes is to take the complex mixture of genes found in the genome of an organism and fragment this genome into a collection of smaller, more manageable pieces. Pieces of DNA (genes or parts of genes) are normally cloned and propagated in bacteria, usually E. coli. The following describes the basic principles behind gene cloning. This will introduce a number of new terms which will be used throughout the class. You should familiarise yourself with these.
The most widely used method for gene cloning in E. coli (and the simplest to describe) uses plasmids as the cloning vector.
Plasmids are small (2,000 to 200,000 base pairs) circular DNA molecules that are found in many bacteria.

Plasmids can carry a number of genes and these genes can give a selective advantage to the bacterium. For example, plasmid genes can provide resistance to antibiotics, resistance to heavy metals, ability to metabolise specific compounds, and many other traits. When a bacterium carries one of these plasmids it can grow under conditions where other bacteria are unable to grow. Because of the small size of plasmids, they can be manipulated and modified with relative ease.
Vectors are specialised plasmids (and in some cases viruses) that are used to clone pieces of DNA. Once a piece of DNA has been inserted, the vector can be transferred into a bacterium or another organism, where it is multiplied (replicated). The plasmid vectors typically used for most standard gene cloning experiments are about 3,000 base pairs in size and can be used to clone pieces of DNA up to about 10 kb. Other vectors and specialised procedures can be used to clone very large pieces of DNA, up to 250 kb.
In order to insert pieces of DNA into a cloning vector, we must first be able to cut DNA molecules into smaller pieces and join them together. DNA is usually cut with restriction enzymes and DNA molecules are joined together with another enzyme, DNA ligase.
Restriction enzymes cut DNA wherever a specific DNA sequence is present (see overhead). For example, the enzyme HaeIII cuts at GGCC, and EcoRI cuts at GAATTC. Different restriction enzymes cut at different DNA sequences. Some restriction enzymes cut straight through both strands of the DNA molecule and produce "blunt" ends, e.g. HaeIII, while others, e.g. EcoRI, produce overhanging "sticky" ends. Sticky ends are useful in joining together different DNA molecules, because they can base-pair with any other end produced by the same enzyme.
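The idea that an enzyme cuts wherever its recognition sequence occurs can be illustrated with a short sketch. It scans only the top strand (the EcoRI and HaeIII sites are both palindromic, so every site is found this way), uses the standard cut positions G^AATTC for EcoRI and GG^CC for HaeIII, and does not model the staggered cut on the bottom strand that gives EcoRI its sticky ends. The function names and the example sequence are hypothetical.

```python
import re

# Recognition sequence and cut position within it (top strand, read 5' to 3').
# EcoRI: G^AATTC (staggered cut, "sticky" ends); HaeIII: GG^CC (straight cut, "blunt" ends).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HaeIII": ("GGCC", 2),
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """0-based positions in the top strand where the enzyme cuts."""
    site, offset = ENZYMES[enzyme]
    # The lookahead (?=...) also reports sites that overlap one another.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq: str, enzyme: str) -> list[str]:
    """Top-strand fragments produced by cutting a linear DNA at every site."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


dna = "ATGAATTCGGCCTTGAATTCAA"
print(digest(dna, "EcoRI"))   # ['ATG', 'AATTCGGCCTTG', 'AATTCAA']
print(digest(dna, "HaeIII"))  # ['ATGAATTCGG', 'CCTTGAATTCAA']
```

The same molecule gives a different set of fragments with each enzyme, because each enzyme recognises a different sequence.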
These restriction enzymes are found in bacteria, where they are part of a system that bacteria use to protect themselves against viruses. They are called "restriction enzymes" because they restrict the growth of invading viruses by cutting up the DNA of the virus. The names of these enzymes are derived from the bacteria in which they were discovered, so that EcoRI was found in E. coli, and TaqI was identified in Thermus aquaticus, a species of bacterium that is found in hot springs.
DNA ligase is an enzyme that can join (ligate) DNA molecules together. Ligase and ligation have the same etymology as "ligament", which joins bone to bone. But DNA ligase joins DNA to DNA.

The general scheme for cloning DNA from any source (plant, animal, wherever) uses these fundamental tools. In some ways the basics of gene cloning are similar to using the "cut" and "paste" functions in a word processing program. The scheme is outlined below.
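The cut-and-paste idea can be sketched with DNA written as a single top-strand string. In this simplified, hypothetical model a circular vector with a single EcoRI site is opened, an insert is cut out of a source DNA at two EcoRI sites, and the two are ligated. Because both molecules are assumed to have been cut with the same enzyme, their sticky ends are compatible and EcoRI sites are recreated at both junctions. The model ignores the second strand, the orientation of the insert, and vectors that simply close up again without an insert; all names and sequences are illustrative.

```python
SITE, CUT = "GAATTC", 1   # EcoRI recognition sequence; it cuts between G and AATTC


def open_vector(circle: str) -> str:
    """'Cut': open a circular vector at its only EcoRI site, giving a linear molecule."""
    pos = circle.find(SITE)
    # For simplicity, a site spanning the end/start of the written sequence is ignored here.
    if pos == -1 or circle.find(SITE, pos + 1) != -1:
        raise ValueError("this sketch expects exactly one EcoRI site in the vector")
    cut = pos + CUT
    # Rewrite the circle so that the linear molecule starts and ends at the cut.
    return circle[cut:] + circle[:cut]


def excise_insert(source: str) -> str:
    """'Cut': return the fragment between the first two EcoRI sites in a linear DNA."""
    first = source.find(SITE)
    second = source.find(SITE, first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("need two EcoRI sites to cut out an insert")
    return source[first + CUT:second + CUT]


def ligate(opened_vector: str, insert: str) -> str:
    """'Paste': join the insert into the opened vector. Assumes both were cut with EcoRI,
    so their ends are compatible; the circular result is written from the vector's cut end."""
    return opened_vector + insert


def count_sites_circular(circle: str) -> int:
    """Count EcoRI sites in a circular molecule, including any spanning the join."""
    wrapped = circle + circle[:len(SITE) - 1]
    return sum(1 for i in range(len(circle)) if wrapped.startswith(SITE, i))


vector = "TTTTGAATTCCCCC"            # hypothetical vector with one EcoRI site
source = "AAGAATTCGGGGGAATTCAA"      # hypothetical source DNA with two EcoRI sites
recombinant = ligate(open_vector(vector), excise_insert(source))
print(recombinant, count_sites_circular(recombinant))
```

The recombinant carries two EcoRI sites flanking the insert, so the same enzyme can later be used to cut the insert back out.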

In the same way that you can go to your reference library and find a specific book, DNA libraries can be used to find a specific gene and then study it. The ability to isolate specific genes is essential if these genes are to be transferred from one organism to another. This is one of the fundamental techniques in biotechnology.
As this area of science expands, the analogy with a reference library is becoming more appropriate. More and more genes are being catalogued (cloned, their DNA sequence determined, and filed) from a variety of different sources. In 1995 the first complete DNA sequence of a free-living organism was published. By now, many more bacterial genomes have been sequenced, and the first eukaryote genome, that of brewer's yeast, has also been completed. As I mentioned at the start of the lecture, it is likely that the entire sequence of the human genome will be completed by 2005. Now you can sit at your computer, scan through a collection of genes that have been cloned from several organisms, identify those that are of interest to you, and order them from an agency that acts as the central clearing house for genes.