HORT 250 - Biotechnology in Agriculture

Lecture 4 - Gene structure and gene expression

 

We have spent the first lectures talking about DNA as the hereditary material and how genes are encoded by the DNA. Genes carry information which is decoded by the organism to perform a specific function or task. DNA is not capable of catalyzing any reaction by itself. Again the analogy with a computer is appropriate: without the hardware of the computer and its operating system, the application software is useless. The hardware for genes is the cellular machinery that converts the software (instructions in the DNA) into molecules that perform specific functions. For example, in Griffith's experiments the gene responsible for transformation of avirulent bacteria into virulent, disease-causing bacteria in some way carried information to convert this bacterium into a pathogen. (What might be encoded by that gene?)

Genes carry information for a number of different functions. The function that is most typically thought of is that of encoding proteins. These proteins can be divided into a number of different groups:

Other genes encode specific RNA molecules which are not translated into proteins but perform other functions within the cell. Examples include the various RNAs that are major components of ribosomes (known as ribosomal RNAs or rRNAs), and the transfer RNAs (tRNAs) which are responsible for incorporating the correct amino acids during protein synthesis.

Some of the DNA in the genome is used to make sure that gene expression is correctly regulated. And, especially in many eukaryotes, there is lots of DNA whose function we know nothing about. What does each cell of a lily plant do with 100 metres of DNA?!

In summary, DNA encodes proteins, RNA molecules and regulatory information.

Now I want to talk about how the information in this sequence of bases in DNA is converted into function in a living organism. DNA on its own is not capable of catalyzing any reaction, building any structure, etc. The information must be decoded within the cell and in some way translated into molecules that perform specific functions. This process of decoding genetic information and converting it into molecules that perform the functions of the organism is called gene expression. Our next challenge is to understand how the information in DNA is converted into molecules of action, such as proteins.

The central dogma (principle) of molecular biology is as follows:

DNA serves as the template to make RNA. This process is known as transcription, where information in the form of a sequence of nucleotides is transferred from a double-stranded DNA molecule to a single-stranded RNA molecule. Why is this process called transcription? The information content of the DNA molecule is conserved as a sequence of bases in the RNA molecule. Remember that a transcriptionist copies words from charts or tapes into words in a written document; the language (sequence of bases) does not change but the medium does (RNA instead of DNA).

RNA then serves as the source of information to make proteins in a process called translation. Here the nucleotide sequence information in RNA is converted to the amino acid sequence of proteins. The language of RNA (the 4 bases A, C, G and U in place of T) is translated to the language of proteins (the 20 amino acids). Note that these two terms, transcription and translation, sound confusingly similar. However, if you think about what these two words mean in other contexts (a transcriptionist copying text from one medium to another, and a translator interpreting the meaning of words in two different languages) I think you will find it a little easier to remember the meanings of these two terms.

Transcription. This is performed by a complex of proteins called RNA polymerase, which synthesizes a polymer of RNA. The RNA polymerase uses one strand of the DNA molecule as a template to synthesize a molecule of RNA. In RNA, a base known as uracil (U) is used in place of thymine (T). The sequence of bases in the RNA molecule depends on the sequence in the DNA.

See diagram below.

The bottom strand of the DNA molecule serves as the template for making RNA. RNA polymerase synthesizes the RNA, and the sequence of bases in the RNA is determined by the sequence of bases in the DNA.

The RNA that is produced is complementary to the lower strand of the DNA molecule; apart from the substitution of U for T, the RNA has the same sequence as the upper strand of the DNA molecule.
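The relationship between the two DNA strands and the RNA can be checked with a short Python sketch. The 9-base sequence here is a made-up example, not the one from the lecture diagram, and the function name `transcribe` is only illustrative.

```python
# Base-pairing rules used when RNA is built on a DNA template strand.
# Note that A in the template pairs with U (not T) in the RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand):
    """Return the RNA complementary to a DNA template strand.

    The template is written 3'->5' (left to right), so the RNA that
    comes back reads 5'->3' in the same left-to-right direction.
    """
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())

# Hypothetical example sequence (an assumption for illustration only).
upper_strand = "ATGGCCTAA"   # 5'->3'
lower_strand = "TACCGGATT"   # 3'->5', used as the template

rna = transcribe(lower_strand)
print(rna)                             # AUGGCCUAA
print(upper_strand.replace("T", "U"))  # AUGGCCUAA
```

Both lines print the same sequence: the RNA is complementary to the lower strand and, apart from U in place of T, identical to the upper strand.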

Is the entire genome transcribed from DNA into RNA? No, only specific parts of the genome serve as template to make RNA molecules. The regions that are transcribed are defined by signals (specific DNA sequences) informing RNA polymerase to start and stop RNA synthesis.

RNA serves a variety of functions within the cell. Most of the time we think of RNA as the template for protein synthesis, as described below. But remember that most of the RNA in a cell is used as structural components of ribosomes, as transfer RNAs for protein synthesis, in RNA processing, and in other processes.

The RNA that carries information to direct the synthesis of proteins is known as messenger RNA, or mRNA. Note also that while the sequence of bases is preserved from DNA to RNA, the RNA molecule lacks the long term stability that is an important feature of DNA. RNA is normally quite unstable and is quickly broken down. One advantage of this instability is that it allows an organism to alter the genes that are being expressed by changing the genes that are transcribed from DNA into RNA.

Here are some other sites where you can get good information about transcription:

The Beginner's Guide to Molecular Biology, especially the chapter on RNA transcription but other chapters are very informative as well; and a section from the DNA Learning Center. And here is a simple animation of the transcription process.

Translation. mRNA serves as the template for directing synthesis of proteins, a process called translation which is performed by ribosomes. Ribosomes attach to the start of the mRNA (at the 5' end) and use the RNA as template to produce a specific protein. See the schematic diagram below.

The ribosome identifies the first AUG sequence at the start of the mRNA. This codon of 3 bases is recognized by a specific transfer RNA (tRNA), which places a methionine amino acid residue at the start of the protein. The ribosome then reads the RNA in a series of codons of 3 bases. Each codon directs the addition of a specific amino acid to the protein chain. The ribosome stops translating the RNA when it reaches specific codons that tell it to stop adding amino acids to the protein. There is a good animation of the translation process at the Cold Spring Harbor DNA Learning Center. Another presentation of the processes of RNA and protein synthesis can be viewed at the Genentech web site. And, just for good measure, here is another description of translation from the United Kingdom and another simple animation of translation.
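The codon-by-codon reading described above can be sketched in Python. This is a simplified model, not a description of how a real ribosome works: it assumes translation begins at the first AUG found from the 5' end, it uses the standard genetic code with one-letter amino acid abbreviations, and the mRNA sequence is a made-up example.

```python
from itertools import product

# Standard genetic code. Codons are generated in the order UUU, UUC, UUA,
# UUG, UCU, ... and matched to one-letter amino acid codes; "*" marks the
# three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA (written 5'->3') into a string of amino acids."""
    start = mrna.find("AUG")          # first AUG sets the starting point
    if start == -1:
        return ""                     # no start codon, so no protein
    protein = []
    # Step through the message 3 bases at a time from the start codon.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # stop codon: release the protein
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: GG AUG GCC UUU UAA GG
print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```

The output MAF stands for methionine, alanine, phenylalanine. The two bases before the AUG and the two after the UAA stop codon are not translated.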

There are a number of important features of the genetic code outlined below:

For the genetic code to be translated into the correct sequence of amino acids in a protein, translation must start at a defined position in the RNA and the code must be read in codons of 3 bases. The example below will attempt to demonstrate the importance of these rules for translating the genetic code into functional proteins. The sequence of letters below includes a sentence in English, but this cannot be decoded easily:

BLABLATHEBOYATETHEBUGSTI

This looks like a load of unintelligible gibberish in this format. However, if we apply a couple of simple rules it is possible to make sense of this. Here are the 2 rules that we need here:

  1. We only start reading this sentence at the first THE word, equivalent to the first AUG in a mRNA.
  2. We read the sentence only in groups of 3 letters with no punctuation between words.

When we apply these rules, this is what we get:

BLABLA THE BOY ATE THE BUG STI

If we ignore the first rule and start at a different position, it still reads as gibberish:

BL ABL ATH EBO YAT ETH EBU GST I

Similarly, if we didn't read it in groups of 3 letters it wouldn't be possible to make sense of this. I hope this example helps illustrate the importance of the rules that are required for the correct operation of the genetic code.
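For anyone who wants to try the two rules for themselves, here is a small Python sketch of the same sentence example. The function names are made up for this illustration.

```python
def read_from(letters, start):
    """Split letters into groups of 3, beginning at position start."""
    return [letters[i:i + 3] for i in range(start, len(letters), 3)]

def read_sentence(letters, start_word="THE"):
    """Apply both rules: begin at the first start word, then read in threes."""
    start = letters.find(start_word)
    if start == -1:
        return []                     # no start word, nothing to read
    return read_from(letters, start)

message = "BLABLATHEBOYATETHEBUGSTI"

# Rule 1 and rule 2 together give readable words.
print(" ".join(read_sentence(message)))   # THE BOY ATE THE BUG STI

# Starting one letter too early shifts every group and the meaning is lost.
print(" ".join(read_from(message, 5)))    # ATH EBO YAT ETH EBU GST I
```

Shifting the starting point by a single letter changes every group that follows, which is why the correct starting position matters so much.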

The genetic code was decoded quite a long time ago using a variety of ingenious techniques. The complete genetic code can be viewed at the Genentech web site.

In the same way that DNA in the genome has signals about what information to transcribe into RNA, mRNA contains signals important for translation into protein.

The first AUG triplet defines the start of translation and encodes methionine, the first amino acid incorporated into every newly made protein. The stop codon defines the end of translation; like the start, this is not a random process, and therefore proteins have defined starts and stops. The sequence between the AUG and STOP codons is known as an open reading frame, or ORF. This term implies that there is information to encode a protein. The RNA could be translated in any of 3 frames, depending on which nucleotide you start at. The first AUG always defines which of the 3 frames is used. It is called an open reading frame because it contains no stop codons that are in frame until the end of the protein is specified.
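The idea of three possible reading frames, and of an ORF running from the first AUG to the first in-frame stop codon, can be illustrated with a short Python sketch. The mRNA sequence and the function names are assumptions made for this example; the three stop codons (UAA, UAG and UGA) are those of the standard genetic code.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def codons_in_frame(mrna, frame):
    """Split mrna into complete codons, starting at offset frame (0, 1 or 2)."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]

def find_orf(mrna):
    """Return the ORF from the first AUG through the first in-frame stop codon.

    Returns an empty string if there is no AUG or no in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return ""

mrna = "GGAUGGCCUUUUAAGG"   # hypothetical example sequence

for frame in range(3):
    print(frame, " ".join(codons_in_frame(mrna, frame)))
# 0 GGA UGG CCU UUU AAG
# 1 GAU GGC CUU UUA AGG
# 2 AUG GCC UUU UAA

print(find_orf(mrna))       # AUGGCCUUUUAA
```

Only the frame set by the AUG (frame 2 in this example) reads as a start codon, two amino acid codons and a stop codon. The other two frames break the same bases into different codons.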

We can now look at a gene as a sequence of nucleotides in DNA and identify some of the essential features that a gene must possess for function:

All of these critical features are encoded in the gene diagrammed below.

 
