Saturday, February 21, 2009

The Genetic Code - how to read the DNA record

(NOTE: The end of this article has been revised and expanded from the original.)

DNA is the kind of molecule that stores genetic information in every living cell. It describes how our bodies are made and, to a degree, how they operate. The translation of DNA, a sequence of nucleotides, into a sequence of amino acids (the building blocks of proteins) is a complex but fascinating process. Here's a simplified account of the essentials:

A selected portion of the DNA is copied in complementary form, making a messenger RNA (mRNA) chain molecule. There are four kinds of nucleotide in the DNA, abbreviated G, T, A, and C; and four kinds in the RNA, called C, A, U, and G. When copying from DNA to RNA, the correspondence is:

G -> C
T -> A
A -> U
C -> G
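The copying rule above is a simple letter-for-letter substitution. Here is a minimal Python sketch of it. The function name `transcribe` and the example sequence are hypothetical, chosen only for illustration. Like the text, the sketch ignores the direction (5' to 3') in which the strands are actually read.

```python
# Letter-for-letter DNA -> RNA copying rule from the table above:
# G -> C, T -> A, A -> U, C -> G.
DNA_TO_RNA = str.maketrans("GTAC", "CAUG")


def transcribe(dna: str) -> str:
    """Return the complementary RNA copy of a DNA sequence.

    Simplification (as in the text): strand direction is ignored.
    """
    dna = dna.upper()
    if not set(dna) <= set("GTAC"):
        raise ValueError("DNA may contain only G, T, A and C")
    return dna.translate(DNA_TO_RNA)


# Hypothetical example sequence, for illustration only.
print(transcribe("TACCGGAAACCTACCATT"))  # AUGGCCUUUGGAUGGUAA
```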

So, for example, the DNA sequence


when copied to RNA, makes the RNA sequence


A sequence of three nucleotides, such as GCC, is called a codon. Each codon encodes one of 20 amino acids, or else is a stop codon. The genetic code is a scheme that translates the 64 (4 x 4 x 4) types of codon into the 20 amino acids and the stop signal. The codon for the amino acid Methionine also functions as a start signal. There are three codons that mean 'stop', and each amino acid is represented by one to six codons. Here's the complete genetic code:
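The count of 64 comes from choosing one of four bases for each of the three positions. This short Python sketch lists every codon using the standard library's `itertools.product`. The ordering U, C, A, G is just a common textbook convention, not something the text specifies.

```python
from itertools import product

RNA_BASES = "UCAG"

# Each codon is an ordered triple of bases: 4 x 4 x 4 choices.
codons = ["".join(triple) for triple in product(RNA_BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['UUU', 'UUC', 'UUA', 'UUG']
```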

[START], Methionine <-- AUG
Alanine <-------- GCU, GCC, GCA, GCG
Leucine <-------- UUA, UUG, CUU, CUC, CUA, CUG
Arginine <------- CGU, CGC, CGA, CGG, AGA, AGG
Lysine <--------- AAA, AAG
Asparagine <----- AAU, AAC
Aspartic acid <-- GAU, GAC
Phenylalanine <-- UUU, UUC
Cysteine <------- UGU, UGC
Proline <-------- CCU, CCC, CCA, CCG
Glutamine <------ CAA, CAG
Serine <--------- UCU, UCC, UCA, UCG, AGU, AGC
Glutamic acid <-- GAA, GAG
Threonine <------ ACU, ACC, ACA, ACG
Glycine <-------- GGU, GGC, GGA, GGG
Tryptophan <----- UGG
Histidine <------ CAU, CAC
Tyrosine <------- UAU, UAC
Isoleucine <----- AUU, AUC, AUA
Valine <--------- GUU, GUC, GUA, GUG
[STOP] <--------- UAG, UGA, UAA
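One way to check the table is to type it in as data and confirm that it assigns each of the 64 codons exactly once. The following Python sketch takes that approach. The names `GENETIC_CODE` and `CODON_TABLE` are hypothetical. AUG is listed only under Methionine, since its start role depends on its position rather than on a separate table entry.

```python
from itertools import product

# The table above, typed in as meaning -> codons.
# AUG (Methionine) also serves as the start signal.
GENETIC_CODE = {
    "Methionine": "AUG",
    "Alanine": "GCU GCC GCA GCG",
    "Leucine": "UUA UUG CUU CUC CUA CUG",
    "Arginine": "CGU CGC CGA CGG AGA AGG",
    "Lysine": "AAA AAG",
    "Asparagine": "AAU AAC",
    "Aspartic acid": "GAU GAC",
    "Phenylalanine": "UUU UUC",
    "Cysteine": "UGU UGC",
    "Proline": "CCU CCC CCA CCG",
    "Glutamine": "CAA CAG",
    "Serine": "UCU UCC UCA UCG AGU AGC",
    "Glutamic acid": "GAA GAG",
    "Threonine": "ACU ACC ACA ACG",
    "Glycine": "GGU GGC GGA GGG",
    "Tryptophan": "UGG",
    "Histidine": "CAU CAC",
    "Tyrosine": "UAU UAC",
    "Isoleucine": "AUU AUC AUA",
    "Valine": "GUU GUC GUA GUG",
    "STOP": "UAG UGA UAA",
}

# Invert the table so each codon looks up its meaning.
CODON_TABLE = {codon: meaning
               for meaning, codons in GENETIC_CODE.items()
               for codon in codons.split()}

all_codons = {"".join(t) for t in product("UCAG", repeat=3)}
listed = [c for codons in GENETIC_CODE.values() for c in codons.split()]

print(len(GENETIC_CODE))              # 21 rows: 20 amino acids + stop
print(len(listed), len(set(listed)))  # 64 64: no codon appears twice
print(set(listed) == all_codons)      # True: every codon is covered
print(sorted({len(c.split()) for c in GENETIC_CODE.values()}))  # [1, 2, 3, 4, 6]
```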

The key elements of translation are small transfer RNA (tRNA) molecules. Each kind of tRNA molecule has a region called the anticodon that can recognize and attach to a particular codon of a messenger RNA (mRNA) molecule. The tRNA molecule has another region, at its "3' end", that attaches to a particular amino acid. This attachment is catalyzed by enzymes called aminoacyl-tRNA synthetases, of which there is generally one kind for each kind of amino acid. There is even a proofreading function, carried out partly by the synthetases themselves, that detects and corrects many translation errors.

Each kind of tRNA molecule associates one kind (sometimes a few kinds) of codon with a particular amino acid, so there are one or more kinds of tRNA for each amino-acid row of the above genetic code table. (The stop codons are recognized by proteins called release factors rather than by tRNA.) For example, there is a kind of tRNA with one region that attaches to Tryptophan (with the help of a specific kind of aminoacyl-tRNA synthetase), and another region that recognizes and attaches to any part of mRNA with a UGG codon.
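Codon recognition works by base pairing: A pairs with U and G with C, and the anticodon lines up with the codon in the opposite direction. Here is a minimal Python sketch assuming strict pairing only. It ignores the "wobble" pairing at the third codon position (such as G with U) that lets one tRNA read a few different codons. The function name `anticodon` is hypothetical.

```python
# RNA base-pairing partners: A-U and G-C.
PAIR = str.maketrans("AUGC", "UACG")


def anticodon(codon: str) -> str:
    """Return the anticodon (written 5'->3') for an mRNA codon (5'->3').

    The two strands pair in opposite directions, so take the
    complement of each base and then reverse the order.
    Strict pairing only; wobble pairing is not modelled.
    """
    return codon.upper().translate(PAIR)[::-1]


print(anticodon("UGG"))  # CCA  (the tryptophan codon)
print(anticodon("UGC"))  # GCA  (a cysteine codon)
```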

So if the RNA sequence is


we can divide it into codons as


Five tRNA molecules, each already loaded with its amino acid, will attach to the first five codons, something like this (with abbreviated names for the amino acids):

No tRNA molecule will attach to the last codon, because it is a stop codon, and the translation will stop.

The amino acids link together into a chain in this order and detach from the tRNA molecules, like this:


Each tRNA molecule detaches from the mRNA and from the chain of amino acids, to be 'loaded' with another amino acid and used again. The released chain of amino acids then folds into the three-dimensional shape it needs in order to function as a protein. (This folding is another complex process, often needing the aid of specialized helper molecules.)
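The reading steps above (divide the mRNA into codons, match each codon to an amino acid, and stop at a stop codon) can be sketched in Python as follows. This is a simplified model under stated assumptions: it starts at the first AUG, uses the standard three-letter abbreviations, and the function names and example sequence are hypothetical.

```python
# The genetic code table, with three-letter amino acid abbreviations.
CODE = {
    "Met": "AUG", "Ala": "GCU GCC GCA GCG", "Leu": "UUA UUG CUU CUC CUA CUG",
    "Arg": "CGU CGC CGA CGG AGA AGG", "Lys": "AAA AAG", "Asn": "AAU AAC",
    "Asp": "GAU GAC", "Phe": "UUU UUC", "Cys": "UGU UGC",
    "Pro": "CCU CCC CCA CCG", "Gln": "CAA CAG",
    "Ser": "UCU UCC UCA UCG AGU AGC", "Glu": "GAA GAG",
    "Thr": "ACU ACC ACA ACG", "Gly": "GGU GGC GGA GGG", "Trp": "UGG",
    "His": "CAU CAC", "Tyr": "UAU UAC", "Ile": "AUU AUC AUA",
    "Val": "GUU GUC GUA GUG", "STOP": "UAG UGA UAA",
}
CODON_TABLE = {c: aa for aa, codons in CODE.items() for c in codons.split()}


def split_codons(rna: str) -> list[str]:
    """Divide an mRNA sequence into consecutive codons, dropping leftover bases."""
    return [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]


def translate(rna: str) -> list[str]:
    """Read from the first AUG (start) up to, not including, the first stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return []                  # no start signal, so nothing is made
    chain = []
    for codon in split_codons(rna[start:]):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "STOP":
            break                  # no tRNA matches a stop codon
        chain.append(amino_acid)
    return chain


rna = "AUGGCCUUUGGAUGGUAA"         # hypothetical example
print(split_codons(rna))           # ['AUG', 'GCC', 'UUU', 'GGA', 'UGG', 'UAA']
print("-".join(translate(rna)))    # Met-Ala-Phe-Gly-Trp
```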

These are the basics of translation, but the actual process is more complex, because other molecular machinery is needed to make everything happen in the right sequence. The 'workbench' of the mRNA reading machinery is a collection of tiny particles called ribosomes, which appear in electron micrographs as tiny dots throughout the cell's cytoplasm (tiny, but huge compared to the tRNA molecules). There are also other tools, such as initiation factors, release factors, and various enzymes, that control the process.

Each ribosome has a small and a large subunit that join together on either side of the mRNA ribbon, forming a bead that can slide along the mRNA, reading it. Typically many ribosomes read the same mRNA strand at the same time, each producing its own copy of the protein. Each ribosome has three sites (called the A, P, and E sites) on one side of the hole through the 'bead'; these hold tRNA molecules in position to attach to, and detach from, the mRNA as it passes through the hole. The ribosome 'workbench' has other sites that hold the other 'tools' in position for each stage of the process.
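As a rough picture of the sliding 'bead', here is a toy Python model of which codons sit under the three tRNA sites as the ribosome moves one codon at a time. It is only a sketch under simplifying assumptions: one codon per step, the E site holding the tRNA that has already handed on the growing chain, the P site holding the tRNA attached to the chain, and the A site awaiting the next incoming tRNA. The function name and example sequence are hypothetical.

```python
from typing import Iterator, Optional, Tuple

Sites = Tuple[Optional[str], str, Optional[str]]


def ribosome_positions(rna: str) -> Iterator[Sites]:
    """Yield the (E, P, A) codons for each position of a toy ribosome.

    Toy model: the ribosome advances one codon per step; None marks a
    site that is not over any codon of the sequence.
    """
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    for p in range(len(codons)):
        e_site = codons[p - 1] if p > 0 else None
        a_site = codons[p + 1] if p + 1 < len(codons) else None
        yield e_site, codons[p], a_site


for e, p, a in ribosome_positions("AUGGCCUUUGGA"):  # hypothetical sequence
    print(f"E={e}  P={p}  A={a}")
```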

Where does the genetic code come from? It is not the result of chemistry or any laws of physics. It is determined by the set of tRNA molecule types and aminoacyl-tRNA synthetase types, which are themselves built according to information in the DNA. The DNA encodes not only the building materials and the building plans, but also the building tools and the building methods. In other words, the genetic code is just information that has been there ever since life began.

The number of possible genetic codes is a huge number, 85 digits long:

1,510,109,515,792,918,244,116,781,339,315,785,081,841,294,607,960,614,956,302,330,123,544,242,628,820,336,640,000
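The figure above appears to match the number of ways to assign each of the 64 codons to one of 21 meanings (20 amino acids plus stop) so that every meaning gets at least one codon. That reading of "possible genetic code" is an assumption, since the text does not say how the number was computed. Here is a Python sketch that counts such assignments by inclusion-exclusion (the name `count_codes` is hypothetical):

```python
from math import comb, log2


def count_codes(codons: int = 64, meanings: int = 21) -> int:
    """Count assignments of codons to meanings that use every meaning.

    Inclusion-exclusion: start from all meanings**codons assignments and
    alternately subtract/add those that leave k chosen meanings unused.
    """
    return sum((-1) ** k * comb(meanings, k) * (meanings - k) ** codons
               for k in range(meanings + 1))


n = count_codes()
print(n)               # the full count
print(len(str(n)))     # 85 digits
print(round(log2(n)))  # 280: bits needed to single out one code
```

The same count gives the "about 280 bits" figure in the next paragraph: the base-2 logarithm of the number of possible codes is the number of yes/no choices needed to pick out one of them.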

and all of these many codes would work equally well. But essentially all of life uses just one genetic code (the few known variants, such as those in mitochondria, differ from it in only a handful of codons), representing about 280 bits of information. It was deciphered in the 1960s, notably by Marshall Nirenberg and Har Gobind Khorana, building on Watson and Crick's 1953 discovery of the structure of DNA, but it had been there since creation. The theory of evolution has no explanation for how the genetic code began, because it can't explain how information can arise from no information. Nor can it explain why there is essentially only one genetic code (out of such a huge number of equally workable codes), even though there is extreme variation in everything else. The mechanism of the present genetic code is very complex, and evolutionary theory supposes that it randomly evolved from a simpler, smaller code. But because there are so many equally viable genetic codes, random evolution should have produced species with many different codes. The evolutionary explanation is far more unlikely than dumping a bucketful of dice on the floor and expecting them all to land with the same number up.

The creationist explanation is that the universal genetic code is like a signature of the creator, who chose a uniform code for all of the designs of life. A short story will illustrate the principle:

During the Cold War, Russia was suspected of stealing American technology. Proof came when some Russian war equipment given to a third country was captured and examined. It contained an integrated circuit that was identical to an American design. It is theoretically possible that the Russians had the same design concept, leading to a similar design. But digital circuits have thousands of component parts connected by thousands of wires. There are trillions of ways to position the parts on the chip, and trillions of ways to route the connecting wires, that work equally well. It would be impossible for the Russians to independently produce the same positions and routings even if the logical design were identical. But examination showed the details were identical, even details left over from correcting wiring errors. In effect, there was an American 'signature' in the copied design.