HOME
Table of Contents (all articles on this disk)
This Article: HOW BIG IS A GENE?
For this article:
      Educational Goals and Objectives  Reference Abstracts  Test Questions  References


HOW BIG IS A GENE?

Today's scientific literature is filled with the latest advances in molecular biology. An electronic search for the word "gene" yields millions of citations. One would think that such a commonly used term would be well defined in the academic environment. It is not. The gene, that unit of inheritance located within (indeed, it is part of) the chromosome, varies in size and function. A very large gene may determine our eye color. A very small gene may cause us to grow to over two meters in height. One gene may produce several proteins. Sometimes many genes contribute to the production of a single protein compound.

Perhaps the simplest definition of a gene is a ribbon of DNA, being a segment of a chromosome, containing a sequence of nucleotides that ultimately produces something we can measure or observe (a protein). This is a general definition that gives us little information about the physical size of any gene. However, this definition tells us quite a bit about every gene. The gene provides instructions (the genetic code) for several tasks. The DNA provides a template to replicate itself; DNA produces more DNA. The genomic DNA may also code for several types of RNA used to carry on replication and protein synthesis. Ultimately, the genomic DNA provides the codes or instructions for the production of proteins. Also, through a series of protein reactions, the DNA specifies other compounds within our cells.

Theoretically, if we can analyze a segment of DNA to predict the ultimate protein, analysis of a protein should allow us to obtain information (such as size, shape, and nucleotide sequence) about its source gene. Thus, working backwards, we will better understand the function of a gene.

Take hemoglobin, for example. Hemoglobin is a protein found in red blood cells. In fact, when it is bound to oxygen, it is the red pigment in your blood. This protein is essential to many animals (including all vertebrates) that rely upon oxygen metabolism. A hemoglobin molecule carries oxygen through the bloodstream to the sites of cellular metabolism.

Hemoglobin is actually eight separate molecules bound together. Four of these are globular protein chains: two alpha-globin chains and two beta-globin chains. The remaining four components, called 'hemes', are porphyrin rings, each with a central iron atom. Hence the name hemoglobin: heme + globin = hemoglobin. Two alpha chains (each containing 141 amino acids) matched against two beta chains (each containing 146 amino acids) hold four separate hemes to form a single complex molecule of hemoglobin. The iron atoms have an affinity for oxygen (think of how easily iron rusts). The globular proteins that surround the heme groups keep the oxygen from reacting permanently with the iron atoms. The oxygen 'sticks' to the iron reversibly rather than oxidizing it. As the oxygen-carrying hemoglobin arrives at a location where the oxygen is needed, the proteins change their shape just enough to 'pinch' the oxygen off of the heme and release it to the cells.

Hemoglobin has been extensively studied. It is interesting to note that several levels of structural organization exist within a single hemoglobin complex. As with all proteins, hemoglobin has a primary structure. The primary structure is the sequence of the amino acids used to build the protein molecule. The secondary structure refers to the twists and loops made by the molecule as it rests in its most stable state. The helical swirls of a DNA molecule or a coiled telephone cord are examples of secondary structures. The shape of the molecule in space, with its loops and bends, is known as the tertiary structure. And, owing to the several molecules that form a single hemoglobin molecule, we have a quaternary structure. The quaternary structure of hemoglobin is the spatial relationship of each component molecule to the whole hemoglobin complex.

Since we know that the amino acid sequence of any natural protein is generated from a sequence of nucleotides forming the gene for that protein, we should be able to gain some insight into the structure of the gene(s) used for the production of hemoglobin by examining the amino acid sequence.

AMINO ACID SEQUENCE FOR ALPHA CHAIN: length = 141, molecular-weight = 15126

  1 V L S P A D K T N V K A A W G K V G A H A G E Y G A E A L E
 31 R M F L S F P T T K T Y F P H F D L S H G S A Q V K G H G K
 61 K V A D A L T N A V A H V D D M P N A L S A L S D L H A H K
 91 L R V D P V N F K L L S H C L L V T L A A H L P A E F T P A
121 V H A S L D K F L A S V S T V L T S K Y R

AMINO ACID SEQUENCE FOR BETA CHAIN: length = 146, molecular-weight = 15867

  1 V H L T P E E K S A V T A L W G K V N V D E V G G E A L G R
 31 L L V V Y P W T Q R F F E S F G D L S T P D A V M G N P K V
 61 K A H G K K V L G A F S D G L A H L D N L K G T F A T L S E
 91 L H C D K L H V D P E N F R L L G N V L V C V L A H H F G K
121 E F T P P V Q A A Y Q K V V A G V A N A L A H K Y H

Beginning with these sequences of amino acids, we will be able to construct a 'probable' gene for the manufacture of hemoglobin. We cautiously use the word 'probable' because in fact many genes may be required to create even the simplest of molecules. However, as an exercise in logic and to aid our understanding of molecular biology, let us assume that each amino acid in any protein sequence requires only one genetic codon (three nucleotide bases) for its specification.

The following table lists the Standard Genetic Code: the DNA triplets (codons) from which a complementary strand of mRNA is made to specify the amino acids. The nucleotide codons are represented by three capital letters. The amino acids that these codons specify are abbreviated (along with their single-letter designations) to the right of their respective codons. A short BASIC program is presented at the end of this article for those who wish to automate the conversion process (see Table 1). This program may be used as a subroutine for the generation of amino acid sequences.

(INSERT TABLE ONE)
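As an illustration of the conversion process described above, here is a minimal Python sketch. It is not the BASIC program mentioned in the text. It assumes the codons are written as coding-strand DNA letters (T, C, A, G) and builds the Standard Genetic Code from its conventional compact listing, with '*' marking a stop codon. The names CODON_TABLE and translate are hypothetical, chosen only for this example.

```python
from itertools import product

# Standard Genetic Code in its conventional compact form: codons are
# enumerated with the bases in the order T, C, A, G (third base varying
# fastest), and each letter below is the amino acid for that codon.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons beginning with T
    "LLLLPPPPHHQQRRRR"   # codons beginning with C
    "IIIMTTTTNNKKSSRR"   # codons beginning with A
    "VVVVAAAADDEEGGGG"   # codons beginning with G
)

# Map each three-letter DNA codon to a single-letter amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA string into single-letter amino acids.

    Reading starts at the first base, proceeds codon by codon, and stops
    at the first stop codon. A trailing partial codon is ignored. Any
    letter other than A, C, G or T raises KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# GTG CTG TCT CCT is one possible DNA spelling of V L S P, the first
# four amino acids of the alpha chain listed earlier.
print(translate("GTGCTGTCTCCT"))  # VLSP
```

Because most amino acids are specified by more than one codon, the reverse direction (protein to DNA) has many valid answers; the four codons in the usage line are only one possible choice.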

We may use the amino acid sequences and the table above to derive a DNA nucleotide sequence that will code for the production of alpha-globin and another for the production of beta-globin. Since there are three nucleotides (a codon) for each amino acid in the final protein, the alpha-chain requires a DNA sequence of 141 x 3 = 423 nucleotides, and the beta-chain requires 146 x 3 = 438 nucleotides.
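The arithmetic above can be checked directly against the two chains as listed earlier. The following Python sketch counts the amino acids in each sequence and multiplies by three. The variable and function names are hypothetical, and the calculation uses the article's simplifying assumption of exactly one codon per amino acid (no start or stop codon, no non-coding DNA).

```python
# Amino acid sequences copied from the listings above (single-letter codes).
ALPHA_CHAIN = """
V L S P A D K T N V K A A W G K V G A H A G E Y G A E A L E
R M F L S F P T T K T Y F P H F D L S H G S A Q V K G H G K
K V A D A L T N A V A H V D D M P N A L S A L S D L H A H K
L R V D P V N F K L L S H C L L V T L A A H L P A E F T P A
V H A S L D K F L A S V S T V L T S K Y R
"""

BETA_CHAIN = """
V H L T P E E K S A V T A L W G K V N V D E V G G E A L G R
L L V V Y P W T Q R F F E S F G D L S T P D A V M G N P K V
K A H G K K V L G A F S D G L A H L D N L K G T F A T L S E
L H C D K L H V D P E N F R L L G N V L V C V L A H H F G K
E F T P P V Q A A Y Q K V V A G V A N A L A H K Y H
"""

NUCLEOTIDES_PER_CODON = 3


def residues(listing):
    """Remove spaces and line breaks, leaving one letter per amino acid."""
    return "".join(listing.split())


def coding_length(listing):
    """Nucleotides needed if every amino acid takes exactly one codon."""
    return len(residues(listing)) * NUCLEOTIDES_PER_CODON


for name, chain in (("alpha", ALPHA_CHAIN), ("beta", BETA_CHAIN)):
    print(name, len(residues(chain)), "amino acids ->",
          coding_length(chain), "nucleotides")
```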

The synthesis of the heme groups is not as straightforward as the production of amino acid chains, but it is a reaction sequence, and we can make some observations about its genetic basis. The heme group begins with a glutamate molecule or, by an alternate path, with a succinate molecule. The steps of the synthesis pathway are listed below just to show that there are at least 11 and sometimes 14 reactions required for the production of a single heme group.

The Synthesis of a Heme Group:
Glutamate -> Glutamyl-tRNA -> Glutamate-1-semialdehyde -> 5-Aminolevulinate
Succinate -> Succinyl-CoA -> 5-Aminolevulinate
5-Aminolevulinate -> Porphobilinogen -> Hydroxymethylbilane -> Uroporphyrinogen-III -> Coproporphyrinogen -> Protoporphyrinogen-IX -> Protoporphyrin-IX -> Heme

Each of these reactions requires a protein enzyme or other molecule designated by the DNA for that task. To simplify our model, let us assume that each reaction requires at least one codon (three nucleotides) for its accomplishment. For the production of a heme group we need 14 x 3 = 42 nucleotides.

Taken together, we know that a gene for the production of hemoglobin (an alpha-chain + a beta-chain + a heme group) must be at least 423+438+42 = 903 bases long. If the gene for hemoglobin must produce two alpha-chains, two beta-chains and all four heme groups at once, we will require a gene of 846+876+168 = 1890 bases.
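The two totals above follow from the same small formula, which the Python sketch below makes explicit. It is only a restatement of the article's simplified model (one codon per amino acid, one codon per heme reaction, 14 reactions per heme), not a biological calculation. The constant and function names are hypothetical.

```python
NUCLEOTIDES_PER_CODON = 3
ALPHA_AMINO_ACIDS = 141   # length of the alpha chain
BETA_AMINO_ACIDS = 146    # length of the beta chain
HEME_REACTIONS = 14       # the article's upper figure for one heme group


def model_gene_size(alpha_chains, beta_chains, heme_groups):
    """Minimum gene size in bases under the article's simplified model."""
    codons = (alpha_chains * ALPHA_AMINO_ACIDS
              + beta_chains * BETA_AMINO_ACIDS
              + heme_groups * HEME_REACTIONS)
    return codons * NUCLEOTIDES_PER_CODON


# One alpha chain, one beta chain, one heme group.
print(model_gene_size(1, 1, 1))   # 903

# A complete hemoglobin: two alpha chains, two beta chains, four hemes.
print(model_gene_size(2, 2, 4))   # 1890
```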

If we draft a sequential list of the triplets (codons) required for the production of our hemoglobin molecule, we might have something that looks like this partial sequence developed for a specific variant of hemoglobin.


  1 aagctgggtg tgtagttatc tggaggccag atccccacta tattctttgt tcctcaccat
 61 gaaatatgga actggagaac tttcatgtct agctaaaggt ttgtaaatgc accaatcagc
121 aatctgtgtc taactcaagg tttgtaaagg caccaatcag caccctgtgt ctagctcaag
181 gtttgtaaat gcaccaatca gtgctctgtg tctagctaat ctagtgggga cttggagact
241 tttgtgtcta gctaaaggat tgtaaatgca ctaatcagca ctctgtgtag ctca

BASE COUNT 79 a 61 c 65 g 89 t
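The base count line can be reproduced from the listing itself. The Python sketch below strips the position numbers and spaces from the sequence as printed above and tallies each base with the standard library's collections.Counter. The names RAW_LISTING and clean are hypothetical.

```python
from collections import Counter

# The partial sequence exactly as listed above, position numbers included.
RAW_LISTING = """
  1 aagctgggtg tgtagttatc tggaggccag atccccacta tattctttgt tcctcaccat
 61 gaaatatgga actggagaac tttcatgtct agctaaaggt ttgtaaatgc accaatcagc
121 aatctgtgtc taactcaagg tttgtaaagg caccaatcag caccctgtgt ctagctcaag
181 gtttgtaaat gcaccaatca gtgctctgtg tctagctaat ctagtgggga cttggagact
241 tttgtgtcta gctaaaggat tgtaaatgca ctaatcagca ctctgtgtag ctca
"""


def clean(listing):
    """Keep only the base letters, dropping digits, spaces and newlines."""
    return "".join(ch for ch in listing.lower() if ch in "acgt")


sequence = clean(RAW_LISTING)
counts = Counter(sequence)

print("length:", len(sequence))
print("BASE COUNT",
      counts["a"], "a", counts["c"], "c", counts["g"], "g", counts["t"], "t")
```

The four counts sum to the length of the listing, 294 bases, which is well short of the 903 bases estimated earlier; this is consistent with the text's description of it as a partial sequence.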


