The Human Genome Project (HGP) is an international research effort to decipher the entire human genome and understand the hereditary instructions that each person carries. The HGP is funded jointly by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH), with additional research done by the National Human Genome Research Institute (NHGRI). The project, launched in 1990, was originally planned to last 15 years, but rapid advances in technology have brought the expected completion date forward to 2003. The Joint Genome Institute (JGI), established in 1997, is one of the largest publicly funded human genome sequencing centres in the world and contributes greatly to the HGP. The Wellcome Trust in the United Kingdom has also invested substantially in the HGP.
The roots of the HGP can be traced back to the atomic bomb era that began during World War II. Survivors of the bombings were exposed to severe radiation, raising concern that the resulting DNA mutations could be passed to their descendants and cause serious diseases and malformations. The DOE’s role in the HGP grew out of its mandate to study the genetic and health effects, or potential health risks, of radiation and the chemical by-products of energy production. Scientists realized that the best way to study and treat these effects was to study DNA directly, so blood samples were collected and stored for future DNA analysis.
At a 1984 meeting sponsored jointly by the DOE and the International Commission on Protection against Environmental Mutagens and Carcinogens (ICPAEMC), the question was asked, “Can we, and should we, sequence the human genome?” After lengthy debates, scientists eventually decided to do it. In 1986, the DOE announced its Human Genome Initiative to decipher the human genetic script.
The HGP is expected to transform both biology and medicine. Its goals are to identify the approximately 80,000 to 100,000 genes in human DNA and to determine the sequence of the approximately 3.2 billion chemical bases that make up human DNA. This information will be stored in computer databases, and data analysis tools will be developed to apply it to human biology and medicine. It is hoped that the results will shift molecular medicine from treating symptoms to addressing the deepest causes of disease at their molecular foundations and in their earliest stages. Gene therapy may eventually “fix” genetic errors before they result in disease. Finally, the HGP will address the ethical, legal, and social issues (ELSI) that may arise from the project. The United States HGP is the first large scientific undertaking to address these types of issues.
The original goals for the $3 billion project were set out in three five-year plans. The goals centre on the following areas: Mapping and Sequencing the Human Genome; Mapping and Sequencing the DNA of Model Organisms; Data Collection and Distribution; Ethical, Legal, and Social Considerations; Research Training; Technology Development; and Technology Transfer. Because of rapid progress, the plans were updated in 1993 and again in 1998. The first working draft was completed in June 2000, and the final sequence is expected to be available by 2003.
DNA Structure and Biology
An organism’s genome is its complete set of inherited instructions for building proteins and other molecules. Inside the nucleus of each mammalian cell are pairs of chromosomes that are composed of deoxyribonucleic acid (DNA). This long, helical molecule carries all the information needed to direct the production of proteins.
Biologists were convinced that chromosomes carry genes and genetic information, but they wanted to find out whether it was the DNA or the proteins within the chromosomes that carried it. Proteins were known to be very important and are polymers made of 20 different amino acid monomers, whereas DNA polymers contain only 4 kinds of nucleotide monomers, so proteins were the prime suspects for carrying genetic information. Structural genes contain instructions for making proteins, which build and operate cells and the body as a whole. Numerous diseases were found to be a consequence of defective proteins, which were in turn made under the direction of genes.
In 1952, Alfred Hershey and Martha Chase tested the hypothesis that a bacteriophage attaches to the bacterial cell wall and injects its DNA into the bacterium, thereby transmitting its genetic information. If the injected DNA alone directed the production of new phages, then DNA would be the genetic material and “its protein coat is merely a combination package crate and a micro-syringe.” Hershey and Chase grew phages in one liquid nutrient medium containing radioactive 35S, which labels protein, and in another containing radioactive 32P, which labels DNA. They let the labelled phages infect bacteria, spun the cultures in a Waring blender to shear away the attached phages, and centrifuged the bacteria out of the liquid. They found most of the 32P, but little of the 35S, in the bacterial pellet, providing strong evidence that the phage genetic material consisted of DNA, not protein (Figure 2).
In 1952, Rosalind Franklin took X-ray diffraction photographs of highly purified DNA fibres that showed a helical structure. The bases lie perpendicular to the length of the fibre, with the sugar-phosphate groups on the outside of the helix. Further analysis showed that one turn of the helix contains ten nucleotides and that the diameter of the DNA suggests it is composed of more than one strand.
With this information, James Watson and Francis Crick worked out the structure of DNA in 1953 and built a 3-D model showing the double helix (Figure 3). In the model, the nitrogen-containing bases pair up: adenine pairs with thymine (A-T) and guanine pairs with cytosine (G-C) (Figure 4). Watson and Crick noticed that for hydrogen bonds to form properly between the base pairs in DNA, the two nucleotide strands of the DNA molecule had to run anti-parallel. Having two complementary strands allows DNA, and therefore chromosomes, to be copied so that each new cell receives a full set of chromosome pairs.
DNA molecules are packed into 23 pairs of chromosomes in the nucleus of a human cell. DNA is built from nucleotides, each consisting of a base, a five-carbon sugar (deoxyribose), and a phosphate group that links one sugar to the next (Figure 1). Each base is attached covalently to a sugar in the sugar-phosphate backbone, while hydrogen bonds hold paired bases on opposite strands together. Nucleotides are linked in a strand of DNA with a phosphate group attached to the 5’ carbon of one sugar and joined to the 3’ carbon of the sugar of the adjacent nucleotide, forming a string of alternating phosphate and sugar groups (Figure 3). Each pair of hydrogen-bonded bases forms a base pair, which is the unit generally used to measure genome size.
The human genome consists of about 3.2 billion base pairs, in which the same four bases are repeated many times throughout the genome. The sequence specifies the exact genetic instructions that make up the “blue-print” for life and distinguish one species from another. All organisms have similarities in their DNA sequences, so insights gained from non-human genomes often lead to new knowledge about human biology.
DNA replicates (duplicates) just before cell division. When the two strands of DNA separate, each strand can be used as a template, or a “mold,” to produce a complementary strand. DNA replication is described as “semiconservative” because each resulting molecule has one old strand and one new strand.
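The template idea can be sketched in a few lines of Python. This is only an illustration, not a model of the enzymes involved: it assumes the standard pairing rules (A with T, G with C) and anti-parallel strands, and the function names `complementary_strand` and `replicate` are hypothetical names chosen for this example.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the strand that pairs with `template`, written 5' to 3'.

    The two strands run anti-parallel, so the partner is read in reverse.
    """
    return "".join(PAIRING[base] for base in reversed(template.upper()))


def replicate(duplex):
    """Semiconservative replication of a (strand, partner) duplex.

    Each daughter duplex keeps one old strand and gains one new strand
    built against it as a template.
    """
    old_a, old_b = duplex
    return [
        (old_a, complementary_strand(old_a)),
        (old_b, complementary_strand(old_b)),
    ]


parent = ("ATGC", complementary_strand("ATGC"))
daughters = replicate(parent)
```

Both daughter duplexes carry the same sequence information as the parent, which is why either strand alone is enough to rebuild the whole molecule.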
Replication begins when proteins pry the double helix open at an origin of replication, creating replication forks where unwinding and replication take place. DNA helicase enzymes then bind and move along the double helix in opposite directions from the origin, separating the two strands by breaking the hydrogen bonds that link the paired bases, much like opening a zipper. The enzyme DNA polymerase grasps a template strand exposed by helicase and moves along it to make a new DNA strand. Nucleotides are added one by one to the 3’ end of the growing strand, each pairing by hydrogen bonding with an exposed base on the DNA template, so a new strand can only grow in the 5’ to 3’ direction. One new strand is therefore made continuously, while the other, whose template runs in the opposite direction, is made in short pieces called Okazaki fragments, named after Reiji Okazaki, who discovered this process. DNA ligase then joins the fragments into a continuous strand. DNA polymerase also proofreads its work, helping to keep the error rate below about 1 mistake per billion nucleotides.
Mutations are heritable changes in DNA that may result from uncorrected errors in replication, failure to repair damage correctly, or spontaneous rearrangements. A mutation is a permanent change in the bases that make up a gene. Since the bases constitute the code letters that direct which amino acids are incorporated into a protein, changes in the bases of DNA can result in substantial changes in protein function.
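A small Python sketch shows how a single base change can alter a protein. The nine-base fragment below is an illustrative sequence made up for this example, and the codon table is deliberately partial (only the four codons used here, all taken from the standard genetic code). The GAG to GTG change, which swaps glutamic acid for valine, is the same kind of substitution that underlies sickle cell disease. The helper names `translate` and `substitute` are hypothetical.

```python
# Partial codon table (DNA coding-strand codons); only the codons
# needed for this illustration are included.
CODONS = {"CCT": "Pro", "GAG": "Glu", "GAA": "Glu", "GTG": "Val"}


def translate(dna: str) -> list:
    """Read the sequence three bases at a time and look up each codon."""
    usable = len(dna) - len(dna) % 3
    return [CODONS[dna[i:i + 3]] for i in range(0, usable, 3)]


def substitute(dna: str, position: int, new_base: str) -> str:
    """Return a copy of `dna` with one base replaced (a point mutation)."""
    return dna[:position] + new_base + dna[position + 1:]


normal = "CCTGAGGAG"                    # illustrative fragment: Pro Glu Glu
missense = substitute(normal, 4, "T")   # GAG -> GTG, changes the amino acid
silent = substitute(normal, 5, "A")     # GAG -> GAA, amino acid unchanged

print(translate(normal), translate(missense), translate(silent))
```

The second mutation leaves the protein unchanged because several codons can specify the same amino acid, so not every base change affects protein function.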
Many of the bases in DNA do not code for any protein. Most of the DNA in the human genome has no known function and is often referred to as “junk DNA.” In fact, it is often said that looking for genes is like trying to find a needle in a haystack. The first step is to determine the human genome sequence and find the genes. “Knowing the bases that make up a gene and where it’s located on a chromosome doesn’t tell you what the gene does,” says Linda Ashworth, a Lawrence Livermore biomedical scientist working in the Laboratory’s Human Genome Center.
She continues, “After sequencing, we still need to determine what proteins the genes produce, and what those proteins do in the cell.” Scientists then need to know the structure and function of the protein produced by the gene and how that protein interacts with its environment. Ashworth describes the sequence as the “detailed map we need to help us find the buried treasure.”
When mapping chromosomes, scientists use markers to locate different genes. To figure out which chromosome a gene is on, they use probes that highlight the chromosome and the general location of the gene. Scientists then cut the chromosome into pieces with special enzymes, clone hundreds of copies of these fragments, and sequence the DNA to find recognizable DNA markers and places where the fragments share some of the same markers. When researchers wish to sequence a fragment, they run four nearly identical reactions using that DNA as a template, in which the four bases are labelled with different fluorescent dyes.
The DOE uses two major sequencing machines, the Perkin-Elmer 3700 and the MegaBace DNA sequencers. The Perkin-Elmer 377 DNA sequencer is also used for much of the sequencing. Using fluorescent dyes rather than radioactive labels makes DNA sequencing safer, quicker, and more efficient. For amplification of DNA, the most commonly used cloning vector is the Bacterial Artificial Chromosome, or BAC.
A single “run” of a MegaBace DNA sequencer involves 96 channels, or lanes, of DNA sequencing, each yielding nearly 500 individual sub-units, or bases, of DNA sequence. In March 2000, the Production Sequencing Facility (PSF) ran 1,800,000 lanes of DNA and 84% of the information was found useful, showing a great increase in efficiency.
Shotgun sequencing, a method in which DNA is cut into hundreds or thousands of random bits with enzymes, is used with today’s automated sequencing machines. These machines decipher relatively short fragments about 500 bases long, but they do so faster and more cheaply than earlier methods. Computers piece the sequenced fragments together like a puzzle to reconstruct the original sequence. The shotgun approach is applied to previously mapped cloned DNA fragments, so their location on the genome is already known, making assembly easier and more accurate.
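The puzzle-solving step can be illustrated with a toy greedy assembler in Python. This is a minimal sketch under strong assumptions, not the software the sequencing centres use: the reads are error-free, all come from the same strand, and contain no confusing repeats. The function names, the `min_len` threshold, and the short example sequence are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the remaining contigs; more than one contig means gaps remain.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left to merge
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping reads taken from a made-up 15-base sequence.
reads = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
contigs = greedy_assemble(reads)
```

Real genomes contain long repeated stretches that can mislead a purely overlap-based approach like this one, which is one reason that knowing where each cloned fragment sits on the map makes assembly more reliable.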
The HGP aims for an accuracy rate of 99.99% or better. Even so, an error rate of 0.01% across 3.2 billion bases still allows for about 320,000 mistakes. To ensure the most complete sequence possible, coverage with no more than 1 gap per 200 kb is sought. Two categories of error are considered: small-scale sequencing errors and large-scale structural errors. Small-scale sequencing errors are localized discrepancies between the DNA template and the sequence. These errors can be corrected through PCR amplification and re-sequencing. Large-scale structural errors are due to mapping problems (false joins or gaps), clonal aberrations (chimeras, deletions, co-ligations, or transposons), or sequence assembly problems (false joins, gaps, or sequences assembled in the wrong orientation). Large-scale structural errors have a more serious impact than small-scale sequencing errors because the correct sequence is difficult to obtain in laboratories without the resources of a genome centre.
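The arithmetic behind these quality targets is easy to check. The short Python sketch below assumes a genome of 3.2 billion bases, as stated earlier, and treats the 99.99% accuracy and the 1 gap per 200 kb figures as uniform averages over the whole genome; the function names are hypothetical.

```python
GENOME_SIZE_BP = 3_200_000_000  # approximate size of the human genome


def expected_errors(genome_size_bp: int, accuracy: float) -> int:
    """Number of wrong bases expected at a given per-base accuracy."""
    return round(genome_size_bp * (1 - accuracy))


def max_gaps(genome_size_bp: int, kb_per_gap: int = 200) -> int:
    """Largest number of gaps allowed at one gap per `kb_per_gap` kilobases."""
    return genome_size_bp // (kb_per_gap * 1000)


errors = expected_errors(GENOME_SIZE_BP, 0.9999)
gaps = max_gaps(GENOME_SIZE_BP)
print(errors, gaps)
```

At 99.99% accuracy this gives 320,000 expected base errors, and the gap target corresponds to at most 16,000 gaps across the genome.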
Today, sequencing is tremendously faster and far cheaper than when the program began. It took the International Human Genome Project 4 years to produce the first billion base pairs and less than 4 months to produce the second billion. Today, approximately 20 million bases are sequenced and deposited in the public database each day, compared to 20 million bases during the entire year of 1997. The amount of sequencing done in 8 days today is equal to the amount done during the whole year of 1998. The cost has dropped from over $2 to less than 10 cents per “finished” base over the past two years. Although fewer staff and lower costs are involved now, the HGP is advancing more than 30 times faster than it did during its first year of operation.
Knowing the complete genetic makeup of a human or any other organism will radically change the face of science. When the complete DNA sequence becomes available to the public, it will revolutionize the way people use science. DNA sequencing results will help researchers analyze genetic variability, understand genetic differences and susceptibility to disease, study gene regulation and human evolution, and discover new genes that help to diagnose, treat, and prevent disease. Genetic information can also be applied in agriculture and food production to make breeding more efficient, and to improve biomass-based energy systems. The information obtained will lead to an understanding of the laws, principles, and logic that govern how a living organism functions.
Although the HGP will provide the essential genetic makeup of a human, it represents only 1% of the DNA sequencing that will be done in the next 5 years. DNA analysis will be performed with samples from microbes, mice, and other organisms. The DOE will continue to create faster and cheaper means of DNA sequencing and analysis. Finally, all the information obtained will be stored in computer databases that will require sophisticated computational tools and resources. By studying similarities and differences in DNA sequences across species, we will begin to truly understand the biology of life.
1. Arms, Karen and Camp, Pamela S. Biology. Fourth edition. Philadelphia, Pennsylvania. 1995.
2. Hayes, Brian. Computing Science: The Invention of the Genetic Code. American Scientist. Online. Internet. January-February 1998. Available: <http://www.amsci.org/amsci/issues/Comsci98/compsci9801.html>
3. National Human Genome Research Institute, Division of Extramural Research. The Human Genome Project. Online. Internet. February 9, 2000. Available:
4. United States Department of Energy. DOE Investments that Made the Human Genome Project Possible, Reduced its Costs, Sped it up, etc. Online. Internet. April 13, 2000. Available: <http://www.sc.doe.gov/geno_res/DOEInvestmentsFS.pdf>
5. United States Department of Energy. DNA Sequencing: The Next Step in the Search for Genes. Online. Internet. February 19, 1999. Available: <http://www.llnl.gov/str/Ashworth.html>
6. United States Department of Energy. Next Steps in the Human Genome Project. Online. Internet. April 13, 2000. Available: <http://www.sc.doe.gov/geno_res/NEXTSTstep.pdf>
7. United States Department of Energy. Prepared Remarks for Energy Secretary Bill Richardson. Press Release. Online. Internet. April 13, 2000. Available: <http://www.sc.doe.gov/geno_res/press.pdf>
8. United States Department of Energy. Research & Development At US DOE National Labs. Sept 1997.
9. United States Department of Energy. To Know Ourselves: The Department of Energy and the Human Genome Project. July 1996: p. 4-5.
10. United States Department of Energy, Office of Biological and Environmental Research. A Vital Legacy: Biological and Environmental Research in the Atomic Age. September 1997: p. 15-16.
11. United States Department of Energy, Office of Biological and Environmental Research, Human Genome Program. The Science Behind the Human Genome Project: Understanding the Basics and How the HGP is Implemented. Online. Internet. Tuesday, March 28, 2000. Available: <http://www.ornl.gov/hgmis/resource/info.html>
12. Voet, D. and Voet, J. Biochemistry. Second Edition. Copyright 1995. John Wiley & Sons, Inc. New York.