DNA Sequencing Methods
In the Sanger method, the DNA strand to be analyzed is used as a template, and DNA polymerase is used, in a PCR reaction, to generate complementary strands from primers. Four different reaction mixtures are prepared, each containing a certain percentage of a dideoxynucleoside triphosphate (ddNTP) analog of one of the four nucleotides (dATP, dCTP, dGTP or dTTP).
Various DNA Sequencing methods.
1. Maxam – Gilbert sequencing
2. Chain-termination methods
3. Dye-terminator sequencing
4. Automation and sample preparation
5. Large scale sequencing strategies
6. New sequencing methods
1. Maxam – Gilbert sequencing
In 1976-1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases. The method requires radioactive labelling at one end and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). Thus a series of labelled fragments is generated, from the radiolabelled end to the first ‘cut’ site in each molecule. The fragments from the four reactions are run side by side in gel electrophoresis for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred.
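The way a sequence is inferred from the four reactions (G, A+G, C, C+T) can be illustrated with a small sketch. It is an idealized model, not a laboratory protocol: the function names and the representation of a lane as a set of base positions are assumptions made for illustration, and real gels report fragment lengths rather than exact positions.

```python
LANES = ("G", "A+G", "C", "C+T")


def simulate_lanes(seq):
    """Idealized model: record, per reaction, the positions where cleavage occurs.

    Positions are 1-based and counted from the radiolabelled end (an assumption
    of this sketch).
    """
    lanes = {name: set() for name in LANES}
    for pos, base in enumerate(seq, start=1):
        if base in "AG":
            lanes["A+G"].add(pos)
        if base == "G":
            lanes["G"].add(pos)
        if base in "CT":
            lanes["C+T"].add(pos)
        if base == "C":
            lanes["C"].add(pos)
    return lanes


def read_maxam_gilbert(lanes):
    """Infer the sequence by comparing which lanes show a band at each position."""
    g, ag, c, ct = (set(lanes[name]) for name in LANES)
    length = max(ag | ct, default=0)
    seq = []
    for pos in range(1, length + 1):
        if pos in g and pos in ag:
            seq.append("G")   # band in both G and A+G
        elif pos in ag:
            seq.append("A")   # band in A+G only
        elif pos in c and pos in ct:
            seq.append("C")   # band in both C and C+T
        elif pos in ct:
            seq.append("T")   # band in C+T only
        else:
            seq.append("N")   # no band: base cannot be called
    return "".join(seq)


lanes = simulate_lanes("GATTCAG")
print(read_maxam_gilbert(lanes))
```

The key point is that A and T are never read directly: an A is a band that appears in the A+G lane but not in the G lane, and a T is a band in the C+T lane but not in the C lane.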
2. Chain-termination method
The chain-termination method is more efficient and uses fewer toxic chemicals and lower amounts of radioactivity than the method of Maxam and Gilbert. The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators. The chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labelled nucleotides, and modified nucleotides that terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP, dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, ddTTP). These are the chain-terminating nucleotides: they lack the 3’-OH group required for the formation of a phosphodiester bond between two nucleotides, so their incorporation terminates DNA strand extension and results in DNA fragments of varying length.
Fig. Part of a radioactively labelled sequencing gel
The newly synthesized and labelled DNA fragments are heat denatured and separated by size by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes are then used to read the DNA sequence (from bottom to top). Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for labelling, or using a primer labelled at the 5’ end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. Chain-termination methods have greatly simplified DNA sequencing. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.
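Reading a four-lane gel from bottom to top amounts to sorting all bands by fragment length and noting which lane each band sits in. The sketch below is a simplified illustration under stated assumptions: it ignores the primer length, assumes a terminated fragment exists for every position, and uses hypothetical function names.

```python
def sanger_lanes(new_strand):
    """Return, for each ddNTP lane, the lengths of fragments ending in that base.

    `new_strand` is the strand synthesized by the polymerase (5' to 3').
    A fragment of length n ends with the dideoxynucleotide at position n.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel bottom to top: shortest fragments migrate furthest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def reverse_complement(seq):
    """The read is complementary to the template, so convert back if needed."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


lanes = sanger_lanes("ATGCCGTA")
read = read_gel(lanes)
print(read)                      # sequence of the newly synthesized strand
print(reverse_complement(read))  # corresponding template strand, 5' to 3'
```

Note that the sequence read from the gel is that of the newly synthesized strand, which is complementary to the template.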
3. Dye-terminator sequencing
Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a fluorescent dye, each of which fluoresces at a different emission wavelength. Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram. The common challenges of DNA sequencing include poor quality in the first 15-40 bases of the sequence and deteriorating quality of sequencing traces after 700-900 bases.
Fig. Sequence ladder by radioactive sequencing compared to fluorescent peaks
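Because each terminator carries its own dye, calling a base from a chromatogram reduces, in the simplest view, to asking which of the four fluorescence channels dominates at each peak. The following is a minimal sketch of that idea, not the algorithm used by any real base caller. The input format (one dictionary of channel intensities per peak) and the `min_ratio` threshold are assumptions chosen for illustration.

```python
def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak from four-channel fluorescence intensities.

    `peaks` is a list of dicts mapping "A", "C", "G", "T" to an intensity.
    A base is called only if the strongest channel is at least `min_ratio`
    times the second strongest; otherwise the position is reported as "N".
    """
    calls = []
    for peak in peaks:
        ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        if best > 0 and best >= min_ratio * second:
            calls.append(best_base)
        else:
            calls.append("N")  # unequal or overlapping peaks: ambiguous call
    return "".join(calls)


peaks = [
    {"A": 900, "C": 40, "G": 30, "T": 20},   # clean A peak
    {"A": 50, "C": 60, "G": 700, "T": 10},   # clean G peak
    {"A": 400, "C": 380, "G": 20, "T": 10},  # two channels nearly equal
]
print(call_bases(peaks))
```

The third peak shows why unequal peak heights matter: when two channels are close in intensity, a simple rule like this cannot decide between them.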
4. Automation and sample preparation
Automated DNA sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run), with up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programmes score the quality of each peak and remove low-quality base peaks (generally located at the ends of the sequence).
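End trimming of this kind can be sketched in a few lines. The example assumes each base already has a Phred-style quality score Q, where Q = -10 log10(P) and P is the estimated probability that the call is wrong. The threshold of 20 and the function names are illustrative assumptions, not the settings of any particular package.

```python
def error_probability(q):
    """Convert a Phred-style quality score to an error probability."""
    return 10 ** (-q / 10)


def trim_low_quality(seq, quals, min_q=20):
    """Remove bases from both ends until a base with quality >= min_q is found.

    Low-quality bases in the interior of the read are left untouched.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


seq = "NNACGTACGTNN"
quals = [5, 8, 30, 35, 40, 40, 38, 36, 33, 25, 12, 6]
trimmed_seq, trimmed_quals = trim_low_quality(seq, quals)
print(trimmed_seq)
print(error_probability(20))  # Q20 corresponds to a 1 in 100 error rate
```

Production tools typically use a sliding window or a running-sum criterion rather than a strict per-base cutoff, so this should be read as the simplest possible variant.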
5. Large scale sequencing strategies
Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide. Large scale sequencing aims at sequencing very long DNA pieces, such as whole chromosomes. It consists of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA is cloned into a DNA vector and amplified in E. coli. Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. This method does not require any pre-existing information about the sequence of the DNA and is referred to as de novo sequencing. Gaps in the assembled sequence may be filled by primer walking. The different strategies have different tradeoffs in speed and accuracy.
Fig. Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping regions.
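Assembly by overlapping regions can be illustrated with a greedy merge: repeatedly join the two reads that share the longest suffix-to-prefix overlap. This is a teaching sketch only. Real assemblers must cope with sequencing errors, reverse-complement reads, and repeats, none of which are handled here, and the `min_len` parameter is an assumption.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining contigs do not overlap: gaps in the assembly
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]
    return contigs


reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]
print(greedy_assemble(reads))
```

If the function returns more than one contig, the reads did not cover the whole region; these are the gaps that the text says may be filled by primer walking.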
6. New sequencing methods
The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods. Molecular detection methods are not sensitive enough for single molecule sequencing, so most approaches use an in vitro cloning step to amplify individual DNA molecules. In microfluidic Sanger sequencing, the entire thermocycling amplification of DNA fragments as well as their separation by electrophoresis is done on a single chip (approximately 10 cm in diameter), thus reducing reagent usage as well as cost. In some instances researchers have shown that they can increase the throughput of conventional sequencing through the use of microchips.
7. High throughput sequencing
6.1 Lynx Therapeutics’ massively parallel signature sequencing (MPSS)
The first of the “next-generation” sequencing technologies, MPSS was developed in the 1990s at Lynx Therapeutics, a company founded in 1992 by Sydney Brenner and Sam Eletr. MPSS is an ultra high throughput sequencing technology. When applied to expression profiling, it reveals almost every transcript in the sample and provides its accurate expression level. MPSS was a bead-based method that used a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides; this method made it susceptible to sequence-specific bias or loss of specific sequences. However, the essential properties of the MPSS output were typical of later “next-gen” data types, including hundreds of thousands of short DNA sequences. In the case of MPSS, these were typically used for sequencing cDNA for measurements of gene expression levels. Lynx Therapeutics merged with Solexa in 2004, and this company was later purchased by Illumina.
6.2 Polony sequencing
Polony sequencing is an inexpensive but highly accurate multiplex sequencing technique that can be used to read millions of immobilized DNA sequences in parallel. This technique was first developed by Dr. George Church at Harvard Medical School. It combined an in vitro paired-tag library with emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence an E. coli genome at an accuracy of > 99.9999% and a cost approximately 1/10 that of Sanger sequencing.
6.3 454 pyrosequencing
A parallelized version of pyrosequencing was developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picolitre-volume wells, each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs. This technology provides intermediate read length and price per base compared to Sanger sequencing on one end and Solexa and SOLiD on the other.
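The light signals can be turned into a read-out roughly as follows. In pyrosequencing, nucleotides are supplied one type at a time, and the light produced in each flow is approximately proportional to the number of bases incorporated, so a run of identical bases gives a proportionally stronger signal. The sketch assumes normalized signals and a cyclic flow order of T, A, C, G. Both are assumptions for illustration, not parameters taken from the 454 instrument.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow light signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated bases;
    a signal near 0 means the flowed nucleotide was not incorporated.
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        count = int(round(signal))
        bases.append(base * count)
    return "".join(bases)


# Flows: T, A, C, G, T, A, C, G
signals = [1.02, 0.05, 1.98, 0.10, 0.00, 2.90, 0.10, 1.10]
print(decode_flowgram(signals))
```

Because homopolymer length is estimated from signal strength, long runs of one base are the positions where rounding is most likely to go wrong.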
6.4 Illumina (Solexa) sequencing
Solexa developed a sequencing technology based on dye terminators. In this method, DNA molecules are first attached to primers on a slide and amplified; this is known as bridge amplification. Unlike pyrosequencing, the DNA can only be extended one nucleotide at a time. A camera takes images of the fluorescently labeled nucleotides, then the dye along with the terminal 3′ blocker is chemically removed from the DNA, allowing the next cycle.
6.5 SOLiD sequencing
The technology used in ABI SOLiD sequencing is oligonucleotide ligation and detection. In this method, a pool of all possible oligonucleotides of a fixed length is labelled according to the sequenced position. The result is sequences of quantities and lengths comparable to Illumina sequencing.
6.6 DNA nanoball sequencing
DNA nanoball sequencing is a high throughput sequencing technology that is used to determine the entire genomic sequence of an organism. The method uses rolling circle replication to amplify fragments of genomic DNA into DNA nanoballs. It allows a large number of DNA nanoballs to be sequenced per run, at low reagent cost compared to other next generation sequencing platforms. However, only short sequences of DNA are determined from each DNA nanoball, which makes mapping the short reads to a reference genome difficult. This technology has been used for multiple genome sequencing projects and is scheduled to be used for more.
6.7 HeliScope(TM) single molecule sequencing
HeliScope sequencing uses DNA fragments with added polyA tail adapters, which are attached to the flow cell surface. The next steps involve extension-based sequencing with cyclic washes of the flow cell with fluorescently labeled nucleotides. The reads are performed by the HeliScope sequencer. The reads are short, up to 55 bases per run, but recent improvements in the methodology allow more accurate reads of homopolymers and RNA sequencing.
6.8 Single molecule SMRT(TM) sequencing
SMRT sequencing is based on the sequencing by synthesis approach. The DNA is synthesised in so-called zero-mode waveguides (ZMWs) – small well-like containers with the capturing tools located at the bottom of the well. The sequencing is performed with the use of unmodified polymerase and fluorescently labelled nucleotides flowing freely in the solution. The wells are constructed in such a way that only the fluorescence occurring at the bottom of the well is detected. The fluorescent label is detached from the nucleotide at its incorporation into the DNA strand, leaving an unmodified DNA strand. The SMRT technology also allows detection of nucleotide modifications, through the observation of polymerase kinetics. This approach allows reads of 1000 nucleotides.
6.9 Single molecule real time (RNAP) sequencing
This method is based on RNA polymerase (RNAP), which is attached to a polystyrene bead, while the distal end of the DNA being sequenced is attached to another bead; both beads are placed in optical traps. RNAP motion during transcription brings the beads closer together, and the change in their relative distance can be recorded at single-nucleotide resolution. The sequence is deduced from four readouts, each taken with a lowered concentration of one of the four nucleotide types.
7. Other sequencing technologies
Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labelled and hybridized to an array containing known sequences. Strong hybridization signals from a given spot on the array identify its sequence in the DNA being sequenced. Mass spectrometry may be used to determine mass differences between DNA fragments produced in chain-termination reactions.
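To see how a set of hybridization signals can yield a sequence, consider the idealized case where the array holds every possible probe of length k and the spots that light up are exactly the k-long substrings of the target. The sketch below chains those substrings by their overlaps. It assumes error-free signals and a target with no repeated (k-1)-mers, and the function names are hypothetical.

```python
def spectrum(seq, k):
    """The set of all k-long substrings: an idealized array read-out."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Chain k-mers that overlap by k-1 bases into a single sequence."""
    kmers = set(kmers)
    suffixes = {m[1:] for m in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any k-mer.
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("spectrum has no unique starting k-mer")
    by_prefix = {}
    for m in kmers:
        by_prefix.setdefault(m[:-1], []).append(m)
    current = starts[0]
    used = {current}
    seq = current
    while True:
        candidates = by_prefix.get(current[1:], [])
        if not candidates:
            break
        if len(candidates) > 1 or candidates[0] in used:
            raise ValueError("repeat in target: reconstruction is ambiguous")
        current = candidates[0]
        used.add(current)
        seq += current[-1]
    if used != kmers:
        raise ValueError("some k-mers could not be placed")
    return seq


signals = spectrum("ATGCGTAC", 3)
print(reconstruct(signals))
```

Repeats longer than k-1 bases make the chain branch, which is one reason short probes limit the length of sequence that can be resolved this way.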
Some important applications of DNA sequencing are:
1. To analyse the structure and function of any protein, we must know its primary structure, which is determined by the DNA sequence that encodes it.
2. By studying sequences, we can understand the function of a specific sequence and identify the sequence responsible for a disease.
3. With the help of comparative DNA sequence studies, we can detect mutations.
4. Kinship study.
5. DNA fingerprinting.
6. Whole-genome sequencing made it possible to complete the Human Genome Project.
The main problem with sequencing is the consistency of its results. If the same sample is sequenced with different methods, the results may differ, so sequencing should be carried out in such a way that at least 40-50% of the sequence obtained from the same sample agrees.
Benchmarks in DNA sequencing
- 1953 Discovery of the structure of the DNA double helix.
- 1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.
- 1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174.
- 1977 Allan Maxam and Walter Gilbert publish “DNA sequencing by chemical degradation”. Fred Sanger, independently, publishes “DNA sequencing by enzymatic synthesis”.
- 1980 Fred Sanger and Wally Gilbert receive the Nobel Prize in Chemistry
- EMBL-bank, the first nucleotide sequence repository, is started at the European Molecular Biology Laboratory
- 1982 Genbank starts as a public repository of DNA sequences.
- Andre Marion and Sam Eletr from Hewlett Packard start Applied Biosystems in May, which comes to dominate automated sequencing.
- Akiyoshi Wada proposes automated sequencing and gets support to build robots with help from Hitachi.
- 1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.
- 1985 Kary Mullis and colleagues develop the polymerase chain reaction, a technique to replicate small fragments of DNA
- 1986 Leroy E. Hood’s laboratory at the California Institute of Technology and Smith announce the first semi-automated DNA sequencing machine.
- 1987 Applied Biosystems markets first automated sequencing machine, the model ABI 370.
- Walter Gilbert leaves the U.S. National Research Council genome panel to start Genome Corp., with the goal of sequencing and commercializing the data.
- 1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at 75 cents (US)/base).
- Barry Karger (January), Lloyd Smith (August), and Norman Dovichi (September) publish on capillary electrophoresis.
- 1991 Craig Venter develops strategy to find expressed genes with ESTs (Expressed Sequence Tags).
- Uberbacher develops GRAIL, a gene-prediction program.
- 1992 Craig Venter leaves NIH to set up The Institute for Genomic Research (TIGR).
- William Haseltine heads Human Genome Sciences, to commercialize TIGR products.
- Wellcome Trust begins participation in the Human Genome Project.
- Simon et al. develop BACs (Bacterial Artificial Chromosomes) for cloning.
- First chromosome physical maps published:
- Page et al. – Y chromosome;
- Cohen et al. chromosome 21.
- Lander – complete mouse genetic map;
- Weissenbach – complete human genetic map.
- 1993 Wellcome Trust and MRC open Sanger Centre, near Cambridge, UK.
- The GenBank database migrates from Los Alamos (DOE) to NCBI (NIH).
- 1995 Venter, Fraser and Smith publish first sequence of free-living organism, Haemophilus influenzae (genome size of 1.8 Mb).
- Richard Mathies et al. publish on sequencing dyes (PNAS, May).
- Michael Reeve and Carl Fuller, thermostable polymerase for sequencing.
- 1996 International HGP partners agree to release sequence data into public databases within 24 hours.
- International consortium releases genome sequence of yeast S. cerevisiae (genome size of 12.1 Mb).
- Yoshihide Hayashizaki’s group at RIKEN completes the first set of full-length mouse cDNAs.
- ABI introduces a capillary electrophoresis system, the ABI310 sequence analyzer.
- 1997 Blattner, Plunkett et al. publish the sequence of E. coli (genome size of 5 Mb)
- 1998 Phil Green and Brent Ewing of Washington University publish “phred” for interpreting sequencer data (in use since ‘95).
- Venter starts new company “Celera”; “will sequence HG in 3 yrs for $300m.”
- Applied Biosystems introduces the 3700 capillary sequencing machine.
- Wellcome Trust doubles support for the HGP to $330 million for 1/3 of the sequencing.
- NIH & DOE goal: “working draft” of the human genome by 2001.
- Sulston, Waterston et al finish sequence of C. elegans (genome size of 97Mb).
- 1999 NIH moves up completion date for rough draft, to spring 2000.
- NIH launches the mouse genome sequencing project.
- First sequence of human chromosome 22 published.
- 2000 Celera and collaborators sequence fruit fly Drosophila melanogaster (genome size of 180Mb) – validation of Venter’s shotgun method. HGP and Celera debate issues related to data release.
- HGP consortium publishes sequence of chromosome 21.
- HGP & Celera jointly announce working drafts of HG sequence, promise joint publication.
- Estimates for the number of genes in the human genome range from 35,000 to 120,000. International consortium completes first plant sequence, Arabidopsis thaliana (genome size of 125 Mb).
- 2001 HGP consortium publishes Human Genome Sequence draft in Nature (15 Feb).
- Celera publishes the Human Genome sequence.
- 2005 420,000 VariantSEQr human resequencing primer sequences published on new NCBI Probe database.
- 2007 For the first time, a set of closely related species (12 Drosophilidae) are sequenced, launching the era of phylogenomics.
- Craig Venter publishes his full diploid genome: the first human genome to be sequenced completely.
- 2008 An international consortium launches The 1000 Genomes Project, aimed to study human genetic variability.
- 2008 Leiden University Medical Center scientists decipher the first complete DNA sequence of a woman.