ASPECTS OF MOLECULAR BIOLOGY & BIOINFORMATICS RELEVANCE IN INDUSTRIAL MICROBIOLOGY & BIOTECHNOLOGY CONTD..
The Polymerase Chain Reaction (PCR) is a technology used to amplify small amounts of
DNA. The PCR technique was invented in 1985 by Kary B. Mullis while working as a
chemist at the Cetus Corporation, a biotechnology firm in Emeryville, California. So
useful is this technology that Mullis won the Nobel Prize for its discovery in 1993, eight
years later. It has found extensive use in a wide range of situations, from medical
diagnosis to microbial systematics and from courts of law to the study of animal
The requirements for PCR are:
a. The DNA or RNA to be amplified
b. Two primers
c. The four nucleotides found in the nucleic acid
d. A heat-stable (thermostable) DNA polymerase, Taq polymerase, derived from the
thermophilic bacterium Thermus aquaticus
The Primer: A primer is a short segment of nucleotides which is complementary to a
section of the DNA which is to be amplified in the PCR reaction.
Primers anneal to the denatured DNA template to provide an initiation site for the
elongation of the new DNA molecule. For PCR, primers must match the nucleotide
sequences on either side of the piece of DNA of interest, which means that the exact order
of the nucleotides in these flanking sequences must already be known. Primers matching
the flanking sequences can be constructed in the laboratory or purchased from
commercial suppliers.
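As an illustration only, the relationship between a target region and its two primers can be sketched in Python. The function names, the example sequence and the primer length of 6 are assumptions made for this sketch; real primers are usually longer, and practical design also considers melting temperature, which is ignored here.

```python
# Complement table for DNA bases (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flanking_primers(top_strand: str, start: int, end: int, length: int = 6):
    """Return (forward, reverse) primers flanking top_strand[start:end].

    The forward primer copies the flank just upstream of the target, so it
    anneals to the bottom strand. The reverse primer is the reverse
    complement of the downstream flank, so it anneals to the top strand.
    """
    if start - length < 0 or end + length > len(top_strand):
        raise ValueError("flanking sequence is shorter than the primer length")
    forward = top_strand[start - length:start].upper()
    reverse = reverse_complement(top_strand[end:end + length])
    return forward, reverse


# Hypothetical template: 6-base flank, 8-base target, 6-base flank.
template = "ATGCCA" + "TTTTTTTT" + "GGATCA"
forward, reverse = flanking_primers(template, start=6, end=14, length=6)
print(forward, reverse)
```

Both primers are written 5' to 3', which is why the downstream flank has to be reverse-complemented.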
The Procedure: There are three major steps in a PCR, which are repeated for 30 or 40 cycles.
This is done on an automated cycler, which can heat and cool the tubes with the reaction
mixture at specific intervals.
a. Denaturation at 94°C
The unknown DNA is heated to about 94°C, which causes the DNA to denature and the
paired strands to separate.
b. Annealing at 54°C
A large excess of primers relative to the amount of DNA being amplified is present and the
reaction mixture is cooled to allow the primers to anneal to the single strands; because of
the large excess of primers, the DNA single strands bind to the primers rather than to
each other.
c. Extension at 72°C
This is the ideal working temperature for the polymerase. Primers that have annealed at
positions with no exact match come loose again (because of the higher temperature) and
are not extended. The polymerase adds dNTPs to the 3' end of the primer, so the new
strand grows from 5' to 3' while the template is read from 3' to 5', each added base being
complementary to the template.
d. The Amplification: The process of amplification is shown in the Figure.
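Because each cycle can at best double the number of target molecules, the yield grows exponentially with the number of cycles. The following is a minimal sketch of that arithmetic; the function name and the `efficiency` parameter (1.0 meaning perfect doubling in every cycle) are assumptions introduced for illustration, and real reactions plateau in later cycles.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised copy number after a number of PCR cycles.

    With efficiency 1.0 every template is copied in every cycle, so the
    count doubles each time: initial_copies * 2 ** cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule after 30 ideal cycles.
print(pcr_copies(1, 30))
```

Under these idealised assumptions, a single molecule gives about a billion copies after 30 cycles.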
Some Applications of PCR in Industrial Microbiology and Biotechnology
PCR is extremely efficient and simple to perform. It is useful in biotechnology in the
following ways:
(a) to generate large amounts of DNA for genetic engineering, or for sequencing, once
the flanking sequences of the gene or DNA sequence of interest are known;
(b) to determine with great certainty the identity of an organism to be used in a
biotechnological production, as may be needed when a group of organisms
includes some members which are undesirable. A good example would be
among the acetic acid bacteria, where Acetobacter xylinum would produce slime
rather than the acetic acid which Acetobacter aceti produces;
(c) to determine rapidly which organism is the cause of contamination in a
production process so as to eliminate it, provided the primers appropriate to the
contaminant are available.
The availability of complete genomes from many organisms is a major achievement of
biology. Aside from the human genome, the complete genomes of many microorganisms
have been sequenced and are now available at the website of The Institute for Genomic
Research (TIGR), a nonprofit organization located in Rockville, MD with its website at
www.tigr.org. At the time of writing, TIGR had the complete genomes of 294
microorganisms on its website (268 bacteria, 23 Archaea, and 3 viruses). The major
challenge is now to decipher the biological function and regulation of the sequenced
genes. One technology important in studying functional microbial genomics is the use of
microarrays.
Microarrays are microscopic arrays of large sets of DNA sequences that have been
attached to a solid substrate using automated equipment. These arrays are also referred
to as microchips, biochips, DNA chips, and gene chips. It is best to refer to them as
microarrays so as to avoid confusing them with computer chips.
DNA microarrays are small, solid supports onto which the sequences from thousands
of different genes are immobilized at fixed locations. The supports themselves are
usually glass microscope slides; silicon chips or nylon membranes may also be used. The
DNA is printed, spotted or actually directly synthesized onto the support mechanically
at fixed locations or addresses. The spots themselves can be DNA, cDNA or
oligonucleotides.
The process is based on hybridization probing. Single-stranded sequences of known
identity are immobilized at fixed locations on the support, and the sample nucleic acid is
labeled with a fluorescent tag such as fluorescein. In microarray assays an unknown
sample is hybridized to an ordered array
of immobilized DNA molecules of known sequence to produce a specific hybridization
pattern that can be analyzed and compared to a given standard. The labeled DNA strand
in solution is generally called the target, while the DNA immobilized on the microarray is
the probe, a terminology opposite that used in Southern blot. Microarrays have the
following advantages over other nucleic acid based approaches:
a. High throughput: thousands of array elements can be deposited on a very small
surface area enabling gene expression to be monitored at the genomic level. Also
many components of a microbial community can be monitored simultaneously in
a single experiment.
b. High sensitivity: small amounts of the target and probe are restricted to a small
area ensuring high concentrations and very rapid reactions.
c. Differential display: different target samples can be labeled with different
fluorescent tags and then hybridized to the same microarray, allowing the
simultaneous analysis of two or more biological samples.
d. Low background interference: non-specific binding to the solid surface is very low
resulting in easy removal of organic and fluorescent compounds that attach to
microarrays during fabrication.
e. Automation: microarray technology is amenable to automation making it
ultimately cost-effective when compared with other nucleic acid technologies.
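The two-sample comparison described under differential display is commonly summarised as a log ratio of the two fluorescence intensities at each spot. The sketch below shows one way this could be computed; the gene names, the intensity values and the `floor` parameter (used to avoid dividing by zero or taking the logarithm of zero) are hypothetical and chosen only for illustration.

```python
import math


def log2_ratios(sample: dict, reference: dict, floor: float = 1.0) -> dict:
    """Per-spot log2(sample / reference) for two labeled targets.

    Intensities below `floor` are raised to `floor`. A positive value means
    more signal from the sample, and a negative value more signal from the
    reference.
    """
    ratios = {}
    for spot, sample_signal in sample.items():
        reference_signal = reference[spot]
        ratios[spot] = math.log2(max(sample_signal, floor) / max(reference_signal, floor))
    return ratios


# Hypothetical intensities for three spots read in two color channels.
treated = {"geneA": 800.0, "geneB": 100.0, "geneC": 200.0}
control = {"geneA": 200.0, "geneB": 400.0, "geneC": 200.0}
print(log2_ratios(treated, control))
```

A log scale is convenient because a fourfold increase and a fourfold decrease come out as values of equal size and opposite sign.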
Applications of Microarray Technology
Microarray technology is still young, yet it has found use in some areas which have
importance in microbiology in general as well as in industrial microbiology and
biotechnology, including disease diagnosis, drug discovery and toxicological research.
Microarrays are particularly useful in studying gene function. A microarray works by
exploiting the ability of a given mRNA molecule to bind specifically to, or hybridize to,
the DNA template from which it originated. By using an array containing many DNA
samples, it is possible to determine, in a single experiment, the expression levels of
hundreds or thousands of genes within a cell by measuring the amount of mRNA bound
to each site on the array. With the aid of a computer, the amount of mRNA bound to the
spots on the microarray is precisely measured, generating a profile of gene expression in
the cell. It is thus possible to determine the bioactive potential of a particular microbial
metabolite, whether as a beneficial material in the form of a drug or through its
deleterious effect.
When a diseased condition is identified through microarray studies, experiments can
be designed which may be able to identify compounds, from microbial metabolites or
other sources, which may improve or reverse the diseased condition.
SEQUENCING OF DNA:
Sequencing of Short DNA Fragments:
DNA sequencing is the determination of the precise sequence of nucleotides in a sample
of DNA. Two methods developed in the mid-1970s are available: the Maxam and Gilbert
method and the Sanger method. Both methods produce DNA fragments which are
studied with gel electrophoresis. The Sanger method is more commonly used and will be
discussed here. The Sanger method is also called the dideoxy method, or the enzymic
method. The dideoxy method gets its name from the critical role played by synthetic
analogues of nucleotides that lack the -OH at the 3' carbon atom (star position):
dideoxynucleotide triphosphates (ddNTP). When (normal) deoxynucleotide
triphosphates (dNTP) are used the DNA strand continues to grow, but when the dideoxy
analogue is incorporated, chain elongation stops because there is no 3' -OH for the next
nucleotide to be attached to. For this reason, the dideoxy method is also called the chain
termination method.
For Sanger sequencing, a single strand of the DNA to be sequenced is mixed with a
primer, DNA polymerase I, an excess of normal nucleotide triphosphates and a limiting
amount (about 5%) of the dideoxynucleotides labeled with a fluorescent dye, each ddNTP
being labeled with a different fluorescent dye color. This primer will determine the
starting point of the sequence being read, and the direction of the sequencing reaction.
DNA synthesis begins with the primer and terminates when a ddNTP is incorporated in
place of the normal dNTP. As all four normal nucleotides are present, chain
elongation proceeds normally until, by chance, DNA polymerase inserts a
dideoxynucleotide instead of the normal deoxynucleotide. The result is a series of
fragments of varying lengths. Each of the four nucleotides is run separately with the
appropriate ddNTP. The mix with ddCTP produces fragments with C (cytosine)
terminals; that with ddTTP produces fragments with T (thymine) terminals, and so on.
The fluorescent strands are separated from the DNA template and electrophoresed on a
polyacrylamide gel to separate them according to their lengths. If the gel is read
manually, four lanes are prepared, one for each of the four reaction mixes. The reading is
from the bottom of the gel up, because the smaller the DNA fragment the faster it moves
through the gel. The sequence of the nucleotides can be read from the gel. If the system is
automated, all four are mixed and electrophoresed together. As the ddNTPs are of
different colors a scanner can scan the gel and record each color (nucleotide) separately.
The Sanger method is used for relatively short fragments of DNA, 700-800 nucleotides.
Methods for larger DNA fragments are described below.
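The logic of reading a dideoxy gel can be sketched as a small simulation. This is a simplified illustration rather than a model of the chemistry: it assumes every possible terminated fragment is produced once, counts fragment lengths from the first base added after the primer, and uses function names invented for this example.

```python
# Complement table for DNA bases (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template_5to3: str) -> str:
    """New strand (5'->3') made by copying the template, which is read 3'->5'."""
    return template_5to3.upper().translate(COMPLEMENT)[::-1]


def termination_lanes(new_strand: str) -> dict:
    """Fragment lengths expected in each of the four ddNTP reaction mixes.

    A fragment of length n ends in the base at position n of the new
    strand, so it appears in the lane of the matching ddNTP.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the bands from shortest (bottom of gel) to longest (top)."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = termination_lanes(synthesized_strand("GATTACA"))
print(lanes)
print(read_gel(lanes))
```

Sorting the bands by length plays the role of electrophoresis, and the lane in which each band appears identifies its terminal base.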
Sequencing of Genomes or Large DNA Fragments
The best example of the sequencing of a genome is perhaps that of the human genome,
which was completed a few years ago. During the sequencing of the human genome, two
approaches were followed: the use of bacterial artificial chromosomes (BACs) and the
shotgun approach.
Use of bacterial artificial chromosomes (BACs)
The publicly funded Human Genome Project, the National Institutes of Health and the
National Science Foundation have funded the creation of ‘libraries’ of BAC clones. Each
BAC carries a large piece of human genomic DNA of the order of 100-300 kb. All of these
BACs overlap randomly, so that any one gene is probably on several different
overlapping BACs. Those BACs can be replicated as many times as necessary, so there is
a virtually endless supply of the large human DNA fragment. In the publicly funded
project, the BACs are subjected to shotgun sequencing (see below) to figure out their
sequence. By sequencing all the BACs, we know enough of the sequence in overlapping
segments to reconstruct how the original chromosome sequence looks.
Use of the shotgun approach:
An innovative approach to sequencing the human genome was pioneered by a privately
funded sequencing project, Celera Genomics. The founders of this company realized that
it might be possible to skip the entire step of making libraries of BAC clones. Instead, they
blasted apart the entire human genome into fragments of 2-10 kb and sequenced them. The
challenge was to assemble those fragments of sequence into the whole genome sequence.
The BAC approach was like having hundreds of 500-piece puzzles, each being assembled
by a team of puzzle experts using puzzle-solving computers. Those puzzles were like
BACs - smaller puzzles that make a big genome manageable. Celera threw all those
puzzles together into one room and scrambled the pieces. They, however, had scanners
that scanned all the puzzle pieces and used powerful computers to fit the pieces together.
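The idea of fitting overlapping fragments together by computer can be illustrated with a toy greedy assembler. This is only a sketch under strong assumptions: the reads are error-free, all come from the same strand, and the sequence has no repeats. It is not the algorithm used by Celera or by the public project, and the function names and the `min_len` parameter are invented for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; return the separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads drawn from the sequence ATGGCGTACGTTAGC.
print(greedy_assemble(["GTTAGC", "ATGGCGTAC", "GTACGTTA"]))
```

Real assemblers must also cope with sequencing errors, reads from both strands and repeated sequences, which is where most of the computational difficulty lies.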
THE OPEN READING FRAME AND THE IDENTIFICATION OF GENES:
Regions of DNA that encode proteins are first transcribed into messenger RNA and then
translated into protein. By examining the DNA sequence alone we can determine the
putative sequence of amino acids that will appear in the final protein. In translation
codons of three nucleotides determine which amino acid will be added next in the
growing protein chain. The start codon is usually AUG, while the stop codons are UAA,
UAG, and UGA. The open reading frame (ORF) is that portion of a DNA segment which
will putatively code for a protein; it begins with a start codon and ends with a stop codon.
Once a gene has been sequenced it is important to determine the correct open reading
frame. Every region of DNA has six possible reading frames, three in each direction
because a codon consists of three nucleotides. The reading frame that is used determines
which amino acids will be encoded by a gene. Typically only one reading frame is used in
translating a gene (in eukaryotes), and this is often the longest open reading frame. Once
the open reading frame is known the DNA sequence can be translated into its
corresponding amino acid sequence.
For example, the sequence of DNA in the Fig. (following) can be read in six reading frames:
three in the forward and three in the reverse direction. The three reading frames in the
forward direction are shown with the translated amino acids below each DNA sequence.
Frame 1 starts with the ‘a’, Frame 2 with the ‘t’ and Frame 3 with the ‘g’. Stop codons are
indicated by an ‘*’ in the protein sequence. The longest ORF is in Frame 1.
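Six-frame translation can be sketched in a few lines of Python. The example below uses the standard genetic code written as DNA codons and a short made-up sequence, not the sequence from the figure. The function names are assumptions for this illustration, and the code expects only the letters A, C, G and T.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translations of the three forward (+) and three reverse (-) frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stretch that starts with M and is closed by a stop codon."""
    best = ""
    for segment in protein.split("*")[:-1]:  # the last piece has no stop
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


for name, protein in six_frames("ATGAAATGA").items():
    print(name, protein, longest_orf(protein))
```

Only frame +1 of this toy sequence contains a start codon followed in frame by a stop codon, so it is the only frame with an open reading frame in the sense used above.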
Genes can be identified in a number of ways, which are discussed below.
i. Using computer programs
As was shown above, the open reading frame (ORF) is deduced from the start and stop
codons. In prokaryotic cells, which do not have many introns (intervening non-coding
regions of the chromosome), the ORF will in most cases indicate a gene. However, it is
tedious to determine ORFs manually, and many computer programs now exist which will
scan the base sequences of a genome and identify putative genes.
In scanning a genome or DNA sequence for genes (that is, in
searching for functional ORFs), the following are taken into account in the computer
program:
a. usually, functional ORFs are fairly long and do not usually encode fewer than
100 amino acids (that is, 300 nucleotides);
b. if the types of codons found in the ORF being studied are also found in known
functional ORFs, then the ORF being studied is likely to be functional;
c. the ORF is also likely to be functional if its sequences are similar to functional
sequences in genomes of other organisms;
d. in prokaryotes, ribosomal translation does not necessarily start at the first possible
(earliest 5’) codon. Instead it starts at the start codon immediately downstream of the
Shine-Dalgarno binding site sequence. The Shine-Dalgarno sequence is a short
sequence of nucleotides upstream of the translational start site that binds to
ribosomal RNA and thereby brings the ribosome to the initiation codon on the
mRNA. The computer program searches for a Shine-Dalgarno sequence, and
finding it helps to indicate not only which start codon is used, but also that the ORF
is likely to be functional;
e. if the ORF is preceded by a typical promoter (if consensus promoter sequences for
the given organism are known, check for the presence of a similar upstream region),
then it is likely to be functional;
f. if the ORF has a typical GC content, codon frequency, or oligonucleotide
composition of known protein-coding genes from the same organism, then it is
likely to be a functional ORF.
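A few of the criteria listed above (minimum length, GC content and a Shine-Dalgarno motif upstream of the start codon) can be sketched as simple checks. This is an illustrative sketch, not how any particular gene-finding program works. The motif AGGAGG is the commonly cited E. coli consensus, while the 20-nucleotide search window, the function names and the example sequence are assumptions made here.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def has_shine_dalgarno(dna: str, start_index: int,
                       motif: str = "AGGAGG", window: int = 20) -> bool:
    """True if the motif occurs within `window` bases upstream of the start codon."""
    upstream = dna[max(0, start_index - window):start_index].upper()
    return motif in upstream


def orf_evidence(dna: str, start_index: int, end_index: int,
                 min_codons: int = 100) -> dict:
    """Collect simple evidence that dna[start_index:end_index] is a functional ORF."""
    orf = dna[start_index:end_index]
    return {
        "long_enough": len(orf) // 3 >= min_codons,
        "gc_content": gc_content(orf),
        "shine_dalgarno_upstream": has_shine_dalgarno(dna, start_index),
    }


# Hypothetical sequence: 14-base leader containing AGGAGG, then a 6-codon ORF.
dna = "TTAGGAGGTTTACC" + "ATG" + "GCC" * 4 + "TAA"
print(orf_evidence(dna, start_index=14, end_index=len(dna), min_codons=5))
```

In practice the GC content would be compared with that of known genes from the same organism, as criterion (f) describes, rather than reported on its own.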
ii. Comparison with Existing Genes
Sometimes it may be possible to deduce not only the functionality or not of a gene (i.e. a
functional ORF), but also the function of a gene. This can be done by comparing an
unknown sequence with the sequence of a known gene available in databases such as
The Institute for Genomic Research (TIGR) in Maryland.
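The idea of comparing an unknown sequence with known genes can be illustrated with a very crude similarity measure, the fraction of short subsequences (k-mers) shared by two sequences. Real database searches use alignment-based tools, so this should be read only as a sketch; the gene names, the sequences and the choice of k = 4 are hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Set of all overlapping subsequences of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, database: dict, k: int = 4):
    """Return (name, score) of the database entry most similar to the query.

    The score is the Jaccard index of the two k-mer sets: shared k-mers
    divided by all distinct k-mers found in either sequence.
    """
    query_kmers = kmers(query, k)
    scores = {}
    for name, seq in database.items():
        entry_kmers = kmers(seq, k)
        union = query_kmers | entry_kmers
        scores[name] = len(query_kmers & entry_kmers) / len(union) if union else 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical two-entry database of known genes.
known_genes = {"geneX": "ATGGCGTACGTTAGC", "geneY": "TTTTTTTTTTTTTTT"}
print(best_match("ATGGCGTACGTTAGG", known_genes))
```

A high score against a gene of known function suggests, but does not prove, that the unknown sequence has a similar function.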
CITED BY Kamal Singh Khadka
Msc Microbiology, TU.
Assistant Professor in Pokhara University, Pokhara Bigyan Thata Prabidhi Campus, PNC, LA, NA.