Analyzing and Interpreting Intraspecific Variation

Which markers are best? This question has plagued mankind since the development of isozyme analysis techniques in the early 1960s. The problem is that different investigators seek different marker characteristics and have different experimental objectives in mind when they select a technology. Reliability, experimental efficiency (the number of loci x the number of individuals scorable per day), experimental ease and cost per datapoint all contribute to the controversy. One of the most heavily weighted marker characteristics is the number of alleles a marker identifies per locus. The reason is simple: once you learn how to use a marker, you want to use it in lots of different contexts. If your marker identifies lots of different alleles across your population of interest, it'll be useful in many experiments. Many classes of markers provide a maximum of two alleles. SNPs typically fall into this category. Any +/- sort of marker (e.g. AFLP) has limited value because there are many ways to fail to detect a phenotype or an amplification product. Datapoint error rates are especially high with markers of this sort, because you simply don't know whether you have the null allele or whether your detection system failed. Markers with several potential states are generally more useful, although the low informativeness of a two-state marker may be countered by a marker system which marks lots of loci simultaneously (a multiplexed assay).

The first genetic markers were morphological markers, now frequently called 'naked eye polymorphisms' (neps). Over 200 of these were identified, classified and mapped in barley (in a very rough sense) before the advent of molecular marker techniques. Several of these segregate in normal populations of barley and remain useful. Generally useful neps include markers which appear to have no selective value, such as rough/smooth awns or long/short rachilla hairs. Many of the other 200 neps result in a reduction in plant vigor. Our first QTL analysis (Good, bad and untested ideas in plant qtl analysis, 1991, Blake, Lybeck and Hayes) utilized a parent with multiple (11) recessive neps, which we used as chromosomal anchor markers. Unfortunately, their phenotypic effects were so dramatic that they skewed plant performance and were responsible for most of the detectable QTL effects in the population.

Molecular Markers Should be Selectively Neutral

Selective neutrality of molecular markers was one of the most debated topics of the isozyme era. Scientists were surprised to readily observe multiple 'electromorphs' of many enzymes, primarily because genetic theory held that genetic load (the overall depression of fitness of a population due to the presence of selectively negative alleles within it) would result in the elimination of species maintaining lots of selectively significant alleles. Further, these electromorphs were seemingly different enough that they should have some selective importance. The literature battle was waged for several years around 1970, and ended when DNA sequence information became commonplace. Most molecular markers are selectively neutral, but some represent adaptively significant mutations.

The informativeness of a single locus marker can be evaluated as a function of the number of alleles present in your population and their relative abundance:

Polymorphism Information Content

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} f_i^2$$

where $f_i$ is the frequency of the ith allele and $n$ is the number of alleles at the locus.

If you have two alleles, each with a frequency of 0.5, then your PIC value is 1 - 2 x 0.5^2 = 0.5.

If you have two alleles, one with a frequency of 0.1 and one with a frequency of 0.9, then your PIC value is 1 - 0.01 - 0.81 = 0.18.

If you have four alleles, each with a frequency of 0.25, the PIC value is 1 - 4 x 0.25^2 = 0.75.

The higher the PIC value, the more generally useful the marker.
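
The three worked examples above can be checked with a few lines of Python. This is only a minimal sketch: the function names `pic` and `pic_from_genotypes` are made up for illustration, and the second one assumes each line has been scored for exactly one allele at the locus (as with inbreds).

```python
from collections import Counter


def pic(frequencies):
    """Return 1 - sum(f_i^2) for allele frequencies that sum to 1."""
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in frequencies)


def pic_from_genotypes(alleles):
    """Estimate PIC from observed allele calls, one call per inbred line.

    Assumption for illustration: every line carries a single scored allele.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    return pic([c / n for c in counts.values()])


# The three examples from the text
print(round(pic([0.5, 0.5]), 2))   # 0.5
print(round(pic([0.1, 0.9]), 2))   # 0.18
print(round(pic([0.25] * 4), 2))   # 0.75
```

Feeding `pic_from_genotypes` a list such as `["a", "a", "b", "b"]` gives the same 0.5 as the first example, since the two alleles are equally frequent.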

Genetic Diversity Indices

Informative markers are useful for many applications. Among these is providing relative estimates of genetic diversity. There are several ways to look at diversity within a germplasm pool, and a fine discussion of a few approaches was provided by Dillman et al. (1997. Comparison of RFLP and morphological distances between maize inbred lines. Theor. Appl. Genet. 95:92-102). In this experiment Dillman et al. scored 145 maize inbreds with 100 RFLP probes using three enzymes (EcoRI, HindIII, EcoRV). These enzyme/probe combinations (referred to as EPCs) were scored against all of the inbreds, and out of the 300 possible EPCs, 222 provided useful information. Nei's diversity index (see the PIC calculation above) was computed for each. An important consequence of this effort is the observation by Alex Kahler that corn inbreds owned by different companies may be more closely related than their pedigrees would suggest.
 

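Nei provided a straightforward method of estimating genetic distance among lines. This distance estimate is:

N = 1 - 2Nxy/(Nx + Ny)

where Nxy is the number of bands shared by lines x and y, and Nx and Ny are the numbers of bands scored in line x and in line y.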

Nei's indices look at whether or not a band is present on a gel. This is not equivalent to allele composition. If allele a has two restriction sites for EcoRI while allele b has none, allele b will be scored as one positive band whenever it is present, while in lines with allele a, the allele will be scored as three positive events. Nei's index therefore carries a lot of data redundancy. If you could look at alleles directly, the information gathered might contain less redundancy. To do this, you must be able to evaluate allelic variants from a single locus.
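
The band-counting behavior described above can be illustrated with a short Python sketch. The function name `band_sharing_distance` and the band labels are hypothetical, invented for this example; bands are treated simply as labels that are either present or absent in a line.

```python
def band_sharing_distance(bands_x, bands_y):
    """Return 1 - 2*Nxy / (Nx + Ny) for two collections of band labels."""
    x, y = set(bands_x), set(bands_y)
    if not x and not y:
        raise ValueError("at least one line must show a band")
    shared = len(x & y)  # Nxy: bands present in both lines
    return 1.0 - 2.0 * shared / (len(x) + len(y))


# Hypothetical data: at locus 1, allele a gives three bands and allele b one.
# Locus 2 is monomorphic and gives the same single band in both lines.
line_x = {"locus1_a_band1", "locus1_a_band2", "locus1_a_band3", "locus2_band"}
line_y = {"locus1_b_band", "locus2_band"}

# The lines differ at one of two loci, but the three bands from allele a
# pull the band-based distance above one half.
print(round(band_sharing_distance(line_x, line_y), 2))  # 0.67
```

Scored as alleles, these two lines differ at one locus out of two. Scored as bands, the single difference at locus 1 is counted through four band states, which is the redundancy the text refers to.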
 
 
 

Isozymes and Storage Proteins

Isozymes are in many respects excellent markers. Once a system is working, all you need is a good tub of potato starch and a histological stain. They're cheap to evaluate, and often a small snip of a leaf is sufficient for analysis. Problem: Every isozyme is different from every other one. There is no such thing as a consistent technology with isozymes. Also, few isozyme systems were informative. At one time around 100 different enzyme activities could be evaluated in starch gels (see Tanksley's book on isozyme electrophoresis), but in barley the list of informative isozymes was limited to alcohol dehydrogenase, esterase, acid phosphatase, alpha amylase, beta amylase, beta glucanase and a few others.

Storage proteins (and seed proteins in general) are excellent markers. The major seed proteins in cereals are products of gene families which readily accumulate variation. The prolamin seed proteins of the grasses are the best varietal identification tools currently available: highly informative, easy to isolate and stable. Great markers. In barley three gene clusters encode the major prolamin gene families, the B, C and D hordeins on barley chromosome 5 (1H). A combination of factors (multiple copies, little selective value, internal repeat sequences) contributes to the accumulation of variation in these gene families. The B hordeins have a high PIC (near 0.9), the C hordeins are generally (depending on the population) around 0.6, and the D hordeins are at best 0.4. Within a narrow germplasm base, the D's are often 0.

Restriction Fragment Length Polymorphisms (RFLPs) are not remarkably polymorphic in general terms, but there's an infinite number of them. We select the clones which span or are adjacent to variable regions of the genome, and use them to effect. An interesting derivative of RFLP analysis is Restriction Landmark Genome Scanning (RLGS). Take good quality DNA and fill in the sheared ends with Klenow fragment. Digest with an 8-base cutter, and label the cut ends with Klenow. Then digest these with a 6-base cutter and electrophorese the fragments in a long agarose tube gel. Pull the tube out, degrade the fragments with a third nuclease, and resolve the fragments on a slab gel. Conceptually, this isn't a bad idea. Practically, it's a recipe for disaster: one sample per final gel, three electrophoresis steps, all radioactive. This means that you have to run 300 high quality gels to assay 100 individuals. However, the density of available markers might make the process worthwhile.
 

STSs: The general idea was that if you knew an RFLP identified a polymorphism, maybe PCR could do the same thing. The problem with the idea: STSs only look within a primed sequence, and primers often anneal at multiple locations. Sometimes these are useful, sometimes not. This topic has recently become an important one: STSs are the source of SNPs (single nucleotide polymorphisms). I recommend reading our two publications on the conversion of STSs to fluor-tagged SNPs. MS #1 was part of Deven See's Master's thesis, while MS #2 was a general lab effort. Our laboratory will look at variation at a few loci.
 

Microsatellites: Are some sequences more prone to accumulate variation than others? Forensic analysis is built around Jeffreys' observation that small direct repeat sequences accumulate variation with remarkable speed. Several scientists utilized an approach developed by Ostrander et al., 1995, to identify sequences carrying short direct repeat sequences (e.g. CACACACACACA). Jeffreys et al. (1985) estimated a 2% mutation rate for minisatellite sequences. Although I believe this estimate to be a poor one, it's better than any available for microsatellites in crops. We should generate a good mutation rate estimate for microsatellites in cultivated plants.
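
To make the idea of a short direct repeat concrete, here is a minimal Python sketch that scans a sequence for perfect tandem repeats such as CACACACACACA. It is not the library-screening approach cited above; the function name `find_microsatellites` and the default thresholds are assumptions chosen for illustration.

```python
import re


def find_microsatellites(seq, motif_len=2, min_repeats=6):
    """Yield (start, motif, repeat_count) for perfect tandem repeats in seq.

    Only uninterrupted repeats are reported. Motifs that are themselves
    repeats of a shorter unit (e.g. CACA) are not collapsed.
    """
    # One copy of the motif, then at least (min_repeats - 1) further copies
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAAAA
        yield match.start(), motif, len(match.group(0)) // motif_len


# A made-up sequence carrying the (CA)6 repeat used as the example in the text
sequence = "GGT" + "CA" * 6 + "TTG"
print(list(find_microsatellites(sequence)))  # [(3, 'CA', 6)]
```

Positions are 0-based. Real microsatellite loci often contain interrupted or compound repeats, which this sketch deliberately ignores.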
 

RAPDs: The guys from DuPont did this to us. Take either one or two random 12 base primers (4^12, or about 16.8 million, available), and use these against genomic DNA in a PCR. Sometimes something gets amplified in one genotype which isn't in another. PIC values? Meaningless. This technique did more to slow the characterization of the genetics underlying useful variation in crops than any idea since 'Rain follows the plow'. Miserable technique, lousy reproducibility, leading nowhere. If you rely on this, you'd best get a job driving a truck.
 

AFLPs: Amplified Fragment Length Polymorphisms are the thinking man's version of RAPDs. Digest genomic DNA with a 6-cutter and a 4-cutter, and ligate on linker sequences. Amplify the whole gemisch with primers against the linkers. To simplify the analysis, use a secondary amplification in which the primers have arbitrary two or three base overhangs. Use a label on the 6-cutter primer, and detect product size polymorphisms on a sequencing gel. We'll be doing this in a few weeks in the lab. This couples the robustness of RLGS with the inherent simplicity of RAPDs. As demonstrated by the folks who cloned ml-o, this can be a remarkably useful technique. We developed a pretty useful software package, Genographer, to deal with the data gathered through AFLP analysis.
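
Why the two or three base overhangs simplify the analysis can be sketched with a little arithmetic. Each arbitrary base on a primer must match the fragment, so under the simplifying assumption that the four bases are equally frequent and independent, each added base keeps roughly one fragment in four. The function name `expected_aflp_bands` and the fragment count in the example are hypothetical.

```python
def expected_aflp_bands(n_fragments, bases_6cutter_primer, bases_4cutter_primer):
    """Expected number of fragments surviving selective amplification.

    Assumes each selective base matches with probability 1/4, i.e. equal
    and independent base frequencies. Real genomes deviate from this.
    """
    if bases_6cutter_primer < 0 or bases_4cutter_primer < 0:
        raise ValueError("number of selective bases cannot be negative")
    total_bases = bases_6cutter_primer + bases_4cutter_primer
    return n_fragments / 4 ** total_bases


# Hypothetical starting pool of 200,000 amplifiable fragments
print(expected_aflp_bands(200_000, 2, 2))  # 781.25
print(expected_aflp_bands(200_000, 3, 3))  # 48.828125
```

Under these assumptions, moving from two to three selective bases per primer cuts the expected band count sixteen-fold, which is the difference between an unreadable smear and a lane you can score on a sequencing gel.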
 
 

Single Nucleotide Polymorphisms
Until recently, nobody in his right mind considered sequencing allelic variants. Larson et al. (1996) did, in order to find STSs in useful locations. This is now a growth area in our field. Our group pioneered this labor-intensive effort in the small grains, producing several theses and many manuscripts from the attempt to utilize the most minute of mutations, the single base change, as markers. Initially we utilized only those SNPs which were assayable using restriction endonucleases. Recently, See et al. (accepted with revision) developed a general approach toward the use of these mutations. In human genetics, this has turned into an industry.

Crop Diversity, an overview
Wheat, barley, soybean and many others of our most important crops are most generally grown as inbred varieties. Rice and maize are most generally grown as simple hybrids, although Xiao demonstrated that, at least with rice, this is due to plant breeders' ineffectiveness in bringing together unlinked desirable gene combinations, not to classic heterosis. It's easy to argue that our current germplasm pools are genetically narrow. Where will our genetic improvements come from in the future?

The USDA, the Vavilov Institute (Russia), and several European organizations have collected, catalogued and maintained the genetic diversity of the world for decades. I strongly recommend going to the library and reading any of Nikolai Vavilov's books on the biogeography of crops. We will obtain the genes for the future from our collections of the world's diversity. They're really pretty amazing.