Conceptual Underpinnings of Map Construction

I. Two point analysis

Geneticists only look at genes for which variation can be found. Classical geneticists looked only for those genes which varied, and in varying contributed to observable variation. We often now call this class of gene a 'Naked eye polymorphism'. Mendel correctly inferred the existence of genes, and their behavior through meiosis, from his analysis of segregation. Two copies of each gene are normally found in most eukaryotes. If these copies differ in a measurable way, we may be able to determine, in a family of related individuals, whether each individual carries two identical versions of a gene or one of each. If we examine the population of genes which contributed to the genotypes of the evaluated individuals, we might expect the proportion of each gene variant in the parental gene pool to predict the frequency of each variant in the progeny population. At least that's what Gregor Mendel thought in 1865.

Mendel used inbred garden pea (Pisum sativum) plants which differed for easily identified characters as parents. He crossed lines that differed in flower color (purple or white), seed shape (round or wrinkled), cotyledon color (yellow or green) or pod color (green or yellow), and looked at the frequency with which these traits were found in the first generation after the cross (the F1 generation) and in the selfed progeny of the F1 generation, the F2 generation. If you start with completely inbred parents (as Mendel did), all of the F1s from such a cross are identical and exhibit the dominant version of the trait in question. In the F2 generation segregation is observed.

Parent Phenotypes            F1 Phenotype   F2 Phenotypes                 F2 Ratio
Purple vs White Flowers      All Purple     705 Purple : 224 White        3.15:1
Green vs Yellow Pods         All Green      428 Green : 152 Yellow        2.82:1
Round vs Wrinkled Seeds      All Round      5474 Round : 1850 Wrinkled    2.96:1
Yellow vs Green Cotyledons   All Yellow     6022 Yellow : 2001 Green      3.01:1

Diagrammatically, this resolves to the following:

Parents:   P1 (AA)  x  P2 (aa)

F1:        100% Aa

F2:        25% AA + 50% Aa = 75% A-  :  25% aa

Mendel correctly interpreted these observations to infer that alternative forms of a gene (alleles) segregate into gametes at predictable frequencies, and that gametes unite randomly to form zygotes.

Somebody came up with the bright idea of calling this a 'monohybrid cross'. I've never liked this term. What it is supposed to imply is that it's a cross in which one gene is segregating (or segregation at one locus is being monitored). A dihybrid cross is one in which two segregating genes are monitored simultaneously. Mendel found that for the genes he evaluated (see above for examples) each gene segregated independently. Punnett (Bateson and Punnett) developed the 'Punnett Square', a handy graphical user interface which helps students understand how gametic gene frequencies resolve into whole organism phenotypic frequencies.

Take two plants: one has purple flowers (AA) and round seeds (BB); the other has white flowers (aa) and wrinkled seeds (bb). You make the cross and produce a purple-flowered, round-seeded F1 with the genotype AaBb. You then allow this plant to self-pollinate. Four types of gametes are produced with equal frequency: AB, Ab, aB and ab (each with a frequency of 1/4).

The Punnett Square
          1/4 AB      1/4 Ab      1/4 aB      1/4 ab
1/4 AB    1/16 AABB   1/16 AABb   1/16 AaBB   1/16 AaBb
1/4 Ab    1/16 AABb   1/16 AAbb   1/16 AaBb   1/16 Aabb
1/4 aB    1/16 AaBB   1/16 AaBb   1/16 aaBB   1/16 aaBb
1/4 ab    1/16 AaBb   1/16 Aabb   1/16 aaBb   1/16 aabb

From this you can look at the genotypic frequencies and translate these to phenotypic frequencies. In this case, 9/16 have round seeds and purple flowers, 3/16 wrinkled seeds and purple flowers, 3/16 round seeds and white flowers, 1/16 wrinkled seeds and white flowers.
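
The bookkeeping in the Punnett square can be checked by brute force. The sketch below is only an illustration: it assumes, as in the text, complete dominance of A (purple) over a and of B (round) over b, and four gamete types at a frequency of 1/4 each. The function name is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def f2_phenotype_frequencies():
    """Enumerate all 16 gamete unions from a selfed AaBb F1."""
    gametes = ["AB", "Ab", "aB", "ab"]
    gamete_freq = Fraction(1, 4)  # each gamete type is equally frequent
    phenotypes = Counter()
    for egg, pollen in product(gametes, repeat=2):
        # One dominant allele is enough to show the dominant phenotype.
        flower = "purple" if "A" in (egg[0], pollen[0]) else "white"
        seed = "round" if "B" in (egg[1], pollen[1]) else "wrinkled"
        phenotypes[(flower, seed)] += gamete_freq * gamete_freq
    return dict(phenotypes)


for phenotype, freq in sorted(f2_phenotype_frequencies().items()):
    print(phenotype, freq)
```

Counting cells this way gives the 9/16, 3/16, 3/16 and 1/16 classes described above.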

This was Mendel's demonstration that genes segregate independently. This happens when two genes lie on different chromosomes, or lie sufficiently far apart on a chromosome that the frequency with which recombination occurs decouples them. Although noticed in 1902, genetic linkage was largely elucidated in Morgan's 'fly lab' at Columbia from 1910 through the 1920s. Genetic linkage is the tendency for genes which are close to one another on a chromosome to be transmitted together to progeny more frequently than genes which are far apart or on different chromosomes. Meiotic recombination and its companion 'Interference' are responsible for this tendency. Linkage is estimated by determining how far gamete frequencies deviate from expectation. From a dihybrid F1 in which the two genes segregate independently we expect gametes in the proportions 1/4 AB, 1/4 Ab, 1/4 aB, 1/4 ab; linked genes produce an excess of the parental combinations. Allard (1957) provided convenient tables which provide 'maximum likelihood' estimates of linkages among pairs of genes (two-point linkage estimates) from segregation in F2 populations showing different types of gene action.
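
When the gamete classes can be counted directly (for example in a testcross or a doubled haploid population), the deviation from the 1/4 : 1/4 : 1/4 : 1/4 expectation can be worked out by hand. The sketch below is a minimal illustration of that simple case, not Allard's maximum likelihood method for F2 data. The counts are hypothetical, and it assumes the parental (non-recombinant) classes are AB and ab.

```python
def recombination_fraction(counts, parental=("AB", "ab")):
    """Fraction of gametes that carry a non-parental allele combination."""
    total = sum(counts.values())
    recombinants = sum(n for gamete, n in counts.items() if gamete not in parental)
    return recombinants / total


def chi_square_independence(counts):
    """Chi-square statistic (3 df) against equal frequencies of the four classes."""
    total = sum(counts.values())
    expected = total / 4
    classes = ("AB", "Ab", "aB", "ab")
    return sum((counts.get(g, 0) - expected) ** 2 / expected for g in classes)


# Hypothetical gamete counts, invented for illustration only.
example_counts = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}
print("r =", recombination_fraction(example_counts))
print("chi-square =", chi_square_independence(example_counts))
```

A chi-square value above 7.81 (3 degrees of freedom, 5% level) indicates that the four classes are not equally frequent. An excess of the parental classes, as in these invented counts, is the pattern linkage produces.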

While I was a graduate student, it was impossible to reasonably consider cloning an actual gene in which any reasonable person might be interested. Goldman and his students produced the estimate that about 100,000 genes were expressed during the lifetime of an average plant. The estimates of the amount of single copy DNA available for coding (see Bennett's and Bendich's papers) supported this contention. While perhaps off by a factor of three, it's still a decent estimate. While I was a student perhaps 100 plant genes were cloned, sequenced and generally understood. Currently the 40,000 genes in the Arabidopsis genome have been mapped, cloned and sequenced. Still, fewer than 1000 are well understood. During your lifetime it will become increasingly reasonable to attempt to clone and characterize genes which are really interesting.

Genes that contribute to phenotypic variation are interesting. Genes that define the adaptational differences among genotypes within a species, and those which are responsible for the differences we see among species interest me even more. While we have no easy way to know when we have cloned the genes responsible for variation in developmental patterns, we can now effectively determine their chromosomal location, and through careful management of populations, effectively determine the scope and magnitude of their effects. This is what QTL analysis is all about.

There are two biological processes scientists use to identify genes of interest: mutation and genetic linkage. Later in the course we will spend time on the types of mutagenesis technologies that have been used to saturate genomes with gene-interrupting genetic insertions. If you know which previously characterized genetic markers are close to a gene in which you are interested, you have a place to start. Linkage is important, and how we measure and interpret it is likewise important.

There is a simple, first principles approach to estimating linkage if you know something about the relationships between genotype and phenotype in a segregating population.  While not as accurate as a maximum likelihood estimator, it's something you can do in the field with a pencil.  As a grad student I thought that everyone knew how to do this, and didn't think it interesting.  Theor. Appl. Genet. published it the year after I graduated.

The place we start is at the 'two point test'. In this analysis all of the genetic markers which have been assayed in a population of individuals are contrasted in a pairwise analysis with one another. The data output looks like the results of a half-diallel analysis. A two-point analysis provides the framework to cluster genetic markers into 'linkage groups'. The experienced geneticist can use a two point clustering to produce approximate maps.
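
The pairwise contrast and the clustering step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure any particular mapping program uses: markers are scored 'A' or 'B' in a doubled haploid population, '-' marks a missing score, and two markers are treated as linked when their recombination fraction is at or below an arbitrary threshold (0.3 here). The marker names and scores are invented.

```python
from itertools import combinations


def recombination_fraction(scores_1, scores_2):
    """Fraction of lines in which two markers carry different parental alleles."""
    pairs = [(a, b) for a, b in zip(scores_1, scores_2) if "-" not in (a, b)]
    if not pairs:
        return 0.5  # no shared data: treat the pair as unlinked
    return sum(a != b for a, b in pairs) / len(pairs)


def linkage_groups(markers, max_rf=0.3):
    """Cluster markers: any pair with rf <= max_rf ends up in the same group."""
    parent = {name: name for name in markers}

    def find(name):
        # Follow parent pointers to the representative of this cluster.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for m1, m2 in combinations(markers, 2):
        if recombination_fraction(markers[m1], markers[m2]) <= max_rf:
            parent[find(m1)] = find(m2)

    groups = {}
    for name in markers:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical scores for four markers across ten doubled haploid lines.
example_markers = {
    "m1": "AAAABBBBAB",
    "m2": "AAAABBBBBA",
    "m3": "ABABABABAB",
    "m4": "ABABABABBA",
}
print(linkage_groups(example_markers))
```

With these invented scores m1 and m2 fall into one group and m3 and m4 into another. Because the clustering is transitive, a marker joins a group as soon as it is linked to any one member, which is why a single bad marker can merge two groups.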

II. Three point analysis

Three point analysis asks the question "if you have three genetic markers, a, b and c, what is their most likely order?" The test then estimates the total number of crossover events required by each possible order. The lowest estimate wins. By grouping markers with a two point analysis and ordering the tightly grouped clusters with three point analysis, complete linkage maps can be produced. See the two point analysis above.
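
A bare-bones version of "the lowest estimate wins" is sketched below. It assumes complete doubled haploid data scored 'A' or 'B' and simply counts the crossovers each of the three possible orders would require. Likelihood-based programs do something more refined, so treat this as an illustration of the idea only. The marker scores are invented.

```python
def crossovers(scores_1, scores_2):
    """Number of lines in which adjacent markers differ (one crossover each)."""
    return sum(a != b for a, b in zip(scores_1, scores_2))


def best_three_point_order(scores):
    """Return the order needing the fewest crossovers, plus all three totals.

    `scores` maps exactly three marker names to genotype strings.
    Only three orders are distinct, one for each choice of middle marker.
    """
    a, b, c = scores
    orders = [(a, b, c), (a, c, b), (b, a, c)]
    totals = {
        order: crossovers(scores[order[0]], scores[order[1]])
        + crossovers(scores[order[1]], scores[order[2]])
        for order in orders
    }
    best = min(totals, key=totals.get)
    return best, totals


# Hypothetical scores, built so that marker b lies between a and c.
example_scores = {
    "a": "AAAAABBBBB",
    "b": "AAAABBBBBA",
    "c": "BAAABABBBA",
}
best, totals = best_three_point_order(example_scores)
print(best, totals)
```

For these scores the order a, b, c needs 4 crossovers while the other two orders need 6 each, so a, b, c is preferred.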

III. Complete Map Construction

Recombination frequency estimates provide the basis for linkage maps. J.B.S. Haldane recognized that if recombination events are independent, then the frequency with which two occur together in a chromosomal region should be the product of their individual frequencies (the square of the recombination frequency when the two intervals are equal). Of course, if two recombination events occur between two genetic markers no recombination is observed. This means that double crossovers lessen the apparent recombination frequency between pairs of markers. If you have a third marker between the two you are considering, you may directly observe double crossovers.
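
The independence assumption leads to Haldane's mapping function, which converts an observed recombination frequency r into an additive map distance d = -(1/2) ln(1 - 2r) Morgans. The short sketch below uses illustrative function names and assumes no interference. It shows how double crossovers shrink the recombination frequency observed across two adjacent intervals, and how the mapping function restores additivity.

```python
import math


def haldane_cm(r):
    """Map distance in centiMorgans for recombination frequency r (no interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(distance_cm):
    """Inverse: expected recombination frequency for a distance in centiMorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def flanking_r(r1, r2):
    """Recombination frequency across two adjacent intervals.

    With independent crossovers a double crossover (frequency r1 * r2)
    restores the parental combination, so it is subtracted twice.
    """
    return r1 + r2 - 2.0 * r1 * r2


r_outer = flanking_r(0.1, 0.2)
print("observed across both intervals:", r_outer)
print("sum of map distances (cM):", haldane_cm(0.1) + haldane_cm(0.2))
print("map distance from outer r (cM):", haldane_cm(r_outer))
```

The outer recombination frequency is less than 0.1 + 0.2, but the two map distances add up to the distance computed from the outer pair.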

IV. Knowing the limitations of linkage maps

Linkage maps tell us a lot about the relationship between genetic markers and the genes which are near them. Linkage maps are limited by the error rate of the data in use. Their value is also dependent upon the ratio between the physical distance between markers and the frequency with which recombination occurs. They are further limited by the regional variation in interference which has been observed in many organisms. Of these problems, error rate is the easiest to manage. Error rates are generally estimated through the use of randomization and replication. Unfortunately, it's a rare genetic analysis in which replication is employed. Maps generally require tens of thousands of datapoints, and few scientists are willing to replicate datasets of this size and cost. Consequently, the frequency of apparent double crossovers is the most commonly employed estimator of error rate. We assume that if three markers are tightly linked, then the frequency of double crossovers in the two interval region should be very low. This assumes interference is positive, and that it is uniform over the genome. Both assumptions may reasonably be challenged.
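
Counting apparent double crossovers is easy to sketch once the markers are in map order. The example below makes several assumptions: complete doubled haploid data scored 'A' or 'B', and invented marker names and scores. It flags every line in which a marker differs from both of its neighbours while the neighbours agree with each other. A marker that collects many such events is a candidate for rescoring or removal.

```python
def apparent_double_crossovers(ordered_markers):
    """Count, for each interior marker, the lines where it disagrees with both flanks.

    `ordered_markers` is a list of (name, genotype string) pairs in map order.
    """
    counts = {}
    triples = zip(ordered_markers, ordered_markers[1:], ordered_markers[2:])
    for (_, left), (name, middle), (_, right) in triples:
        # Flanks agree with each other but not with the middle marker.
        counts[name] = sum(
            l == r and m != l for l, m, r in zip(left, middle, right)
        )
    return counts


# Hypothetical ordered map of four markers scored on ten lines.
example_map = [
    ("m1", "AAAAABBBBB"),
    ("m2", "AAABABBBBB"),  # line 4 differs from both neighbours
    ("m3", "AAAAABBBBA"),
    ("m4", "AAAAABBBAA"),
]
print(apparent_double_crossovers(example_map))
```

Here m2 shows one apparent double crossover and m3 shows none. Whether such an event is a scoring error or a real double crossover depends on the interference assumptions discussed above.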

V. Constructing your own map

Each student will be responsible for taking one 'group' from the Steptoe/Morex doubled haploid population and generating a linkage map. The primary software you use will be Mapmaker, a program developed at the Whitehead Institute (MIT) by Steve Lincoln under the direction of Eric Lander.

Generate your map. Your objectives are 1) to keep as many markers in the map as possible, 2) to eliminate all markers which fail to map well, and 3) to minimize the total recombinational length of the chromosome.

Analyze your map. What is the frequency of double crossovers around individual markers in your map? Does this suggest how you might improve your map? Are your markers uniformly distributed over your map? Should they be uniformly distributed?

Write your report. Contrast your map with the map of Kleinhofs et al. (1993). Are there significant differences?

Measuring Phenotypic Variance and Mapping QTL

Each student will take the map and field book and measure some observable character which seems to vary from plot to plot in the Steptoe/Morex field experiment. You will then do an analysis of variance to determine whether variation among lines was observed, or whether the variation you observed was simply 'random noise'. You will take the consensus mapping dataset produced by yourself and your colleagues, add your QTL data and do a QTL analysis using Mapmaker-QTL. We will evaluate these results on Thursday, September 11 in class. I will try to put plot tags on the nursery this afternoon, and will be in the field and available for the field version of 'office hours' daily through this week.
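
The analysis of variance itself is simple enough to write out. The sketch below assumes a balanced one-way layout (the same number of replicate plots for every line), which may not match the actual field design, and it uses invented plot values. It returns the among-line and within-line mean squares, the F ratio, and a plot-basis estimate of broad sense heritability from the variance components.

```python
def one_way_anova(plots_by_line):
    """Return (ms_among, ms_within, f_ratio, reps) for a balanced one-way layout."""
    plot_lists = list(plots_by_line.values())
    reps = len(plot_lists[0])
    if any(len(plots) != reps for plots in plot_lists):
        raise ValueError("this sketch assumes the same number of plots per line")
    values = [v for plots in plot_lists for v in plots]
    grand_mean = sum(values) / len(values)
    line_means = [sum(plots) / reps for plots in plot_lists]
    ss_among = sum(reps * (mean - grand_mean) ** 2 for mean in line_means)
    ss_within = sum(
        (v - mean) ** 2 for plots, mean in zip(plot_lists, line_means) for v in plots
    )
    ms_among = ss_among / (len(plot_lists) - 1)
    ms_within = ss_within / (len(plot_lists) * (reps - 1))
    return ms_among, ms_within, ms_among / ms_within, reps


def broad_sense_heritability(ms_among, ms_within, reps):
    """Plot-basis H2 = genotypic variance / (genotypic + error variance)."""
    var_genotypic = max((ms_among - ms_within) / reps, 0.0)
    return var_genotypic / (var_genotypic + ms_within)


# Hypothetical trait values: three lines, two plots each.
example_plots = {"line1": [10, 12], "line2": [14, 16], "line3": [20, 18]}
ms_among, ms_within, f_ratio, reps = one_way_anova(example_plots)
print("F =", f_ratio)
print("H2 =", broad_sense_heritability(ms_among, ms_within, reps))
```

For the invented values the mean squares are 32 among lines and 2 within lines, giving F = 16 and a heritability of 15/17. A large F says the variation among lines is more than 'random noise'.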

Objectives:

1) Do an ANOVA, and from the analysis estimate broad sense heritability for your trait

2) Do a QTL analysis with your data and find marker x phenotype interactions

3) Estimate the amount of genotypic variance you can attribute to each gene

4) (extra credit) Try to develop a model which will estimate the genotypic value for each line in the population. Go to the Steptoe/Morex master dataset, and find other data for the character you've measured. See how well your model works for other data gathered at Bozeman, and for data gathered elsewhere.

5) (extra credit) Look for epistatic interactions.
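
As a rough cross-check on objectives 2 and 3, a single-marker analysis can be done by hand before running the mapping software. The sketch below is a stand-in for that quick check, not a description of what Mapmaker-QTL computes. It assumes doubled haploid lines scored 'A' or 'B' at one marker, one phenotypic mean per line, and invented data. It reports half the difference between the two genotype class means and the share of variation among line means that the marker explains.

```python
def single_marker_effect(marker_scores, line_means):
    """Compare phenotype means of the two marker classes at one marker."""
    classes = {"A": [], "B": []}
    for genotype, value in zip(marker_scores, line_means):
        if genotype in classes:  # skip missing or ambiguous scores
            classes[genotype].append(value)
    scored = classes["A"] + classes["B"]
    grand_mean = sum(scored) / len(scored)
    mean_a = sum(classes["A"]) / len(classes["A"])
    mean_b = sum(classes["B"]) / len(classes["B"])
    ss_total = sum((v - grand_mean) ** 2 for v in scored)
    ss_marker = (
        len(classes["A"]) * (mean_a - grand_mean) ** 2
        + len(classes["B"]) * (mean_b - grand_mean) ** 2
    )
    return {
        "additive_effect": (mean_b - mean_a) / 2.0,
        "r_squared": ss_marker / ss_total,
    }


# Hypothetical marker scores and line means for six lines.
example_result = single_marker_effect("AAABBB", [10, 12, 11, 19, 21, 20])
print(example_result)
```

The r_squared value is a share of the variation among line means, which still contains some error variance. Relating it to genotypic variance requires the heritability estimate from your ANOVA.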

Output

I want a written report in WordPerfect or Word on a floppy to me by the end of the fifth week.
