Some general advice for implementing genomic selection into a breeding program

General procedure to conduct genomic selection

  • Decide which trait should be predicted. Is it a quantitative trait (like yield) or a qualitative trait (like resistance to a disease)? For a quantitative trait you may need multi-year phenotypic data with moderate to high heritability (h² > 0.4) and a large set of genotypes (n > 200). For a qualitative trait you may consider conducting a QTL project first to identify core markers that influence the trait (as in Rutkoski et al., 2012).
  • Decide in which genotype panel you want to predict the trait: Hybrids / Female pool / Male pool
  • Decide how many and which genotypes you want to genotype and phenotype. It is very important that the genotypes are tracked over time so that no, or only a few, genotypic and/or phenotypic data points are missing.
  • Decide how to phenotype the trait of interest: This is simple but crucial. If the trait is phenotyped at several sites, make sure that it is scored in the same way at every site and in all field replications (nRep > 1).
  • Decide which field design to use: If you have to phenotype a lot of genotypes in the field, it might be necessary to use an unreplicated field design. Use control varieties and adjust for row and column effects (Williams et al., 2011). To estimate the heritability of the trait it is necessary that the trait is evaluated in at least 3 trials.
  • Phenotype / genotype: Usually it is best to phenotype first and genotype afterwards, because you then know how many genotypes were phenotyped, how heritable the trait is and how the different trials are correlated. Nevertheless, genotyping and phenotyping in parallel is most likely the more efficient way to increase breeding gains. This, however, implies that you need to track genotypes and be sure that the phenotyping is conducted reliably according to the protocol you defined.
  • Decide which marker system you want to use for genotyping: At the moment you can genotype wheat with SNP and SSR markers. For SNP markers, a 90K chip is available, resulting in 90,000 markers genotyped, of which around 30,000 will be useful for estimating marker effects. Up to now there is no genomic map of wheat available, which limits genomic selection a bit (e.g. estimation of LD or building haplotypes), but the construction of the map is on its way. Apart from the 90K chip, it is also possible to analyze samples with the GBS system (Elshire et al., 2011). This produces a lot of data, in the order of 500K markers, and the person using these data has to know how to impute missing data and how to handle this amount of marker data over time (Rutkoski et al., 2013).
  • Construct the genotype matrix: From the lab you receive data containing the allele call of each marker, i.e. A or G, C or T and so on. This has to be transformed into a matrix of 1s and 0s, where 1 is the more frequent allele and 0 is the less frequent allele. There is an R package available for doing so (GenABEL).
  • Depict the genetic relationship among lines by producing a simple cluster
  • Evaluate, based on the cluster, whether the marker data you received are sound and whether there is population structure in your data set. If population structure is apparent, you may need to estimate marker effects per population and avoid estimating marker effects across populations (Windhausen et al., 2012).
  • Decide which cross-validation method to use for estimating marker effects. The most common is to separate the data set into 60% training set and 40% validation set. Within the training set marker effects are estimated based on the available genotypic and phenotypic data. The estimated marker effects are used to predict performance in the validation set and the correlation between predicted and observed performance gives you information on the prediction accuracy.
  • Decide which model to use for estimating the marker effects. Many methods are possible; usually they do not differ a lot regarding the estimated marker effects (Heslot et al., 2012). In my view it is easiest to estimate marker effects with ridge regression BLUP. An R package (rrBLUP) is available for doing so (Endelman, 2011).
  • Estimate the marker effects in the panel you have available and validate them in an independent set. For example, you may want to estimate marker effects for GCA using parental lines from SUR. You could then check whether these effects are useful for predicting parental lines of other companies. Attention: You may face problems due to population structure or genotype-by-environment effects! This would imply that the marker effects are only valid in the set you genotyped and/or only for the environments in which you phenotyped.
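
The genotype-matrix step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lines are fully homozygous, each marker is biallelic, missing calls are written as "N", and the function name code_major_allele is made up for this example (it is not part of GenABEL or any other package).

```python
import numpy as np

def code_major_allele(calls, missing="N"):
    """Recode allele calls (rows = lines, columns = markers) so that
    1 = more frequent allele and 0 = less frequent allele.
    Missing calls become NaN. Assumes homozygous, biallelic markers."""
    calls = np.asarray(calls, dtype=str)
    coded = np.full(calls.shape, np.nan)
    for j in range(calls.shape[1]):
        column = calls[:, j]
        observed = column[column != missing]
        if observed.size == 0:
            continue  # marker failed completely, leave it as NaN
        alleles, counts = np.unique(observed, return_counts=True)
        # On a tie, the alphabetically first allele is treated as "major".
        major = alleles[np.argmax(counts)]
        coded[column == major, j] = 1.0
        coded[(column != major) & (column != missing), j] = 0.0
    return coded

# Toy data: 4 lines x 3 markers, one missing call.
calls = [
    ["A", "C", "T"],
    ["A", "T", "T"],
    ["G", "C", "N"],
    ["A", "C", "G"],
]
M = code_major_allele(calls)
print(M)
```

The NaN entries would still have to be imputed (or the marker dropped) before marker effects are estimated.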

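The 60% / 40% validation scheme combined with ridge regression can be sketched as follows. This is a rough Python illustration on simulated data, not a replacement for rrBLUP: the shrinkage parameter lam is simply fixed here (rrBLUP estimates the corresponding variance ratio by REML), and the function names and all simulation settings are assumptions made for this example.

```python
import numpy as np

def ridge_marker_effects(Z, y, lam):
    """Ridge regression estimate of marker effects u in y = mu + Z u + e.
    Uses the n x n form u = Z' (Z Z' + lam I)^-1 (y - mu), which is
    convenient when there are more markers than lines."""
    mu = y.mean()
    n = Z.shape[0]
    u = Z.T @ np.linalg.solve(Z @ Z.T + lam * np.eye(n), y - mu)
    return mu, u

def holdout_accuracy(M, y, lam, train_fraction=0.6, seed=0):
    """Split lines into training and validation sets, estimate marker
    effects in the training set and return the correlation between
    predicted and observed performance in the validation set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, valid = order[:n_train], order[n_train:]
    col_means = M[train].mean(axis=0)  # center with training-set means only
    mu, u = ridge_marker_effects(M[train] - col_means, y[train], lam)
    predicted = mu + (M[valid] - col_means) @ u
    return np.corrcoef(predicted, y[valid])[0, 1]

# Simulated example: 300 lines, 100 markers coded 0/1 (all values hypothetical).
rng = np.random.default_rng(1)
M = rng.integers(0, 2, size=(300, 100)).astype(float)
true_effects = rng.normal(0.0, 0.2, size=100)
y = M @ true_effects + rng.normal(0.0, 1.0, size=300)

# lam = error variance / marker-effect variance of the simulation (1 / 0.04).
accuracy = holdout_accuracy(M, y, lam=25.0)
print(f"prediction accuracy (r): {accuracy:.2f}")
```

Repeating the split with different seeds and averaging the correlations gives a more stable estimate than a single split.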

Possibilities for implementing marker selection into the breeding program

  • Separate a female and male pool by estimating the genetic distance between the current breeding lines and parental lines used for hybrid production
  • Prediction of the general combining ability of female and male lines. Design crosses using this information and predict testcross performance
  • Implement genomic prediction within a recurrent selection program for the male and female pool
  • Predict genotype performance within certain crosses / DH populations
  • Evaluate which genotypes from competitors should be integrated into the breeding program
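
As a rough illustration of the first point in the list above, the following Python sketch computes a simple matching distance between breeding lines and the parents of each pool, and assigns each line to the pool it is closer to on average. The distance measure, the assignment rule and the function names are assumptions chosen for this example; the markers are assumed to be coded 0/1 without missing values.

```python
import numpy as np

def simple_matching_distance(A, B):
    """Proportion of markers at which each row of A differs from each
    row of B (both coded 0/1, no missing values)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mismatches = A @ (1 - B).T + (1 - A) @ B.T
    return mismatches / A.shape[1]

def assign_to_nearest_pool(lines, female_parents, male_parents):
    """Assign each line to the pool whose parents are closer on average."""
    d_female = simple_matching_distance(lines, female_parents).mean(axis=1)
    d_male = simple_matching_distance(lines, male_parents).mean(axis=1)
    pools = np.where(d_female <= d_male, "female", "male")
    return pools, d_female, d_male

# Toy data: 2 parents per pool and 2 new breeding lines, 6 markers each.
female_parents = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0]]
male_parents = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]
lines = [[1, 1, 1, 0, 0, 1], [0, 0, 0, 1, 1, 0]]

pools, d_female, d_male = assign_to_nearest_pool(lines, female_parents, male_parents)
print(pools, d_female, d_male)
```

The same distance matrix, computed among all lines, could also serve as input for the cluster used to check for population structure.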

See also (Heffner et al., 2009; Poland et al., 2012; Windhausen et al., 2012)



Elshire, R.J., J.C. Glaubitz, Q. Sun, J.A. Poland, K. Kawamoto, E.S. Buckler, and S.E. Mitchell. 2011. A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS One 6(5): e19379.

Endelman, J.B. 2011. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250–255.

Heffner, E.L., M.E. Sorrells, and J.L. Jannink. 2009. Genomic Selection for Crop Improvement. Crop Sci. 49: 1–12.

Heslot, N., H.-P. Yang, M.E. Sorrells, and J.-L. Jannink. 2012. Genomic Selection in Plant Breeding: A Comparison of Models. Crop Sci. 52(1): 146.

Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S. Dreisigacker, J. Crossa, H. Sánchez-Villeda, M. Sorrells, and J.-L. Jannink. 2012. Genomic Selection in Wheat Breeding using Genotyping-by-Sequencing. Plant Genome J. 5(3): 103.

Rutkoski, J., J. Benson, Y. Jia, G. Brown-Guedira, J.-L. Jannink, and M. Sorrells. 2012. Evaluation of Genomic Prediction Methods for Fusarium Head Blight Resistance in Wheat. Plant Genome J. 5(2): 51.

Rutkoski, J.E., J. Poland, J.-L. Jannink, and M.E. Sorrells. 2013. Imputation of unordered markers and the impact on genomic selection accuracy. G3 (Bethesda). 3(3): 427–39.

Williams, E., H.-P. Piepho, and D. Whitaker. 2011. Augmented p-rep designs. Biom. J. 53(1): 19–27.

Windhausen, V.S., G.N. Atlin, J.M. Hickey, J. Crossa, J.-L. Jannink, M.E. Sorrells, B. Raman, J.E. Cairns, A. Tarekegne, K. Semagn, Y. Beyene, P. Grudloyma, F. Technow, C. Riedelsheimer, and A.E. Melchinger. 2012. Effectiveness of Genomic Prediction of Maize Hybrid Performance in Different Breeding Populations and Environments. G3 Genes|Genomes|Genetics 2: 1427–1436.

November 4th, 2013
Topic: Plant breeding Tags: None
