Just another CropScience weblog

Genotype-by-environment interactions (GxE)

Genotype x environment (GxE) interaction occurs when different genotypes respond differently to different environments. GxE varies with the material tested and the sites chosen for testing (Darbeshwar 2000). Quantitative traits with complex inheritance are especially affected by environmental effects.

Because breeding can modify only the genetic effect, the share of the genetic effect in each trait is important. The larger the effect of the genotype, the easier it is to select for the trait. If there is no GxE interaction, the genotypes need to be evaluated in only one environment, and whichever genotype is best in that environment will also be best in any other environment.

Determination of GxE

a)    analysis of variance

The assumptions underlying the analysis of variance are 1) randomness, 2) normality, 3) additivity and 4) homogeneity of the error variance. Homogeneity of the error variance causes most concern in the analysis of variance of multiple-location trials. The Bartlett test checks whether the error mean squares of the individual trials differ significantly. The test requires at least two replicates at each factor level. If the chi-square value exceeds the value from statistical tables, the error variances show significant heterogeneity and the assumption of homogeneity of the error variances is violated. The Bartlett test is oversensitive to deviations, so heterogeneity of variances should only be considered at the 99.9% level or higher (Brown and Caligari 2008). If the error variances differ significantly, the only practical solution is to transform the data (sqrt, log, scale).


bartlett.test(yield ~ environment)  # p < 0.05: the error variances differ significantly (p < 0.001 if the stricter 99.9% level is applied)
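As a minimal sketch of the test and of the transformation remedy, the following R code simulates a hypothetical trial in which the error standard deviation grows in proportion to the environment mean, and compares the Bartlett test before and after a log transformation. The environment names, means and standard deviations are invented for illustration only.

```r
set.seed(1)

# Hypothetical data: 3 environments with 20 plots each; the error SD is
# 10% of the environment mean, so the raw variances are heterogeneous.
environment <- factor(rep(c("E1", "E2", "E3"), each = 20))
yield <- c(rnorm(20, mean = 20,  sd = 2),
           rnorm(20, mean = 60,  sd = 6),
           rnorm(20, mean = 180, sd = 18))

raw    <- bartlett.test(yield ~ environment)       # untransformed yields
logged <- bartlett.test(log(yield) ~ environment)  # log-transformed yields

# Compare the p-values of the two tests
print(c(raw = raw$p.value, log = logged$p.value))
```

A log transformation suits data with a roughly constant coefficient of variation, as simulated here; other variance patterns may call for a different transformation.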

More complex models and expected mean squares can be derived for location trials grown over more than one year, which produce year x genotype, location x genotype and location x year interactions (Brown and Caligari 2008). The genotype x location x year interaction is called the second-order interaction (Darbeshwar 2000).

# g = genotype, e = environment, l = location, y = year (all coded as factors)
summary(aov(yield ~ g + e + g:e))
summary(aov(yield ~ g + l + y + l:y + g:y + g:l + g:l:y))
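To make the single-factor environment model concrete, here is a small sketch on simulated data. The trial layout, the effect sizes and the crossover in one cell are assumptions chosen for illustration; replicates within each genotype x environment cell are needed so that the interaction can be tested against the residual error.

```r
set.seed(42)

# Hypothetical balanced trial: 4 genotypes x 3 environments x 3 replicates
trial <- expand.grid(rep = 1:3,
                     g = factor(paste0("G", 1:4)),
                     e = factor(c("A", "B", "C")))

g_eff  <- c(0, 2, 4, 6)[as.integer(trial$g)]   # assumed genotype effects
e_eff  <- c(-5, 0, 5)[as.integer(trial$e)]     # assumed environment effects
# assumed crossover: genotype G4 performs poorly in environment C only
ge_eff <- ifelse(trial$g == "G4" & trial$e == "C", -10, 0)

trial$yield <- 40 + g_eff + e_eff + ge_eff + rnorm(nrow(trial), sd = 1)

fit <- aov(yield ~ g + e + g:e, data = trial)
anova_table <- summary(fit)[[1]]   # rows: g, e, g:e, Residuals
print(anova_table)
```

The g:e row of the table carries the GxE mean square and its F-test.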

Calculation of G x E with an analysis of variance

Genetic variance = ∑(mean yield of all genotypes in all environments – mean yield of genotype x in all environments)²

Phenotype = x̄ + g + e + g x e

GxE=P_{ij}-\mu -g_{i}-t_{j}
Observed yields (genotypes 1 to 3 in rows, environments A to C in columns):

Genotype      A       B       C     mean G    dG     dG²
1            40      30      50       40       3       9
2            50      30      43       41       4      16
3            30      20      40       30      -7      49
mean E       40    26.7    44.3       37             VG = 74
dE            3   -10.3     7.3
dE²           9  106.09   53.29                      VO = 168


Model: P = x̄ + dG + dE + dGE, so GxE = P – (x̄ + dG + dE). The squared GxE values are summed over environments A to C:

            x̄ + dG + dE          GxE                   GxE²
Genotype    A     B     C      A     B     C       A      B      C      sum
1          43  29.7  47.3     -3   0.3   2.7       9   0.09   7.29    16.38
2          44  30.7  48.3      6  -0.7  -5.3      36   0.49  28.09    64.58
3          33  19.7  37.3     -3   0.3   2.7       9   0.09   7.29    16.38
                                                              VGE = 97.34
If GxE is negative, the genotype was less adapted to that environment than expected. Genotype 2 was the most unstable; genotypes 1 and 3 show the highest yield stability over environments.
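The worked example can be retraced in a few lines of R. This is only a sketch of the arithmetic shown in the tables; because it uses unrounded environment means, small differences from the hand calculation are expected (for example a GxE sum of squares of 97.33 instead of 97.34).

```r
# Yields from the worked example: genotypes in rows, environments in columns
yield <- matrix(c(40, 30, 50,
                  50, 30, 43,
                  30, 20, 40),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("1", "2", "3"), c("A", "B", "C")))

mu <- mean(yield)                 # grand mean
dG <- rowMeans(yield) - mu        # genotype deviations
dE <- colMeans(yield) - mu        # environment deviations

expected <- mu + outer(dG, dE, "+")   # additive expectation: mean + dG + dE
GE <- yield - expected                # interaction deviations

VG  <- sum(dG^2)                      # sum of squared genotype deviations
VGE_by_genotype <- rowSums(GE^2)      # squared GxE summed per genotype
VGE <- sum(GE^2)                      # total GxE sum of squares

print(round(GE, 2))
print(round(VGE_by_genotype, 2))
```

The genotype with the largest entry in VGE_by_genotype contributes most to the interaction.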

When assessing GxE, one also has to consider carefully whether the environments tested are appropriate. Selection should preferably be carried out under the growing conditions characteristic of the target environment.

The use of standard ANOVA is subject to the restrictions generally applicable to linear models, such as homogeneity of error variances and additive effects. A significant MSGxE indicates the presence of GxE, although the test is not very sensitive: because of the large number of degrees of freedom for the GxE interaction, the interaction mean square can be non-significant in the F-test even when the interaction sum of squares is large. Thus, even if the interaction mean square is not significant, it is recommended to analyze the genotypic stability (Darbeshwar 2000).

b) genotype correlations over environments

An easy and effective way of assessing the extent of genotype x environment interaction (GEI) across environments within a target region is to examine the correlations of genotype means from variety trials conducted across sites or groups of sites. If these correlations are above 0.3 for a single 3-replicate trial, GEI is unlikely to be large. If they are lower than 0.3, there are two possible interpretations: either GEI is important, or the trials have high error terms and there is little genetic variation among the cultivars under evaluation. Examining the heritability (H) estimate for cultivars within a single trial can help differentiate these two alternatives; if H for yield is very low in a given trial, means from that trial will usually not be significantly correlated with other trials.
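A short sketch of this check in R, using invented genotype means for three sites (the site and genotype names and all values are hypothetical): the correlation matrix of the site columns is computed and the site pairs below the 0.3 rule of thumb are listed.

```r
# Hypothetical genotype means (6 genotypes) from three single trials
site_means <- cbind(S1 = c(30, 35, 40, 45, 50, 55),
                    S2 = c(31, 34, 42, 43, 51, 54),   # ranks genotypes like S1
                    S3 = c(55, 50, 45, 40, 35, 30))   # reverses the ranking
rownames(site_means) <- paste0("G", 1:6)

r <- cor(site_means)          # pairwise correlations between sites
print(round(r, 2))

# Site pairs whose correlation falls below the 0.3 rule of thumb
low <- which(r < 0.3 & upper.tri(r), arr.ind = TRUE)
print(data.frame(site1 = rownames(r)[low[, "row"]],
                 site2 = colnames(r)[low[, "col"]]))
```

Pairs flagged here would then be checked against the within-trial heritability before concluding that GEI is important.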

Approaches for coping with GxE

In general, the means of the genotypes per location should be calculated first. These means can then be used in a second model to cope with GxE (two-stage analysis, Piepho). For G x E it is usually assumed that both effects are random. Environmental differences are often greater over years than over locations, and it can be informative to separate year and location effects in the analysis of variance (Brown and Caligari 2008).

a) ignore it

Calculate the adjusted mean of the genotypes over environments and select the best ones. These genotypes are superior across the target population of environments but might not be the best ones for a specific environment. Thus, the genotypes should be selected for a low variance and a good mean expression.


The variance and standard error of each variety are calculated across all environments. The variety with the lowest variance is the most stable. On the other hand, a good mean expression is also needed, so first select for a low variance and then for a good mean expression.

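# assumed here: a = variance and b = mean of each entry over environments, in the row order of x
d <- order(a, -b)  # low variance first; ties are broken by the higher mean
e <- data.frame(Entry = x$Entry[d], variance = a[d], mean = b[d])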
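The ranking described above can be sketched end to end as follows. The yields are the three genotypes and three environments of the worked ANOVA example; the column names of the result are chosen for illustration.

```r
# Genotypes (entries) in rows, environments in columns
yield <- matrix(c(40, 30, 50,
                  50, 30, 43,
                  30, 20, 40),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("1", "2", "3"), c("A", "B", "C")))

stability <- data.frame(
  Entry    = rownames(yield),
  mean     = rowMeans(yield),                         # mean over environments
  variance = apply(yield, 1, var),                    # variance over environments
  se       = apply(yield, 1, sd) / sqrt(ncol(yield))  # standard error of the mean
)

# Low variance first, then high mean
ranked <- stability[order(stability$variance, -stability$mean), ]
print(ranked)
```

Note that this variance over environments also contains the environment main effect, so it measures a different kind of stability than the GxE deviations.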

b) reduce it

The target environments can be partitioned into smaller, more homogeneous subgroups. Cultivar recommendations are then made separately for each subgroup of environments. Cluster analysis and principal component analysis are useful for partitioning into homogeneous subgroups. In general, it should be tested whether the environments or subgroups are significantly different from each other before a separation is applied.

Cluster analysis

Parametric techniques for measuring stability assume that a quantitative character is normally distributed, which may not be true. Parametric measures are also relatively sensitive to errors of measurement, and adding or deleting one or a few observations causes great variation in parametric stability measures. It is therefore worthwhile to use non-parametric measures of stability. Clustering of genotypes according to their response structure is a non-parametric approach (Darbeshwar 2000).

Cluster analysis creates hierarchical groups of environments. Data from performance tests are highly unbalanced because the genotypes tested typically vary from location to location and from year to year. The statistical distance between two locations can be determined from the performance of the subset of genotypes that are grown in both environments. The distance measure is calculated after Ouyang et al. (1995).

D_{jj'}=\frac{1}{n}\sum_{i=1}^{n}\left ( \frac{P_{ij}-\mu _{j}}{s_{j}} -\frac{P_{ij'}-\mu _{j'}}{s_{j'}}\right )^{2}, where j and j’ are two environments, P_{ij} is the performance of genotype i in environment j, \mu_{j} is the mean of all n genotypes in environment j, and s_{j} is the phenotypic standard deviation among all genotypes in environment j. When all genotypes in j are also grown in j’, Djj’ can be rewritten as:

D_{jj'}=2 ( 1-\frac{1}{n})(1-r_{jj'}), where r_{jj'} is the correlation between the performance of the genotypes in environments j and j’. Djj’ is 0 if the genotype performance is identical in j and j’. Djj’ is about 2 (exactly 2(1 - 1/n)) if the correlation between genotype performances in the two environments is 0. The maximum of Djj’, about 4 (exactly 4(1 - 1/n)), is reached if crossover interactions occur and rjj’ is -1 (Bernardo 2002, p. 157).

# stats::dist() has no "Ouyang" method; for complete data (rows = genotypes, columns = environments):
Ouyang <- as.dist(2 * (1 - 1/nrow(x)) * (1 - cor(x)))
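For complete data the two expressions of the distance can be checked against each other. The sketch below uses an invented 5 x 4 table of genotype means (all names and values are hypothetical) and computes Djj’ once from the definition, as the squared Euclidean distance between standardised environments divided by n, and once from the correlation matrix.

```r
# Hypothetical complete data: 5 genotypes (rows) in 4 environments (columns)
p <- matrix(c(40, 30, 50, 45,
              50, 30, 43, 52,
              30, 20, 40, 33,
              45, 35, 38, 47,
              38, 22, 47, 41),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("G", 1:5), c("A", "B", "C", "D")))
n <- nrow(p)

# Definition: standardise each environment, then average the squared
# differences between environments over the n genotypes
z <- scale(p)                      # (P_ij - mean_j) / s_j, column by column
D_def <- dist(t(z))^2 / n          # squared Euclidean distance / n

# Shortcut for complete data via the correlation between environments
D_cor <- as.dist(2 * (1 - 1 / n) * (1 - cor(p)))

print(round(D_def, 3))
print(round(D_cor, 3))
```

With unbalanced data the shortcut no longer applies directly, and each pair of environments would have to be restricted to its common genotypes first.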

Several methods are available for joining clusters on the basis of Djj’. Within the unweighted pair-group method (UPGMA) the distance between two clusters is equal to the average distance between an environment in the first cluster and an environment in the second cluster (Bernardo 2002).
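In R, UPGMA corresponds to hclust() with method = "average". The following sketch clusters four hypothetical environments, two of which rank the genotypes in one direction and two in the opposite direction; the data are invented so that the expected grouping is obvious.

```r
# Hypothetical genotype means: A and B rank the 6 genotypes alike,
# C and D rank them in the opposite order (crossover interaction)
base_rank <- c(30, 35, 40, 45, 50, 55)
p <- cbind(A = base_rank,
           B = base_rank + c(1, -1, 2, -2, 1, -1),
           C = rev(base_rank),
           D = rev(base_rank) + c(-1, 2, -1, 1, -2, 1))
n <- nrow(p)

# Distance between environments from their correlation (complete data)
D <- as.dist(2 * (1 - 1 / n) * (1 - cor(p)))

upgma  <- hclust(D, method = "average")  # average linkage = UPGMA
groups <- cutree(upgma, k = 2)           # cut the tree into two subgroups
print(groups)
```

plot(upgma) draws the dendrogram, which is usually the more informative output when the number of subgroups is not known in advance.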

Pattern analysis is most useful when the TPE (target population of environments) is very large and diverse, and when researchers do not have a good hypothesis about the causes of GEI. But this is rarely the case: breeders usually have a good hypothesis about the way sites should be grouped on the basis of their characteristics.

Test of fixed environment effects

Additionally, sites with similar rainfall patterns, soil types and depths of standing water accumulation may be grouped for breeding purposes. Usually breeders have a working hypothesis about the most important cause of GEI within the region they serve. For example, the TPE may include both commercial farms where high levels of N are applied and subsistence farms using low levels of N. The hypothesis that there is a cultivar x N level interaction can be tested in a multi-location cultivar trial by including N level as a fixed factor in a combined analysis of variance. If the G x E interaction (with E as the subdivision of environments) is not significant, the division of the TPE into subgroups (for example, countries) would not be warranted. In general, we want to subdivide a target region only when we can show that there is real cultivar x region interaction and that subdivision will increase H.

Principal component analysis

Williams (1952) showed that the least-squares estimation of regression coefficients in the linear regression approach to GxE interaction is equivalent to extracting the first principal component of the genotypic performance (Darbeshwar 2000). G x E is calculated for n entries in k environments. PCA transforms the data into linear combinations of the original variables. These principal components are uncorrelated with each other. The first principal component accounts for the largest percentage of the variation in the data. If the first two components don’t explain a high percentage of the variation, PCA loses much of its usefulness in partitioning environments into homogeneous subgroups.
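A minimal sketch with prcomp(), assuming a complete table of genotype means (the 5 x 4 data below are invented): the GxE deviations are obtained by removing the genotype and environment main effects, and the share of variation explained by each component is then inspected.

```r
# Hypothetical complete data: 5 genotypes (rows) in 4 environments (columns)
p <- matrix(c(40, 30, 50, 45,
              50, 30, 43, 52,
              30, 20, 40, 33,
              45, 35, 38, 47,
              38, 22, 47, 41),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("G", 1:5), c("A", "B", "C", "D")))

# GxE deviations: observed value minus genotype mean minus environment mean
# plus grand mean
GE <- p - outer(rowMeans(p), colMeans(p), "+") + mean(p)

pca <- prcomp(GE, center = TRUE, scale. = FALSE)
explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variation per component

print(round(explained, 3))
print(round(pca$rotation[, 1:2], 2))        # environment loadings on PC1 and PC2
```

Environments with similar loadings on the first two components are candidates for the same subgroup, provided these components explain most of the variation.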


Cluster analysis and PCA can be used as tools for identifying regions and subregions which are closely related.

c) exploit it

Identify cultivars suited to specific environments.  Stability analysis provides information on the performance of genotypes as a linear function of the level of productivity in each environment. Multiplicative models aim to identify cultivars best suited to specific environments or to identify subsets of cultivars that do not exhibit crossover interaction.

Stability analysis

Stability analysis aims to examine the reaction of a genotype, relative to other genotypes, to different environments. The concept of stability implies that some measure is needed that distinguishes one environment from another. This environmental index should be based on environmental factors such as soil properties, climatic factors or biotic and abiotic stresses that affect the performance of genotypes. If no such index has been developed yet, the effect of the jth environment can serve as a useful environmental index (mean of all genotypes in environment j minus the overall mean).

Yates and Cochran (1938) stated that “the degree of association between varietal differences can be further investigated by calculating the regression of the yields of the separate varieties on the mean of all yields” (in Brown and Caligari 2008 p. 139). The regression coefficient of the genotypes can be used to determine the stability of genotypes over environments. Plant breeders want to develop varieties with high average performance and low b values, as these genotypes would produce high yields under all environments.

P_{ij}=\mu +g_{i}+b_{i}t_{j}+\delta _{ij}+\varepsilon _{ij}

μ + gi is the mean of genotype i across all environments

bi is the regression coefficient of the linear regression

b=0 indicates the performance of a genotype is constant over environments (Type I stability)
b=1 indicates that the response to different environments is the same as the average response of all genotypes in the experiment (Type II stability)
b>1 indicates that a genotype has a better than average response to favorable environments, but worse than average response to unfavorable environments
b<1 indicates genotypes with low sensitivity to environmental changes and therefore higher adaptability to low-yielding environments. (characterization after Finlay and Wilkinson 1963)

tj is the environmental index (mean of the performance of all entries in each location minus the overall mean)
δij is the deviation of Pij from the regression fitted value of genotype i in environment j
εij is the within-environment error, averaged across replications

Once b has been estimated, the performance of a genotype can be predicted:

\hat{P}_{ij}=\mu +g_{i}+b_{i}t_{j}
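The regression can be sketched in R by regressing the yields of each genotype on the environmental index, in the parameterisation where b = 1 is the average response. The data are the three genotypes and three environments of the worked ANOVA example; with only three environments the slopes are of course poorly estimated, so this only illustrates the mechanics.

```r
# Genotypes in rows, environments in columns
yield <- matrix(c(40, 30, 50,
                  50, 30, 43,
                  30, 20, 40),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("1", "2", "3"), c("A", "B", "C")))

# Environmental index t_j: environment mean minus grand mean
env_index <- colMeans(yield) - mean(yield)

# One linear regression per genotype; rows of 'fits' are intercept and slope
fits <- apply(yield, 1, function(y) coef(lm(y ~ env_index)))
intercept <- fits["(Intercept)", ]   # estimates mu + g_i
b <- fits["env_index", ]             # regression coefficient b_i

# Predicted performance: (mu + g_i) + b_i * t_j
predicted <- outer(b, env_index) + intercept

print(round(b, 3))
print(round(predicted, 1))
```

Because the index is built from the same data, the slopes average to 1 over genotypes, so each b is always a measure relative to the set of genotypes tested.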

The analysis of variance can be used to partition the G x E interaction into heterogeneity of regression and deviations from regression. Heterogeneity of regression means that the regression slopes of the genotypes differ. It can be compared with the deviations from regression to see whether it accounts for a significant part of the observed interaction (joint regression analysis). Deviations from regression indicate that part of the relationship between genotypes and environments cannot be explained by linear regression.
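The partition can be sketched on a table of genotype means as follows. The degrees of freedom used here, (g - 1) for heterogeneity and (g - 1)(e - 2) for deviations, are the usual ones for joint regression analysis on a g x e table and are stated as an assumption; a proper F-test would additionally need the pooled error from the replicated trials.

```r
# Genotypes in rows, environments in columns (worked ANOVA example)
yield <- matrix(c(40, 30, 50,
                  50, 30, 43,
                  30, 20, 40),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("1", "2", "3"), c("A", "B", "C")))
g <- nrow(yield)
e <- ncol(yield)
mu <- mean(yield)
env_index <- colMeans(yield) - mu

# Total GxE sum of squares from the interaction deviations
GE <- yield - outer(rowMeans(yield), colMeans(yield), "+") + mu
SS_GE <- sum(GE^2)

# Heterogeneity of regression: variation among the slopes b_i around 1
b <- apply(yield, 1, function(y) coef(lm(y ~ env_index))[["env_index"]])
SS_het <- sum((b - 1)^2) * sum(env_index^2)

# Deviations from regression: residuals of the per-genotype regressions
SS_dev <- sum(apply(yield, 1, function(y) sum(resid(lm(y ~ env_index))^2)))

MS_het <- SS_het / (g - 1)
MS_dev <- SS_dev / ((g - 1) * (e - 2))
print(c(SS_GE = SS_GE, SS_het = SS_het, SS_dev = SS_dev,
        F_het = MS_het / MS_dev))
```

In this small example almost all of the interaction sits in the deviations, which means the differing slopes explain very little of the GxE.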

Joint regression analysis was proposed by Finlay and Wilkinson (1963, in Bos and Caligari 2008 p. 330 and Bernardo 2002 p. 161). The Finlay-Wilkinson model implies regression on a latent environmental variable. The model allows heterogeneity both in variances and covariances.

In summary, joint regression analysis can be used to obtain estimates of mean performance and a measure of relative responsiveness to environmental change (b, Lynch and Walsh 1998). Breeders disagree on which type of stability is most desirable. The ideal situation is to have Type I stability along with the highest mean in any environment. However, genotypes that exhibit Type I stability tend to have a low mean performance. A breeder must therefore consider both the b value and the mean performance of a cultivar across all environments (Bernardo 2002).

A major criticism of joint regression analysis is that the genotype performance is itself a factor in determining the site mean (Brown and Caligari 2008). In addition, parametric techniques for measuring stability assume that a quantitative character is normally distributed, which may not be true. Parametric measures are also relatively sensitive to errors of measurement, and adding or deleting one or a few observations causes great variation in parametric stability measures (Darbeshwar 2000).


Bernardo R. 2002: “Breeding for Quantitative Traits in Plants”. Stemma Press, Woodbury, 369 p., chapter 7.

Bos I. and Caligari P. 2008: “Selection Methods in Plant Breeding”. Springer, Dordrecht, 461 p.

Brown J. and Caligari P. 2008: “An Introduction to Plant Breeding”. Blackwell Publishing, Oxford, Ames, Carlton, 209 p.

Darbeshwar R. 2000: “Plant Breeding. Analysis and Exploitation of Variation”. Narosa Publishing House, 701 p.

Gauch and Zobel 1996: “AMMI Analysis of Yield Trials”. In: “Genotype-by-Environment Interaction”. Ed: Kang M. S. and Gauch H. G., CRC Press, Boca Raton, New York, Tokyo, p. 85-115.

Lynch M. and Walsh B. 1998: “Genetics and Analysis of Quantitative Traits”. Sinauer Associates, Inc. Publishers, Sunderland, 980 p., chapter 22.

June 24th, 2012
Topic: Crop Science, Plant breeding Tags: None
