Analyses of Variance (ANOVA) is probably one of the most used statistical analyses used in our field. In R, there are many different ways to conduct an ANOVA. The key, as is for any analysis, is to know your statistical model, which is based on your experimental design, which in turn is based on your research question and hypothesis. We will work through an RCBD (randomized complete block design) using 2 commonly used ANOVA functions in R, to see the differences and how each function handles a mixed model.
Let’s review the data collected from a small RCBD trial. There were 4 blocks, where 6 treatments were randomly assigned to each. The statistical model for this experimental design is:
Whenever you run an ANOVA or any statistical analysis in R, it is recommended that you save it as an object, that way you can run a summary() on the object and refer to it later for plotting and other future analysis. To conduct the ANOVA, we will use the statistical model provided above and essentially translate it into R.
model <- aov(nitrogen ~ block + trmt, data=rcbd_data)
We are saving the results into an object called model. Inside the aov() you can see our model that matches the statistical model : nitrogen ~ block + trmt. Notice that you do not use an ‘=’ but you use a tilda or ‘~’. Also note, contrary to SAS, you include the + signs. Our model is a basic one where we have the random effect of the block and the fixed effect of the trmt.
Once you run this, you need to run the object model to see what you have as results. Then to see the anova table run summary(model).
Challenges with this analysis
Has anyone noticed how R has treated our random effect of block? Yes, it is treating it as a fixed effect. What about our trmt effect – notice anything odd about this one? Why is there only 1 df when we have 6 treatment levels – shouldn’t we have 5 df and not 1 df???
We need to set the block and trmt variables as factors before we run the ANOVA in R. Remember we used this coding before, we will use 2 new objects for this and call them block_fac and trmt_fac.
block_fac = as.factor(rcbd_data$block)
trmt_fac = as.factor(rcbd_data$trmt)
Now let’s try the ANOVA again. Do these results make more sense?
ASIDE: CALCULATING p-values
As we continue to work on ANOVAs I wanted to point out different ways to calculate your p-value. Ideally the p-value will accompany your analysis, but situations exist when that may not happen. A quick way to calculate the p-value is by using the pf() function in R. Let’s test this out with our example above. We are going to calculate the p-value for our trmt effect.
pf(F-statistic value, numerator df, denominator df) – in our case the numerator df is 5 for the trmt effect and the denominator df is the df associated with the residual, which is 15 in this case. When you run this you should get the same p-value you get when you run the summary() on the model.
Residual analysis following aov()
One of the assumptions of any ANOVA is to ensure that the residuals are normally distributed. Let’s take a look at the R script to try some plots to see what we can do.
Also recall the shapiro.test that we used earlier – try running this on the residuals of our model.
One of the challenges with our previous analysis is that aov() ran the analysis and considered our block effect as a fixed effect. aov() uses ordinary least squares as the methodology for calculating the ANOVA table. In other words, we can easily calculate all the SS, MS, F, by hand if we wanted to. When adding a random effect into our models we need to take a slightly different approach. This is the first change in the methodology of our ANOVAs – moving to a REML or Restricted Maximum Likelihood methodology. This is where the lmer() package comes into play.
Let’s start by reviewing our statistical model above. Recognizing that we need to find a way to let R know that the block effect is a random effect in our model. Let’s create a new object called rmodel with our new model:
rmodel <- lmer(nitrogen ~ (1|block) + trmt, data=rcbd_data)
Notice how we set out the random effect of block in this model – (1|block).
As we’ve done previously, review the results in the object we called rmodel, then try the summary() function on rmodel. Can you see the differences in the results? To see the ANOVA table results and fixed effects we need to use a new function anova() with the rmodel object we created by running the lmer() function for the mixed model.
Now you should see the fixed effect trmt results – but what is missing? How do we obtain the p-value?
1-pf(0.4719,5,15) – which gives us a p-value of 0.79 – indicating that we should accept the Null hypothesis that the Nitrogen is the same for all treatments OR there are no differences between the treatments.
Residual analysis following lmer()
As we did with the first model let’s check our residuals. Since we ran a proper mixed model, we will not have access to the same set of plots we did earlier. Please refer to the accompanying R script for the plot and normality test syntax.
After reviewing both, would you conclude that the residuals are normally distributed and that we did a good job with this model?
Sometimes we may be working with data that is not as normal as we would like it to be, and we want the option to specify the distribution of the incoming data. The NLME package will allow us to do this amongst many more options. We will NOT use the added functionality of the NLME package in this workshop, but I want to work through our RCBD example using this package, so that you are aware of it and can work through your own data when the time comes.
We are using the same data and same statistical model, however, our R script will change due to the new package:
newmodel <- lme(nitrogen ~ trmt_fac, random = ~1|block_fac, data=rcbd_data)
Notice the changes in the syntax when using the NLME package. This is the identical model that we used in LMER – but different syntax. Please note that it is recommended to create the treatment variable as a factor before you include it in the model.
As we did above, we will check the residuals, and we will also use EMMEANS to look means comparisons.
We worked through 1 example using 3 different packages that can run an ANOVA analysis. The first aov() only ran our model as a fixed effects model which was incorrect for our RCBD. The second analysis use the lmer() package – which used our mixed model correctly but left us calculating the p-value for our fixed effect separately. The third analysis, we used NLME package to run our mixed model followed with the means comparison tests. There are other packages available where an ANOVA analysis forms part. However, you should do your homework and seek out any accompanying documentation, ideally a refereed journal discussing its development and use, before using it for your own research.