# R vs SAS Series: Getting the data ready – ANOVA

Continuing on from our last blog post, “R vs SAS Series: Statistical Models Review – ANOVA”, let’s take a look at how we need to get the data ready for our analysis.

Let’s review our statistical model.

$$\text{Nitrate}_{ij} = \mu + \text{trmt}_i + e_{ij}$$

Where:

• $\text{Nitrate}_{ij}$ = stem nitrate amount of the $j$th observation in the $i$th treatment
• $\mu$ = overall mean or model intercept
• $\text{trmt}_i$ = the effect of the $i$th treatment group
• $e_{ij}$ = random error or experimental error
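As a minimal sketch, the model can be written as an R formula to see which columns it asks for. The column names `Nitrate` and `Treatment` are assumed to match the sample dataset used in this post, and `model_formula` is just a hypothetical name for illustration.

```r
# Assumed column names: Nitrate (response) and Treatment (grouping variable)
model_formula <- Nitrate ~ Treatment

# Variables the model expects to find in the dataset
needed_columns <- all.vars(model_formula)
print(needed_columns)

# The overall mean (mu) is the intercept, which R includes by default
model_terms <- terms(model_formula)
has_intercept <- attr(model_terms, "intercept") == 1

# The treatment effect is the only term on the right-hand side
treatment_terms <- attr(model_terms, "term.labels")
```

The error term is not written in the formula; it is whatever is left over once the intercept and the treatment effect are accounted for.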

This means that in order to run our analysis, we need to have stem nitrate measures and information about our treatments.  Specifically, our dataset needs a column with the nitrate measures and a second column that tells us which treatment each nitrate measure belongs to.  You may also have an identifier column – in this case Plot_ID, which helps me to identify which plot the measurements were taken from.  A sample data table or Excel file may look like this:

| Plot_ID | Treatment | Nitrate |
|---------|-----------|---------|
| 101     | 1         | 34.98   |
| 102     | 2         | 40.89   |
| 103     | 3         | 42.07   |
| …       | …         | …       |
| 124     | 6         | 43.29   |
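For readers following along in R, here is a rough sketch of the same layout as a data frame. It uses only the four rows shown in the sample table (the elided rows are left out), and the name `nitrate_data` is hypothetical.

```r
# Only the four rows shown in the sample table; the elided rows are not filled in
nitrate_data <- data.frame(
  Plot_ID   = c(101, 102, 103, 124),
  Treatment = c(1, 2, 3, 6),
  Nitrate   = c(34.98, 40.89, 42.07, 43.29)
)

# One row per plot, one column per piece of information
str(nitrate_data)
column_names <- names(nitrate_data)
```

In practice the data would more likely be read from a file, for example with `read.csv()`, rather than typed in by hand.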

## Fixed vs Random Effects

Now we need to do a little bit of background work.  We’ve all heard of FIXED and RANDOM effects.  These should be driven by your statistical model!  In the example we are currently working with, we only have one effect:  Treatment.  Is it a FIXED or is it a RANDOM effect?

Let’s go back and look at some definitions and examples of these 2 terms.

### Fixed Effects

Fixed effects are something you want to study – you set out the levels that you are interested in. You “fix” the levels. The results from your experiment can only talk about the levels you studied.

• Example #1: I want to see whether 1st year students prefer Coke or Pepsi
• Example #2: I want to see the effect of 3 levels of fertilizer on my crop

### Random Effects

Random effects are factors in your design that may contribute variation to your outcome measure, but that you are not interested in studying. You only want to account for them before looking at your treatment effects.

• Example #1: I want to study the effect of fertilizer on my crop, but other factors in my trial also contribute variation
• Example #2: Block effect, Weather, etc…

Back to our example – what do you think our Treatment effect is?  If you said FIXED – you are correct!

Alrighty – so Treatment is a FIXED effect.  In our dataset, we entered the Treatment levels as 1, 2, 3, 4, 5, or 6; in other words, we used numbers.  We could have used letters, alphanumeric codes, or strings; it doesn’t matter.  However, because we used numbers, we need to let our programs know that these values are not quantities that we will average or manipulate in any way.  They are to be used as a grouping, classification, or factor variable: something that tells us and the program which treatment each of our nitrate values comes from.
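To see why this matters, here is a small sketch of what R does with the treatment codes when they are left as plain numbers. The vector name `treatment_codes` is hypothetical; the values are the six treatment levels mentioned above.

```r
# The six treatment levels, entered as plain numbers
treatment_codes <- c(1, 2, 3, 4, 5, 6)

# R sees quantities here, not group labels
codes_are_numeric <- is.numeric(treatment_codes)

# So it will happily average them, even though the result is not a real treatment
average_code <- mean(treatment_codes)
print(average_code)
```

The average comes out as 3.5, and there is no treatment 3.5 in the trial, which is exactly the kind of calculation we want the program to avoid.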

In SAS, we can do this very simply by including the Treatment variable in a CLASS statement.  In R, however, we need to change the format of the variable to a factor.  To do this, we use the following R script:

```r
# Convert the numeric treatment codes into a factor (grouping variable)
Treatment <- as.factor(Treatment)
```
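The line above assumes `Treatment` is available as a standalone object. When the data sit in a data frame, the conversion might look like the following sketch, which again uses only the four sample rows shown earlier and a hypothetical data frame name, `nitrate_data`.

```r
# Hypothetical data frame holding the four sample rows shown earlier
nitrate_data <- data.frame(
  Plot_ID   = c(101, 102, 103, 124),
  Treatment = c(1, 2, 3, 6),
  Nitrate   = c(34.98, 40.89, 42.07, 43.29)
)

# Convert the column in place so the factor version is stored in the data frame
nitrate_data$Treatment <- as.factor(nitrate_data$Treatment)

# Check the result: the codes are now category labels, not quantities
treatment_is_factor <- is.factor(nitrate_data$Treatment)
treatment_levels <- levels(nitrate_data$Treatment)
print(treatment_levels)

# Number of observations in each treatment group
print(table(nitrate_data$Treatment))
```

Only the Treatment column is converted; Nitrate stays numeric because it is the measurement we want to analyse.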

We’ll see how this fits in with our ANOVA coding in the next blog post.  For now, remember:

1. We need to determine which of our factors are FIXED or RANDOM
2. In R, we need to change the format of our factors using the as.factor() function.

## Quick Recap

Everything is based on that statistical model, so please remember what it is for your trial.

Factors in our model may be FIXED or RANDOM.

In SAS, we can tell the program which variables are factors by listing them in a CLASS statement.

In R, we need to use the as.factor() function to change the format of our factor variables to a factor.

## Coming up next in this mini series

1. R vs. SAS Series: Conducting the ANOVA
2. R vs. SAS Series: Reading the ANOVA outputs
3. R vs. SAS Series: RCBD – ANOVA
4. R vs. SAS Series: RCBD – Reading the ANOVA outputs