As we begin the coding part of the R workshop, let us try to bring things together and make it as easy as possible for you to run the code and work with me as we progress through the topics of the next 2 days. On that note, for each session of the R workshop, I will have an R script already prepared for you, complete with comments for every step. You will need to download it and open it in RStudio at the beginning of every workshop section.
Please download the following:
Getting the Data in
As with any program out there today, there are several ways to bring data into the R platform. We will work through 2 different ways to accomplish this task, and once we’ve worked through both, I would like to hear which one you prefer!
Reading a CSV file gets you in the habit of creating a preservation-ready format for your data but, as you’ve probably already figured out, it also means having documentation at the ready, so you remember what variable is what. With respect to reading it into R, you need to pick and choose the file location, or make sure your working directory has been set at the beginning. Reading an Excel file is just SOooo much easier and probably the way most of us like to work. Just remember to save your data as you work!
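Here is a minimal sketch of both ways of getting data in. The workshop files are not reproduced here, so everything in it is a stand-in: the small `jan` data frame and its column names are made up, and the Excel part reads an example workbook that ships with the readxl package rather than our own spreadsheet.

```r
library(readxl)

# Hypothetical stand-in for the workshop data
jan <- data.frame(id = 1:3, weight = c(14, 18, 22))

# Way 1: CSV. Write a small file to a temporary folder, then read it back.
# With your own file, point read.csv() at its location or set the
# working directory first with setwd().
csv_path <- file.path(tempdir(), "january.csv")
write.csv(jan, csv_path, row.names = FALSE)
jan_csv <- read.csv(csv_path)
str(jan_csv)

# Way 2: Excel. readxl_example() returns the path to a workbook bundled
# with readxl; replace it with the path to your own .xlsx file.
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)                        # list the worksheet names
sheet_data <- read_excel(xlsx_path, sheet = "mtcars")
head(sheet_data)
```

With a workbook that has one worksheet per month, you would call `read_excel()` once per sheet, changing the `sheet` argument each time.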
Merging files sounds like such an innocent task. I have an Excel file with 4 monthly worksheets and all I want to do is put them all together into 1 file, so I can analyse the data. Easy peasy right??
There are a few ways of merging files in R. The most straightforward method is to use the merge() function available in Base R. Try it out with our data and tell me: what happens when you merge the January data, which has 25 observations, with the February data, which only has 23 observations?
So, we’ve noticed that there’s something NOT quite right with this merge. The 2 observations that had a measure in January but not in February were not included in our final dataset. What happens if later on, say in March or April, we do have measures for these individuals? We want them to be included. So we need to consider other methods of merging our files.
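To see the problem on a small scale, here is a sketch with made-up data: 5 individuals in January and only 3 in February. The data frames and the column names `id`, `jan_wt` and `feb_wt` are assumptions for illustration, not the workshop data.

```r
# Hypothetical monthly measurements keyed by an id column
jan <- data.frame(id = 1:5, jan_wt = c(14, 15, 18, 21, 23))
feb <- data.frame(id = c(1L, 2L, 4L), feb_wt = c(15, 16, 22))

# By default merge() keeps only the ids that appear in BOTH data frames
inner <- merge(jan, feb, by = "id")
nrow(inner)

# all = TRUE keeps every id and fills the missing measures with NA
outer <- merge(jan, feb, by = "id", all = TRUE)
outer
```

In this sketch ids 3 and 5 disappear from the default merge and come back, with `NA` for February, when `all = TRUE` is used. That second behaviour is the one we are after.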
We will use the joining functions available in the dplyr package. To do this we need to take a quick little detour to remind ourselves about sets, unions, and joins. This is the approach that R takes when merging, or rather joining, datasets. You’ll also see that by taking this approach we can merge all of our data using this one function, unlike SAS and SPSS.
This is a perfect opportunity to show you the Cheatsheets in R. In RStudio follow these steps:
- Data Transformation with dplyr
Let’s work through the examples of Combine Tables to get a better understanding of how to merge in R.
Based on these examples, we are interested in performing a full join with the full_join() function. Did the coding in the R script work for you? Can you see how this might work for your own research data?
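If you want to try a full join away from the workshop script, here is a minimal sketch. The three small monthly data frames and their column names are made up for illustration; only `full_join()` and the pipe come from dplyr.

```r
library(dplyr)

# Hypothetical monthly measurements; not everyone is measured every month
jan <- data.frame(id = 1:5, jan_wt = c(14, 15, 18, 21, 23))
feb <- data.frame(id = c(1L, 2L, 4L), feb_wt = c(15, 16, 22))
mar <- data.frame(id = c(1L, 3L, 5L), mar_wt = c(16, 19, 24))

# full_join() keeps every id found in either table,
# so the joins can simply be chained month after month
all_months <- jan %>%
  full_join(feb, by = "id") %>%
  full_join(mar, by = "id")

all_months
```

Individual 3 has no February measure in this sketch but still shows up in the result, with `NA` for February and the March value filled in.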
Creating new variables
Creating a new variable is very straightforward: Ynew = Var1 + Var2, or whatever variable you need to create. The tricky part is ensuring that it becomes a part of your dataset. Let’s work through the examples in the R script.
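The tricky part can be sketched like this. The data frame `dat` and its values are made up, and assigning with `$` is just one way of storing the result; the R script may go about it differently.

```r
# Hypothetical data frame with the two source variables
dat <- data.frame(Var1 = c(1, 2, 3), Var2 = c(10, 20, 30))

# This creates a free-standing vector; dat itself is unchanged
Ynew <- dat$Var1 + dat$Var2
"Ynew" %in% names(dat)

# Assigning to a column with $ stores the new variable in the data frame
dat$Ynew <- dat$Var1 + dat$Var2
names(dat)
```

After the second assignment `Ynew` travels with the dataset, so it will still be there when you save or export `dat`.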
Now what if we want to recode a variable rather than just creating a new one? For example: we want to create a new variable called wtclass that will take the weights measured in January and put them into 3 weight classes: 1 = 13-16; 2 = 17-20; 3 = 21-24
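One possible way to do this recoding is with the Base R function `cut()`. This is a sketch under assumptions: the data frame and the column name `jan_wt` are made up, and the break points assume weights recorded as whole numbers, so that 16 falls in class 1 and 17 in class 2.

```r
# Hypothetical January weights
jan <- data.frame(id = 1:6, jan_wt = c(13, 16, 17, 20, 21, 24))

# cut() uses right-closed intervals by default: (12,16], (16,20], (20,24]
jan$wtclass <- cut(jan$jan_wt,
                   breaks = c(12, 16, 20, 24),
                   labels = c(1, 2, 3))

# Check the recoding against the original weights
table(jan$jan_wt, jan$wtclass)
```

Note that `wtclass` is stored as a factor, and any weight outside 13 to 24 becomes `NA` with these break points. A weight such as 16.5 would land in class 2 here, so decide how you want in-between values handled before using this on real data.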
Getting your data into R can be as easy as using the readxl package and importing your Excel worksheets directly into R.
Once you have your dataset in R, you can merge files using the join functions available in the dplyr package.
Creating new variables and recoding variables is straightforward; just remember to make sure that they actually end up in your R data frame. We did this using the attach() and detach() functions. Note that there are other ways of doing this as well, such as assigning directly to a column with the $ operator; this is just one.
Don’t be afraid to check out the Help resources – Cheatsheets are fun and very informative.