This is a question that comes up more and more in my position, whether from graduate students starting their academic careers or from experienced researchers looking to keep up with the “trends”.
A recent article published on the R-bloggers website compared the top statistical packages: R, Python (?), SAS, SPSS, and Stata. If you are interested in reading the original article, I’ve linked to it here. I’d like to summarize it and show a few examples as well.
What do they look like?
RStudio is one of the more common ways that folks use R today. It is a comfortable environment, with just enough GUI that you are not left hanging out in space. Ok, maybe a little, but you’re fine once you get comfortable with the coding.
Yes! You read that correctly: you need to write code in R, just as you need to write code in SAS. The code, or syntax, is different for the 2 programs, but you need to write some code in order to conduct any statistical analysis in either program.
SAS, as you may be aware, has a few different interfaces as well. There is SAS Studio, used with the free University Edition.
Licensed version of SAS:
As I noted earlier, each program has its own language or syntax. R is made up of packages, each of which may deal with a type of analysis. Within a package there are several functions. In SAS we have PROCedures, with options and lines of code that will run the analysis. These are very similar concepts. Each program has documentation. Since R is open source and community-driven, the level of detail in the documentation depends on the creator of the package. SAS documentation is extensive but very technical at times.
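To make the package-and-function idea concrete, here is a minimal sketch in R. It assumes nothing beyond a standard R installation: the stats and datasets packages ship with R, and PlantGrowth is a built-in example dataset, not the data used elsewhere in this post. In SAS, a comparable analysis would be requested through a PROC rather than a function call.

```r
# Packages bundle related functions. stats is attached by default;
# the library() call is shown only to illustrate the usual pattern.
library(stats)

# PlantGrowth is a built-in example dataset (an assumption for illustration,
# not the author's data). The package::object form names the package explicitly.
fit <- aov(weight ~ group, data = datasets::PlantGrowth)

# Print the analysis of variance table for the fitted model.
summary(fit)

# Every function has a help page written by the package's authors.
help("aov", package = "stats")
```

The same pattern applies to add-on packages such as ggplot2: install once, load with library(), then call the functions the package provides.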
ggplot(fruit, aes(x=Yield)) +
plot(Yield ~ Variety,
col = factor(Variety),
legend = c(1, 2, 3, 4),
col = c("black", "red", "green", "blue"),
proc sgplot data=out_asp2010_test;
  scatter x=julian y=mms / group=entry yerrorlower=low4 yerrorupper=high4;
  series x=julian y=mms / group=entry lineattrs=(pattern=solid);
  xaxis label="Julian Day";
  yaxis label="Mms";
  title "Plot of Mms by Julian Day for 2010";
run;
As noted above, R is open source and community-driven, which also means that it is supported by the community. For any questions or challenges you encounter, you will draw on a variety of sources to find help: the author of the package you are using, or a listserv.
SAS is a commercial product with a professional support network to assist its users. There are listservs of users as well.
As pointed out in the R-bloggers article, R and SAS both have their strengths and their weaknesses. I’ll be honest: I never thought I’d see the day when banks and pharma started using R, but it’s here! The small program that folks used because it was free and accessible has now become a major contender in the statistical analysis world.
Which program you select will depend on your background (what you used in your undergrad or in your courses), the level of support available to you on your campus, and maybe what program your supervisor uses or recommends. I used to recommend SAS if you were going to work in a workplace that needed standards, but after learning more about R and seeing its growth, I’m not sure that should be a reason to use SAS in academia anymore.
Personally, I believe that we should be learning both programs. I know, that is a lot of time spent learning, but they both look awesome on a resume, and they both give you the opportunity to increase your skillset and talk stats to SAS and R users 😉