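Many statistical procedures test specific hypotheses. Principal Component Analysis (PCA), factor analysis, and cluster analysis are examples of analyses that explore the data rather than test a specific hypothesis. PCA looks for components that the variables have in common by summarizing the pattern of correlations among them. Each component is a weighted combination of the original variables: the first captures as much of the variation in the data as possible, and each later one captures as much of what remains. PCA is often used to reduce data from several variables to two or three components.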
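The idea in the paragraph above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not how SPSS, SAS, or R compute PCA internally. It standardizes each column, builds the correlation matrix, finds its eigenvalues and eigenvectors with the classical cyclic Jacobi rotation method, and projects each observation onto the leading components. The helper names (`standardize`, `jacobi_eigen`, `pca`) and the toy data are hypothetical.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def jacobi_eigen(a, tol=1e-18, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A G (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- G^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V G accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def pca(rows, k):
    """Correlation-matrix PCA: sorted eigenvalues, eigenvectors, and scores on the first k components."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # The correlation matrix is the covariance matrix of the standardized columns.
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    vals, vecs = jacobi_eigen(corr)
    order = sorted(range(p), key=lambda i: vals[i], reverse=True)
    eigvals = [vals[i] for i in order]
    eigvecs = [[vecs[j][i] for j in range(p)] for i in order]  # one weight list per component
    scores = [[sum(r[j] * eigvecs[c][j] for j in range(p)) for c in range(k)] for r in z]
    return eigvals, eigvecs, scores

# Made-up toy data (5 observations, 3 variables), not the decathlon results.
toy = [[1.0, 2.0, 5.0], [2.0, 4.1, 3.0], [3.0, 5.9, 4.0], [4.0, 8.2, 1.0], [5.0, 9.8, 2.0]]
eigvals, eigvecs, scores = pca(toy, k=2)
print(eigvals)  # component variances, largest first; they sum to 3 up to rounding
```

Each eigenvalue equals the variance of that component's scores, and because every variable is standardized to variance 1, the eigenvalues add up to the number of variables. That is why the first one or two eigenvalues show how much of the data a low-dimensional summary retains.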
When running a PCA, you need to answer two questions: how many factors/components should be retained, and how should the factors/components be interpreted?
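For the first question, two widely used rules of thumb are the Kaiser criterion (retain components whose eigenvalue exceeds 1) and retaining enough components to reach a target share of the total variance. A scree plot of the same eigenvalues is another common guide. For the second question, analysts usually inspect the loadings. The sketch below assumes eigenvalues from a correlation-matrix PCA, sorted from largest to smallest. The helper names, the 0.8 variance target, and the example eigenvalues are illustrative assumptions, not decathlon results.

```python
import math

def proportion_explained(eigvals):
    """Share of the total variance carried by each component."""
    total = sum(eigvals)
    return [ev / total for ev in eigvals]

def cumulative_proportion(eigvals):
    """Running total of the shares, component by component."""
    out, running = [], 0.0
    for share in proportion_explained(eigvals):
        running += share
        out.append(running)
    return out

def kaiser_count(eigvals):
    """Kaiser criterion: number of components whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigvals if ev > 1.0)

def components_for_variance(eigvals, target=0.8):
    """Smallest number of leading components whose cumulative share reaches `target`."""
    for k, cum in enumerate(cumulative_proportion(eigvals), start=1):
        if cum >= target:
            return k
    return len(eigvals)

def loadings(eigvec, eigval):
    """Eigenvector weights scaled by sqrt(eigenvalue). For a correlation-matrix PCA these
    equal the correlations between each standardized variable and the component."""
    return [w * math.sqrt(eigval) for w in eigvec]

# Hypothetical eigenvalues for 4 standardized variables (they sum to 4).
example = [2.2, 1.1, 0.5, 0.2]
print(kaiser_count(example))                  # 2
print(components_for_variance(example, 0.8))  # 2
```

Loadings close to +1 or -1 mark the variables that drive a component, and the shared meaning of those variables is what gives the component its interpretation.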
Before running a PCA, one of the first things you need to do is determine whether the variables you want to include are related to one another. If the variables are not correlated, there is no reason to run a PCA, because there is no shared variation for the components to summarize. The data we will be working with is a sample dataset containing the 1988 Olympic decathlon results for 33 athletes. The variables are as follows:
run100m: time it took to run 100m
longjump: distance attained in the Long Jump event
shotput: distance reached in the Shot Put event
highjump: height reached in the High Jump event
run400m: time it took to run 400m
hurdles110m: time it took to run the 110m hurdles
discus: distance reached in the Discus event
polevault: height reached in the Pole Vault event
javelin: distance reached in the Javelin event
run1500m: time it took to run 1500m
score: overall decathlon score
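One way to check whether the variables are related is to inspect the correlation matrix and run Bartlett's test of sphericity. This test's null hypothesis is that the correlation matrix is an identity matrix, meaning all variables are uncorrelated. The sketch uses the standard statistic `-((n - 1) - (2p + 5)/6) * ln|R|`, where n is the number of observations, p the number of variables, and |R| the determinant of the correlation matrix, referred to a chi-square distribution with p(p - 1)/2 degrees of freedom. A small p-value suggests there is enough shared correlation for PCA to be worthwhile. The function names, the 0.3 threshold for flagging pairs, and the toy data are illustrative assumptions. The code uses plain Python so it runs without extra libraries.

```python
import math

def correlation_matrix(rows):
    """Pearson correlations between the columns of `rows` (one inner list per observation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - means[j] for j in range(p)] for r in rows]
    ss = [math.sqrt(sum(d[j] ** 2 for d in dev)) for j in range(p)]
    return [[sum(d[a] * d[b] for d in dev) / (ss[a] * ss[b]) for b in range(p)] for a in range(p)]

def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det

def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variable with integer k degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        # Even k: sum over i < k/2 of e^{-h} h^i / i!, computed in log space to avoid overflow.
        return sum(math.exp(i * math.log(h) - h - math.lgamma(i + 1)) for i in range(k // 2))
    # Odd k: erfc(sqrt(h)) plus the sum over 1 <= i <= (k-1)/2 of e^{-h} h^(i-1/2) / Gamma(i+1/2).
    tail = sum(math.exp((i - 0.5) * math.log(h) - h - math.lgamma(i + 0.5))
               for i in range(1, (k - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + tail

def bartlett_sphericity(rows):
    """Bartlett's test of sphericity; returns (statistic, degrees of freedom, p-value)."""
    n, p = len(rows), len(rows[0])
    det = determinant(correlation_matrix(rows))
    if det <= 0.0:
        raise ValueError("correlation matrix is singular; a variable is a linear combination of others")
    stat = -((n - 1) - (2 * p + 5) / 6.0) * math.log(det)
    df = p * (p - 1) // 2
    return stat, df, chi2_sf(stat, df)

def correlated_pairs(rows, names, threshold=0.3):
    """Variable pairs whose absolute correlation meets a rule-of-thumb threshold."""
    r = correlation_matrix(rows)
    p = len(names)
    return [(names[i], names[j], r[i][j])
            for i in range(p) for j in range(i + 1, p) if abs(r[i][j]) >= threshold]

# Made-up toy data, not the decathlon results.
toy = [[1.0, 1.1, 5.0], [2.0, 1.9, 3.0], [3.0, 3.2, 4.0], [4.0, 3.9, 1.0], [5.0, 5.1, 2.0], [6.0, 6.0, 2.5]]
print(correlated_pairs(toy, ["a", "b", "c"]))
print(bartlett_sphericity(toy))
```

With the decathlon data, each inner list would hold one athlete's results, with one column per event variable.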
Download the data as an Excel spreadsheet here.
For this workshop we will conduct the same analysis in three commonly used statistical packages: SPSS, SAS, and R. We will start with SPSS, then progress to SAS, and finally to R.
If you are using SAS, please download the SAS program.
If you are using R, please download the R Script file.