When you think about conducting any statistical analysis, your starting point is data. R works with your data a little differently from other statistical packages. Being aware of the different types of data in R can save you a little time when a new package asks you about your data. So let’s review a few definitions of the different data types used in R.
Numeric, Character, or Logical
A quick overview of the different types of data you can work with in R.
- Numeric = numbers
- Character = words
- Logical = TRUE or FALSE – not all data is in the form of numbers or letters. Sometimes you might have data that has been collected as matching a criterion (TRUE) or not matching it (FALSE). We’ll work through examples of this in another session; for now, just be aware that this type of data is commonly used in R.
- How do you find out what form your data are in?
- Pass your object to a function such as class(); the result tells you what form your data are in.
testform <- c(12, 13, 15)
class(testform)
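As a minimal sketch of checking the three basic types, the lines below build one small object of each kind and ask R what it is. The object names and values are made up for illustration; class() and the is.*() functions are part of base R.

```r
# One small example of each basic type (names and values are illustrative)
num_example <- c(12, 13, 15)
chr_example <- c("green", "blue")
lgl_example <- c(TRUE, FALSE, TRUE)

class(num_example)   # "numeric"
class(chr_example)   # "character"
class(lgl_example)   # "logical"

# The is.*() functions answer a yes/no question about the type
is.numeric(num_example)     # TRUE
is.character(num_example)   # FALSE
```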
Numeric Classes in R
Numbers are handled in a couple of ways in R. These are referred to as the numeric classes of R, and the two that we will look at are known as integer and double. Having a basic understanding of these numeric classes will come in handy.
- If you think back to high school math, you’ll probably remember the term “integer”. The first thing that comes to my mind when I think of an integer is a whole number: no fractions, no decimal places.
- As you can imagine, storing numeric data as integers does not require a lot of space. So, in terms of computing, if you do not foresee your analysis needing decimals, then integers are the way to go.
- Double precision floating point numbers – think of this as the decimals side of your numeric data.
- Storing double data takes up more space than storing integer data. But sometimes you’re just not sure what you will need, so R will convert between the two numeric classes as your analysis requires it; for example, dividing one integer by another gives a double.
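Here is a small sketch of the two numeric classes in action. One thing worth knowing: R stores a number you type as a double unless you add the L suffix, so you only get an integer when you ask for one. The variable names are hypothetical.

```r
whole   <- 5L   # the L suffix asks R for an integer
decimal <- 5    # without the suffix, R stores a double

typeof(whole)     # "integer"
typeof(decimal)   # "double"
class(decimal)    # "numeric"

# R converts automatically when a calculation needs decimals
half <- whole / 2L
typeof(half)      # "double"

# For the same number of values, integers take less space than doubles
object.size(integer(1000)) < object.size(numeric(1000))   # TRUE
```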
Data Types in R
Let’s review the different data types available to you in R: vectors, matrices, arrays, data frames, lists, and factors.
- Let’s not panic at some of these terms, but work through examples of each. Think of a vector as a column of data or one variable.
- Vectors can be numeric, character, or logical.
- How to create a vector:
# a numeric vector
a = c(2, 4.5, 6, 12)
# a character vector
b = c("green", "blue", "yellow")
# a logical vector
c = c(TRUE, TRUE, FALSE, TRUE)
a = , b = , c = create vectors called a, b, and c respectively. Please note that a <- does the same thing as a = here.
c(x, x, x) tells R that we are creating a vector, or a column, with the contents found in the parentheses. Each , separates one value from the next, so each value sits on its own row of the vector/column being created.
Character values must be enclosed in quotation marks (" "), but logical values must not be.
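To see what “one type per vector” means in practice, here is a short sketch with made-up values. It pulls a single value out of a vector and then shows what R does when you mix types inside c().

```r
scores <- c(2, 4.5, 6, 12)   # hypothetical values
length(scores)               # 4
scores[2]                    # 4.5, the second value

# Mixing types: R converts every value to one common type
mixed <- c(1, "blue", TRUE)
class(mixed)                 # "character"
mixed                        # "1" "blue" "TRUE"
```

Because a vector can hold only one type, the number and the logical value are both turned into text here.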
- Think of a matrix as an object made up of rows and columns.
- The vectors within a matrix must all be the same type, so all numeric, or all character, or all logical.
- How to create a matrix:
# creates a 5 x 4 numeric matrix – 5 rows by 4 columns
y <- matrix(1:20, nrow = 5, ncol = 4)
y = or y <- creates a matrix called y
matrix( ) – calls the function matrix to create the matrix y
1:20 – the values of the matrix, which R fills in column by column
nrow = lets R know how many rows are in the matrix that you are creating
ncol = lets R know how many columns are in the matrix that you are creating
Resulting matrix y will look like:
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
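Once the matrix exists, you will usually want to pull values back out of it. The sketch below rebuilds y, reads a few entries, and shows the byrow argument of matrix(), which fills across rows instead of down columns. The name z is just an illustrative choice.

```r
y <- matrix(1:20, nrow = 5, ncol = 4)
dim(y)      # 5 4
y[2, 3]     # 12: row 2, column 3
y[, 1]      # the first column: 1 2 3 4 5

# byrow = TRUE fills the matrix across rows instead of down columns
z <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
z[2, 3]     # 7
```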
- Arrays are very similar to matrices. Think of an array as a matrix with an added dimension. For example, we may have a matrix that contains data for 2015, and we want to add the same data for 2016 in the same format. We can create an array made up of one matrix that contains the 2015 data and a second matrix that contains the 2016 data.
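The 2015 and 2016 idea might look something like the sketch below. The numbers are invented placeholders, and data2015, data2016, and yearly are hypothetical names; array() and its dim and dimnames arguments are base R.

```r
# Hypothetical data: 2 rows by 3 columns for each year
data2015 <- matrix(1:6,  nrow = 2, ncol = 3)
data2016 <- matrix(7:12, nrow = 2, ncol = 3)

# Stack the two matrices along a third dimension, labelled by year
yearly <- array(c(data2015, data2016),
                dim = c(2, 3, 2),
                dimnames = list(NULL, NULL, c("2015", "2016")))

dim(yearly)            # 2 3 2
yearly[, , "2016"]     # the 2016 matrix
yearly[1, 2, "2015"]   # 3: row 1, column 2 of the 2015 matrix
```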
- A data frame is a more general form of a matrix. What this really means is that a data frame is like the datasets we use in other programs such as SAS and SPSS. The columns, or variables, do not need to be the same type, as they must be in a matrix.
- We can have one vector/column/variable in a data frame that is numeric, followed by a second one that is character, followed by a third that is logical. But in a matrix, all three vectors/columns/variables must be the same type: numeric, character, or logical.
- How to create a data frame:
d <- c(10, 12, 31, 4)
e <- c("blue", "green", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
sampledata <- data.frame(d, e, f)
names(sampledata) <- c("ID", "Colour", "Passed") # variable names
sampledata <- or sampledata = gives the name of the data frame that we are creating
data.frame( ) calls the function that creates a data frame
d, e, f tells R that we are creating the data frame with the 3 vectors in the order of d, followed by e, followed by f
names(sampledata) – provides variable names within the data frame
c("ID", "Colour", "Passed") – identifies the 3 variable names within the data frame: ID, Colour, and Passed
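As a sketch of how you might confirm that each column kept its own type, the lines below build the same data frame in one step and inspect it. Setting stringsAsFactors = FALSE is my assumption here: it keeps Colour as character in older versions of R, which would otherwise turn it into a factor.

```r
sampledata <- data.frame(
  ID     = c(10, 12, 31, 4),
  Colour = c("blue", "green", "red", NA),
  Passed = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep Colour as character in every R version
)

str(sampledata)                            # compact summary of each column
col_classes <- sapply(sampledata, class)   # the type of each column
col_classes

sampledata$Colour                  # pull out one column by name
sampledata[sampledata$Passed, ]    # only the rows where Passed is TRUE
```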
- A list is an ordered collection of objects.
- Objects in the list do not have to be the same type.
- You can create a list of objects and store them under one name.
- How to create a list:
# a string, a numeric vector, a matrix, and a scalar
wlist <- list(name = "Fred", mynumbers = a, mymatrix = y, age = 5.3)
wlist <- or wlist = creates a list called wlist
list( ) – calls the function to create a list
name = "Fred", mynumbers = a, mymatrix = y, age = 5.3 – the values that are to be contained in the list called wlist
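Here is a self-contained sketch of the same list, with a and y recreated so it runs on its own, followed by a few ways of getting items back out. The helper name item_classes is hypothetical.

```r
a <- c(2, 4.5, 6, 12)
y <- matrix(1:20, nrow = 5, ncol = 4)
wlist <- list(name = "Fred", mynumbers = a, mymatrix = y, age = 5.3)

names(wlist)              # "name" "mynumbers" "mymatrix" "age"
wlist$name                # "Fred"
wlist[["mynumbers"]][2]   # 4.5: second value of the numeric vector
wlist[[3]][5, 4]          # 20: row 5, column 4 of the matrix

# Each item keeps its own type inside the list
item_classes <- sapply(wlist, function(item) class(item)[1])
item_classes
```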
Factors are categorical variables in your data. You can have a nominal factor or an ordinal factor. Yup, those words again. Remember that nominal and ordinal data are categorical pieces of data, so each observation falls into one group or another. With nominal data, there is no relationship or order among the categories, whereas with ordinal data there is an order to the different levels.
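A minimal sketch of the two kinds of factor, using made-up categories: colour has no natural order, so it is nominal, while rating runs from low to high, so it is ordinal. factor(), levels(), and is.ordered() are base R; the variable names are illustrative.

```r
# Nominal factor: categories with no order
colour <- factor(c("green", "blue", "yellow", "blue"))
levels(colour)       # "blue" "green" "yellow"
is.ordered(colour)   # FALSE

# Ordinal factor: we tell R the order of the levels
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)
is.ordered(rating)      # TRUE
rating[1] < rating[2]   # TRUE: low comes before high
table(rating)           # counts per level, in the order given
```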
Questions or Homework for Self-study work:
- Create examples of a vector, matrix, data frame, and a list.
- Using the following files, identify the type of data:
- Create a data frame with the following information:
- column 1: 13, 14, 15, 12
- column 2: Male, Female, Male, Male
- column 3: TRUE, TRUE, FALSE, FALSE
- column 4: 26, 44, 77, 31
- Can I create a matrix with the information listed in the previous question? Why or why not?