R has several data structures, including vectors, matrices, lists and dataframes.
A vector is a simple data structure, where data is stored in one column.
The simplest way to define a numeric vector is with the c() function:
X <- c(1,2,3,5,6) # numeric vector
X
## [1] 1 2 3 5 6
R has a colon notation to create series of numbers:
X <- c(1:6) # numeric vector
X
## [1] 1 2 3 4 5 6
seq()
A more explicit alternative is the function seq(from=X, to=Y, by=Z):
X <- seq(from = 1, to = 3, by = 0.25) # numeric vector
X
## [1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00
seq() can also be used with the argument length (short for length.out), which sets the number of elements instead of the step size:
X <- seq(from = 1, to = 3, length = 9) # numeric vector
X
## [1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00
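The two forms of seq() shown above describe the same sequence in different ways. The following sketch compares them; the variable names by_step and by_length are only illustrative, and length.out is the full name of the argument abbreviated as length above.

```r
# Two ways to build the same sequence from 1 to 3
by_step   <- seq(from = 1, to = 3, by = 0.25)       # fixed step size
by_length <- seq(from = 1, to = 3, length.out = 9)  # fixed number of elements

# all.equal() compares numeric vectors with a small tolerance
all.equal(by_step, by_length)

# The step implied by length.out is (to - from) / (length.out - 1)
(3 - 1) / (9 - 1)
```

Use by when the step size matters, and length.out when the number of points matters.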
rep()
You can create vectors containing repetitions with the function rep():
X <- rep(1:2, times = 3) # numeric vector
X
## [1] 1 2 1 2 1 2
The function rep() also accepts the argument each, which repeats every element before moving on to the next one:
X <- rep(1:3, each = 3) # numeric vector
X
## [1] 1 1 1 2 2 2 3 3 3
Both arguments, times and each, can be used together:
X <- rep(1:3, each = 2, times = 2) # numeric vector
X
## [1] 1 1 2 2 3 3 1 1 2 2 3 3
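When each and times are combined, each is applied first and the result is then repeated times times. A minimal sketch of this order, using the illustrative names step1, step2 and combined:

```r
# each is applied first ...
step1 <- rep(1:3, each = 2)      # 1 1 2 2 3 3
# ... then the whole result is repeated
step2 <- rep(step1, times = 2)   # 1 1 2 2 3 3 1 1 2 2 3 3

# The combined call gives the same vector
combined <- rep(1:3, each = 2, times = 2)
identical(step2, combined)
```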
A vector may contain text:
X <- c("A","B","C") # character vector
X
## [1] "A" "B" "C"
A vector may contain logical values:
X <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # logical vector
X
## [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE
A vector contains data of a single type. In the following example, R interprets all the data as characters:
X <- c(1, "A", TRUE) # R will interpret all these values as text
X
## [1] "1" "A" "TRUE"
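The conversion to a common type can be checked with class(). The sketch below is only an illustration of the coercion order (logical, then numeric, then character); the object names are not part of the original example.

```r
# class() reports the type R chose for the whole vector
only_logical <- c(TRUE, FALSE)
with_number  <- c(1, TRUE)        # TRUE is converted to 1
with_text    <- c(1, "A", TRUE)   # everything is converted to text

class(only_logical)   # "logical"
class(with_number)    # "numeric"
class(with_text)      # "character"

# Converting back explicitly: text that is not a number becomes NA
back <- suppressWarnings(as.numeric(c("1", "2", "A")))
back
```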
You can access the n-th element of a vector by its index, X[n]:
X <- c(1:6)
X[1]
## [1] 1
You can select multiple elements of a vector by passing a vector of indices, like X[c(x, y, z)]:
X <- c(1:6)
X[c(2, 4)]
## [1] 2 4
Elements can also be excluded with a negative index, like X[-n]:
X <- c(1:6)
X[-c(1:3)]
## [1] 4 5 6
Indices can also be used to replace the value of an element:
X[2] <- 20
X
## [1] 1 20 3 4 5 6
Logical indices can be used to search specific elements of a vector:
X <- c(1, 3, 7, 4, 9, 2) # define the vector X
X[X > 4] # select only those elements with values higher than...
## [1] 7 9
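A logical index is itself a vector of TRUE and FALSE values, one per element. The sketch below makes this intermediate step visible; the name mask is only illustrative.

```r
X <- c(1, 3, 7, 4, 9, 2)

mask <- X > 4          # one TRUE/FALSE per element of X
mask

selected <- X[mask]    # same result as X[X > 4]
positions <- which(mask)   # indices of the selected elements
n_selected <- sum(mask)    # TRUE counts as 1, so this counts the matches

selected
positions
n_selected
```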
Matrices are a collection of vectors, all of the same type. The elements are arranged in a two-dimensional rectangular layout.
cbind()
A simple way to define matrices is with the cbind() function, which binds a series of vectors column-wise:
x <- c(1,2,3,4,5) # numeric vector
y <- c(1,2,3,4,5) # numeric vector
z <- c(1,2,3,4,5) # numeric vector
M <- cbind(x,y,z)
M
##      x y z
## [1,] 1 1 1
## [2,] 2 2 2
## [3,] 3 3 3
## [4,] 4 4 4
## [5,] 5 5 5
Similar to the cbind() function, there is also the rbind() function, which binds vectors row-wise:
M <- rbind(x, y, z)
M
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    1    2    3    4    5
## z    1    2    3    4    5
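The two functions produce transposed layouts of the same data. A small sketch to check this with dim() and t(); the names by_col and by_row are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 3, 4, 5)
z <- c(1, 2, 3, 4, 5)

by_col <- cbind(x, y, z)   # each vector becomes a column
by_row <- rbind(x, y, z)   # each vector becomes a row

dim(by_col)   # 5 rows, 3 columns
dim(by_row)   # 3 rows, 5 columns

# Transposing one layout gives the values of the other
all(t(by_col) == by_row)
```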
matrix()
Matrices can also be defined with the matrix() function:
A <- matrix(
  c(1:6),        # the data elements to be filled in
  nrow = 2,      # the number of rows,
  ncol = 3,      # the number of columns, and
  byrow = TRUE)  # filling the matrix rowwise (one row at a time).
A
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
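Without byrow = TRUE, matrix() fills the data column by column, which is its default. The following sketch contrasts the two fill orders on the same data; the object names are illustrative.

```r
filled_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
filled_by_col <- matrix(1:6, nrow = 2, ncol = 3)   # default: byrow = FALSE

filled_by_row[1, ]   # first row: 1 2 3
filled_by_col[1, ]   # first row: 1 3 5
```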
An element of a matrix can be selected by its row and column index, M[i, j], similarly to vectors. Consider the matrix:
M <- matrix(round(runif(12, 5, 10), 0), # generate random integers
            nrow = 3,                   # and fill into a matrix of 3 rows
            ncol = 4)                   # and 4 columns.
M
##      [,1] [,2] [,3] [,4]
## [1,]    9    9    6   10
## [2,]    6    7    9    6
## [3,]    7    9    6    8
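Because runif() draws random numbers, running the code above will usually give a different matrix than the one printed here. If a reproducible matrix is needed, one option is to fix the random seed first. In this sketch the seed value 42 is an arbitrary choice, and M1 and M2 are illustrative names.

```r
set.seed(42)   # arbitrary seed value
M1 <- matrix(round(runif(12, 5, 10), 0), nrow = 3, ncol = 4)

set.seed(42)   # same seed, same random numbers
M2 <- matrix(round(runif(12, 5, 10), 0), nrow = 3, ncol = 4)

identical(M1, M2)
```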
To get the values of the first column:
M[, 1]
## [1] 9 6 7
To get the values of the second row:
M[2, ]
## [1] 6 7 9 6
To get the values of the first and third column:
M[, c(1, 3)]
##      [,1] [,2]
## [1,]    9    6
## [2,]    6    9
## [3,]    7    6
To get the value of a specific element:
M[2, 3]
## [1] 9
To get the elements that satisfy a condition:
M[M > 7] # this returns a vector
## [1] 9 9 9 9 10 8
To get rid of a row or column, you can use a negative index:
M[, -2]
##      [,1] [,2] [,3]
## [1,]    9    6   10
## [2,]    6    9    6
## [3,]    7    6    8
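Selecting a single row or column returns a plain vector, as seen above. The sketch below rebuilds the printed matrix from its values (so that it does not depend on random numbers) and shows two related options: drop = FALSE keeps the matrix shape, and which() with arr.ind = TRUE reports the row and column of each match. The object names are illustrative.

```r
# Rebuild the matrix printed above, column by column
M <- matrix(c(9, 6, 7,
              9, 7, 9,
              6, 9, 6,
              10, 6, 8), nrow = 3, ncol = 4)

col1     <- M[, 1]                  # plain vector
col1_mat <- M[, 1, drop = FALSE]    # still a matrix with 3 rows and 1 column

is.matrix(col1)
dim(col1_mat)

# Row and column positions of the elements greater than 7
hits <- which(M > 7, arr.ind = TRUE)
hits
```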
dim()
Given a matrix:
M <- matrix(rep(c(1:4), times = 3), 3, 4)
M
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    3    2
## [2,]    2    1    4    3
## [3,]    3    2    1    4
The dimensions of a matrix are given by the function dim():
dim(M) # return the number of rows and columns
## [1] 3 4
The function dim() can also be used to change the dimensions:
dim(M) <- c(4, 3)
M
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3
## [4,]    4    4    4
The same syntax can be used to reshape a matrix into a single column, similar to a vector:
dim(M) <- c(12, 1)
M
##       [,1]
##  [1,]    1
##  [2,]    2
##  [3,]    3
##  [4,]    4
##  [5,]    1
##  [6,]    2
##  [7,]    3
##  [8,]    4
##  [9,]    1
## [10,]    2
## [11,]    3
## [12,]    4
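Note that the result above is still a matrix with one column. If a plain vector without dimensions is needed, as.vector() is one way to obtain it. A minimal sketch, with v as an illustrative name:

```r
M <- matrix(rep(1:4, times = 3), 3, 4)
dim(M) <- c(12, 1)

is.matrix(M)        # TRUE: a matrix with 12 rows and 1 column

v <- as.vector(M)   # drop the dimensions
is.matrix(v)        # FALSE
dim(v)              # NULL
length(v)           # 12
```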
dimnames()
Sometimes it is simpler to refer to names instead of numerical indices. For this, you can define the names of rows and columns with the function dimnames():
M <- matrix(rep(c(1:3), times = 4), 3, 4)
dimnames(M) <- list(
  c("row1", "row2", "row3"),          # row names
  c("col1", "col2", "col3", "col4"))  # column names
M
##      col1 col2 col3 col4
## row1    1    1    1    1
## row2    2    2    2    2
## row3    3    3    3    3
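Once the names are defined, they can be used inside the square brackets in place of numbers. A short sketch of this, using the same matrix:

```r
M <- matrix(rep(1:3, times = 4), 3, 4)
dimnames(M) <- list(
  c("row1", "row2", "row3"),
  c("col1", "col2", "col3", "col4"))

M["row2", "col3"]   # single element, selected by name
M[, "col1"]         # first column, returned with its row names

rownames(M)         # the row names alone
colnames(M)         # the column names alone
```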
A list is the most flexible container of objects in R. Its elements can be unrelated, of any type and size.
mylist <- list(name=c("A", "B", "C"),
               numb=c(1,2,3,4),
               matr=cbind(c(2,1),c(1,2)),
               vect=c(5,3,4,5,6,2))
mylist
## $name
## [1] "A" "B" "C"
##
## $numb
## [1] 1 2 3 4
##
## $matr
##      [,1] [,2]
## [1,]    2    1
## [2,]    1    2
##
## $vect
## [1] 5 3 4 5 6 2
An object contained in the list can be accessed with the dollar notation:
mylist$name
## [1] "A" "B" "C"
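Besides the dollar notation, list elements can be reached with double square brackets, either by name or by position. Single square brackets behave differently: they return a shorter list rather than the element itself. A minimal sketch on a reduced version of the list above:

```r
mylist <- list(name = c("A", "B", "C"),
               numb = c(1, 2, 3, 4))

by_name     <- mylist[["numb"]]   # same object as mylist$numb
by_position <- mylist[[2]]        # second element of the list
sub_list    <- mylist["numb"]     # a list containing only numb

class(by_name)      # "numeric"
class(sub_list)     # "list"

mylist$numb[3]      # third value inside the element numb
```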
Dataframes sit between a list and a matrix. A dataframe is like a list, since its columns can contain different types of objects (e.g. text, numbers, factors), and like a matrix, since it is displayed as a table. To create a dataframe from scratch:
origin <- c("ITA", "AUT", "FRA")
protein <- c(2, 3, 2)
sugar <- c(8, 12, 10)
mydata <- data.frame(origin,protein,sugar)
mydata
##   origin protein sugar
## 1    ITA       2     8
## 2    AUT       3    12
## 3    FRA       2    10
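Because a dataframe combines features of lists and matrices, both access styles work on it. The sketch below uses the dollar notation for a column and matrix-style indexing with a logical condition for rows; the name high_sugar is illustrative.

```r
origin  <- c("ITA", "AUT", "FRA")
protein <- c(2, 3, 2)
sugar   <- c(8, 12, 10)
mydata  <- data.frame(origin, protein, sugar)

mydata$sugar        # one column, list-style
mydata[2, 3]        # row 2, column 3, matrix-style

# Keep only the rows where sugar is greater than 8
high_sugar <- mydata[mydata$sugar > 8, ]
high_sugar

nrow(mydata)        # number of rows
ncol(mydata)        # number of columns
```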
To edit the data manually, use the command edit(mydata) and assign the result back, as in mydata <- edit(mydata), to keep the changes.
In the dataframe mydata, the variable origin should be treated as a categorical factor:
mydata[,1] <- factor(mydata[,1])
This statement stores the column as the integer codes (3, 1, 2) and internally associates them with the levels 1=AUT, 2=FRA and 3=ITA.
str(mydata) # structure of an object
## 'data.frame':    3 obs. of  3 variables:
##  $ origin : Factor w/ 3 levels "AUT","FRA","ITA": 3 1 2
##  $ protein: num  2 3 2
##  $ sugar  : num  8 12 10
class(mydata) # class or type of an object
## [1] "data.frame"
names(mydata) # names
## [1] "origin" "protein" "sugar"
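The internal coding of the factor seen in the str() output can also be inspected directly. A small sketch on the origin values alone, with origin_factor as an illustrative name:

```r
origin_factor <- factor(c("ITA", "AUT", "FRA"))

levels(origin_factor)       # the categories, sorted alphabetically
as.integer(origin_factor)   # the integer code stored for each value
table(origin_factor)        # number of observations per category
```

The integer codes refer to positions in levels(), which is why ITA, the third level, is stored as 3.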