R Data types

# Data types

R has several data types, including:

• Vectors
• Matrices
• Dataframes
• Lists

## Vectors

A vector is the simplest data structure in R: a one-dimensional sequence of values of the same type. The simplest way to define a numeric vector is with the c() function:

    X <- c(1, 2, 3, 5, 6) # numeric vector
    X

##  1 2 3 5 6
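
As a small sketch (the names a and b are illustrative and do not come from the text above), c() can also concatenate existing vectors, and length() reports how many elements the result holds:

```r
a <- c(1, 2, 3)   # first numeric vector (hypothetical name)
b <- c(5, 6)      # second numeric vector (hypothetical name)
X <- c(a, b)      # c() concatenates its arguments into one vector
X
length(X)         # number of elements in X
```

Here X ends up with the same five values as in the example above.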


### Colon notation

R has a colon notation to create series of numbers:

    X <- c(1:6) # numeric vector
    X

##  1 2 3 4 5 6


### Function seq()

The function seq(from = a, to = b, by = s) gives explicit control over the start, the end and the step of the sequence:

    X <- seq(from = 1, to = 3, by = 0.25) # numeric vector
    X

##  1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00


seq() can also be called with the argument length.out (abbreviated here to length), which sets the number of elements instead of the step:

    X <- seq(from = 1, to = 3, length = 9) # numeric vector
    X

##  1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00
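
The two ways of calling seq() can be compared directly. The sketch below uses the full argument name length.out and the illustrative variable names by_step and by_length, and adds seq_len() as a shorthand for counting from 1:

```r
by_step   <- seq(from = 1, to = 3, by = 0.25)       # fixed step size
by_length <- seq(from = 1, to = 3, length.out = 9)  # fixed number of elements
all.equal(by_step, by_length)                       # compare with numeric tolerance
seq_len(5)                                          # the integers 1, 2, 3, 4, 5
```

Spelling out length.out avoids relying on R's partial matching of argument names.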


### Function rep()

You can create vectors containing repetitions with the function rep():

    X <- rep(1:2, times = 3) # numeric vector
    X

##  1 2 1 2 1 2


The function rep() also accepts the argument each, which repeats every element before moving on to the next one:

    X <- rep(1:3, each = 3) # numeric vector
    X

##  1 1 1 2 2 2 3 3 3


The arguments times and each can be used together. The argument each is applied first, and the resulting sequence is then repeated as a whole:

    X <- rep(1:3, each = 2, times = 2) # numeric vector
    X

##   1 1 2 2 3 3 1 1 2 2 3 3
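
Two further variants of rep() may be useful. This is a sketch that goes beyond the examples above: times can be a vector giving one count per element, and length.out recycles the input until a target length is reached. The names r1 and r2 are illustrative.

```r
r1 <- rep(c("a", "b"), times = c(3, 1))  # "a" three times, then "b" once
r1
r2 <- rep(1:3, length.out = 7)           # recycle 1, 2, 3 until 7 elements exist
r2
```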


### Type of vectors

A vector may contain text:

    X <- c("A", "B", "C") # character vector
    X

##  "A" "B" "C"


A vector may contain logical values:

    X <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE) # logical vector
    X

##   TRUE  TRUE  TRUE FALSE  TRUE FALSE


A vector contains data of a single type. In the following example, R converts all the values to character strings:

    X <- c(1, "A", TRUE) # R will interpret all these values as text
    X

##  "1"    "A"    "TRUE"
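
The conversion can be inspected with class(). The following sketch assumes nothing beyond base R; the names mixed_num and mixed_chr are illustrative. It shows that mixing logical and numeric values gives a numeric vector, while any character value turns the whole vector into text:

```r
mixed_num <- c(1, TRUE)     # TRUE is converted to the number 1
class(mixed_num)            # "numeric"
mixed_chr <- c(1, "A")      # the number is converted to the string "1"
class(mixed_chr)            # "character"

X <- c(1, "A", TRUE)
# Converting back to numbers gives NA (with a warning) where the text is not a number
suppressWarnings(as.numeric(X))
```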


### Accessing elements of a vector

You can access the n-th element of a vector by its index, X[n]:

    X <- c(1:6)
    X[1]

##  1


You can select multiple elements of a vector by passing a vector of indices, like X[c(i, j, k)]:

    X <- c(1:6)
    X[c(2, 4)]

##  2 4
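
Index vectors can themselves be built with the colon notation or computed from the vector. A minimal sketch, using made-up values so that positions and values are easy to tell apart:

```r
X <- c(10, 20, 30, 40, 50, 60)
X[2:4]         # elements at positions 2, 3 and 4
X[length(X)]   # the last element
```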


### Removing elements of a vector

Elements can also be excluded with a negative index, like X[-n]. This returns a copy without those elements; X itself is not modified:

    X <- c(1:6)
    X[-c(1:3)]

##  4 5 6


Indices can also be used to replace the value of an element:

    X[2] <- 20
    X

##   1 20  3  4  5  6
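
Replacement also works with several elements at once. As a sketch, the following sets every element above an arbitrarily chosen threshold to zero by combining a logical condition with assignment:

```r
X <- c(1:6)
X[X > 4] <- 0   # replace every element greater than 4 with 0
X
```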


### Searching elements of a vector

Logical indices can be used to search specific elements of a vector:

    X <- c(1, 3, 7, 4, 9, 2)  # define the vector X
    X[X > 4]                  # select only those elements with values higher than...

##  7 9
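
To see how this works, it can help to look at the intermediate logical vector. The sketch below also uses which() to get positions instead of values, and %in% to match against a set of values; both are base R functions not covered in the text above.

```r
X <- c(1, 3, 7, 4, 9, 2)
X > 4               # one TRUE or FALSE per element
which(X > 4)        # positions of the elements greater than 4
X[X %in% c(3, 9)]   # elements whose value is 3 or 9
```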


## Matrices

A matrix is a collection of vectors of the same type and length. Its elements are arranged in a two-dimensional rectangular layout of rows and columns.

### Function cbind()

A simple way to define a matrix is with the cbind() function, which binds a series of vectors column-wise:

    x <- c(1, 2, 3, 4, 5) # numeric vector
    y <- c(1, 2, 3, 4, 5) # numeric vector
    z <- c(1, 2, 3, 4, 5) # numeric vector
    M <- cbind(x, y, z)
    M

##      x y z
## [1,] 1 1 1
## [2,] 2 2 2
## [3,] 3 3 3
## [4,] 4 4 4
## [5,] 5 5 5


Similarly, the rbind() function binds vectors row-wise, one row per vector:

    M <- rbind(x, y, z)
    M

##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    1    2    3    4    5
## z    1    2    3    4    5
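
The relation between the two functions can be checked with dim() and the transpose function t(). This is a sketch with two short vectors of different content, so that rows and columns are easy to tell apart; by_col and by_row are illustrative names.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)
by_col <- cbind(x, y)       # 3 rows, 2 columns
by_row <- rbind(x, y)       # 2 rows, 3 columns
dim(by_col)
dim(by_row)
all(t(by_col) == by_row)    # transposing one layout gives the other
```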


### Function matrix()

Matrices can also be defined with the matrix() function:

    A <- matrix(
      c(1:6),              # the data elements to be filled in
      nrow = 2,            # the number of rows,
      ncol = 3,            # the number of columns, and
      byrow = TRUE)        # filling the matrix row-wise (one row at a time).
    A

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
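
Without byrow = TRUE, matrix() fills the data column by column, which is its default. A short sketch comparing the two fill orders (the names by_row and by_col are illustrative):

```r
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # first row is 1 2 3
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # first column is 1 2
by_row
by_col
```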


### Select elements of a matrix

An element of an n x m matrix can be selected by its row and column indices, M[i, j], similarly to vectors. Consider the matrix:

    M <- matrix(round(runif(12, 5, 10), 0), # generate random integers between 5 and 10
                nrow = 3,                   # and fill into a matrix of 3 rows
                ncol = 4)                   # and 4 columns.
    M

##      [,1] [,2] [,3] [,4]
## [1,]    9    9    6   10
## [2,]    6    7    9    6
## [3,]    7    9    6    8


To get the values of the first column:

  M[, 1]

##  9 6 7


To get the values of the second row:

  M[2, ]

##  6 7 9 6


To get the values of the first and third column:

  M[, c(1, 3)]

##      [,1] [,2]
## [1,]    9    6
## [2,]    6    9
## [3,]    7    6


To get the value of a specific element:

  M[2, 3]

##  9


To get the elements that satisfy a condition:

  M[M > 7]    # this returns a vector

##   9  9  9  9 10  8


To get rid of a row or column, you can use a negative index:

  M[, -2]

##      [,1] [,2] [,3]
## [1,]    9    6   10
## [2,]    6    9    6
## [3,]    7    6    8
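
Note that selecting a single row or column returns a plain vector. If the matrix shape should be kept, one option is the argument drop = FALSE. The sketch below uses a fixed matrix rather than random numbers so that the result is reproducible:

```r
M <- matrix(1:12, nrow = 3, ncol = 4)
col_vec <- M[, 1]                 # plain vector, dimensions are dropped
col_mat <- M[, 1, drop = FALSE]   # still a matrix with 3 rows and 1 column
dim(col_vec)                      # NULL
dim(col_mat)
```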


### Function dim()

Given a matrix:

    M <- matrix(rep(c(1:4), times = 3), 3, 4)
    M

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    3    2
## [2,]    2    1    4    3
## [3,]    3    2    1    4


The dimension of a matrix is given by the function dim():

  dim(M)    # return the number of rows and columns

##  3 4


The function dim() can also be used to change dimensions:

    dim(M) <- c(4, 3)
    M

##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3
## [4,]    4    4    4


The same syntax can be used to reshape a matrix into a single column. The result is still a matrix, with 12 rows and 1 column:

    dim(M) <- c(12, 1)
    M

##       [,1]
##  [1,]    1
##  [2,]    2
##  [3,]    3
##  [4,]    4
##  [5,]    1
##  [6,]    2
##  [7,]    3
##  [8,]    4
##  [9,]    1
## [10,]    2
## [11,]    3
## [12,]    4
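
The object above still carries dimensions (12 rows, 1 column). If a plain vector without dimensions is wanted, one possible approach is as.vector(), which reads the matrix column by column. A minimal sketch, with v as an illustrative name:

```r
M <- matrix(rep(c(1:4), times = 3), 3, 4)
v <- as.vector(M)   # drop the dimensions, reading column by column
v
is.null(dim(v))     # TRUE: v is a plain vector
```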


### Function dimnames()

Sometimes it is simpler to refer to names instead of numerical indices. For this, you can define the names of rows and columns with the function dimnames():

    M <- matrix(rep(c(1:3), times = 4), 3, 4)
    dimnames(M) <- list(
      c("row1", "row2", "row3"),         # row names
      c("col1", "col2", "col3", "col4")) # column names
    M

##      col1 col2 col3 col4
## row1    1    1    1    1
## row2    2    2    2    2
## row3    3    3    3    3
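
Once names are set, they can be used inside the square brackets in place of numbers. A short sketch that rebuilds the same matrix and selects by name:

```r
M <- matrix(rep(c(1:3), times = 4), 3, 4)
dimnames(M) <- list(c("row1", "row2", "row3"),
                    c("col1", "col2", "col3", "col4"))
M["row2", "col3"]   # a single element, selected by row and column name
M[, "col1"]         # a whole column, returned as a named vector
rownames(M)         # the row names on their own
```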


## Lists

A list is the most flexible container of objects in R. Its elements can be unrelated, of any type and size.

    mylist <- list(name = c("A", "B", "C"),
                   numb = c(1, 2, 3, 4),
                   matr = cbind(c(2, 1), c(1, 2)),
                   vect = c(5, 3, 4, 5, 6, 2))
    mylist

## $name
##  "A" "B" "C"
##
## $numb
##  1 2 3 4
##
## $matr
##      [,1] [,2]
## [1,]    2    1
## [2,]    1    2
##
## $vect
##  5 3 4 5 6 2
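
Elements of a list can be extracted by name or by position with double square brackets, while single square brackets return a shorter list. The sketch below uses a reduced version of the list above to show the difference:

```r
mylist <- list(name = c("A", "B", "C"),
               numb = c(1, 2, 3, 4))
mylist[["numb"]]        # the element itself: a numeric vector
mylist[[1]]             # access by position: the character vector
mylist["numb"]          # single brackets: a list containing one element
class(mylist["numb"])   # "list"
names(mylist)           # names of all elements
```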


An object contained in the list can be accessed with the dollar notation:

    mylist$name

##  "A" "B" "C"


## Dataframes

A dataframe sits between a list and a matrix. It is like a list because its columns can contain different types of objects (e.g. text, numbers, factors). It is like a matrix because the data is laid out as a table. To create a dataframe from scratch:

    origin <- c("ITA", "AUT", "FRA")
    protein <- c(2, 3, 2)
    sugar <- c(8, 12, 10)
    mydata <- data.frame(origin, protein, sugar)
    mydata

##   origin protein sugar
## 1    ITA       2     8
## 2    AUT       3    12
## 3    FRA       2    10


To edit the data manually, use the command edit(mydata).

### Factors

In the dataframe mydata, the variable origin should be treated as a categorical factor:

    mydata[, 1] <- factor(mydata[, 1])


Internally, this statement stores the column as the integer codes (3, 1, 2) and associates them with the levels 1 = AUT, 2 = FRA and 3 = ITA.

### Built-in functions to check data structures

    str(mydata) # structure of an object

## 'data.frame':    3 obs. of  3 variables:
##  $ origin : Factor w/ 3 levels "AUT","FRA","ITA": 3 1 2
##  $ protein: num  2 3 2
##  $ sugar  : num  8 12 10

    class(mydata)  # class or type of an object

##  "data.frame"

    names(mydata)  # names

##  "origin"  "protein" "sugar"
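
A few more base R functions can help when checking a dataframe and its factors. The following is a sketch that rebuilds the example data so it runs on its own. It shows the factor levels, the integer codes behind them, and a row selection with a logical condition; the threshold of 8 and the name sweet are arbitrary choices for illustration.

```r
origin  <- c("ITA", "AUT", "FRA")
protein <- c(2, 3, 2)
sugar   <- c(8, 12, 10)
mydata  <- data.frame(origin, protein, sugar)
mydata$origin <- factor(mydata$origin)   # same effect as converting mydata[, 1]

levels(mydata$origin)       # the categories, sorted alphabetically
as.integer(mydata$origin)   # the integer code stored for each row
sweet <- mydata[mydata$sugar > 8, ]   # rows with sugar greater than 8
sweet
dim(mydata)                 # number of rows and columns
```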