Data Analysis

R Data types

Data types

R has several data types, including:

Vectors

A vector is a simple data structure in which data is stored in a single column. The simplest way to define a numeric vector is with the c() function:

  X <- c(1,2,3,5,6) # numeric vector
  X
## [1] 1 2 3 5 6

Colon notation

R has a colon notation to create a series of consecutive numbers:

  X <- c(1:6) # numeric vector
  X
## [1] 1 2 3 4 5 6

Function seq()

A more explicit alternative is the function seq(from = X, to = Y, by = Z):

  X <- seq(from = 1, to = 3, by = 0.25) # numeric vector
  X
## [1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00

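Instead of a step size, seq() can be given the desired number of elements with the argument length.out, which R allows to be abbreviated as length: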

  X <- seq(from = 1, to = 3, length = 9) # numeric vector
  X
## [1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00
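As a quick check, the two calls to seq() above describe the same sequence. The sketch below compares them with all.equal(), which allows for floating-point rounding, and also shows seq_len(), a base R helper not covered above that builds the integers 1 to n. The variable names are only illustrative.

```r
# Two ways to describe the same sequence from 1 to 3
by_step   <- seq(from = 1, to = 3, by = 0.25)       # fixed step size
by_length <- seq(from = 1, to = 3, length.out = 9)  # fixed number of elements

# Compare with a numeric tolerance rather than exact equality
same <- isTRUE(all.equal(by_step, by_length))
same

# seq_len(n) builds the integer sequence 1, 2, ..., n
idx <- seq_len(5)
idx
```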

Function rep()

You can create vectors containing repetitions with the function rep():

  X <- rep(1:2, times = 3) # numeric vector
  X
## [1] 1 2 1 2 1 2

The function rep() also accepts the argument each:

  X <- rep(1:3, each = 3) # numeric vector
  X
## [1] 1 1 1 2 2 2 3 3 3

Both arguments times and each can be used together:

  X <- rep(1:3, each = 2 , times = 2) # numeric vector
  X
##  [1] 1 1 2 2 3 3 1 1 2 2 3 3
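When both arguments are given, each is applied first and the result is then repeated times times. A minimal sketch that checks this by building the vector in two steps (the names inner, outer and combined are only illustrative):

```r
# Step 1: repeat every element twice
inner <- rep(1:3, each = 2)
# Step 2: repeat the whole block twice
outer <- rep(inner, times = 2)

# The same result in a single call
combined <- rep(1:3, each = 2, times = 2)

identical(outer, combined)  # TRUE if the two constructions agree
length(combined)            # 3 values x 2 (each) x 2 (times)
```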

Types of vectors

A vector may contain text:

  X <- c("A","B","C") # character vector
  X
## [1] "A" "B" "C"

A vector may contain logical values:

  X <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector
  X
## [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

A vector contains data of a single type. In the following example, R converts all the values to characters:

  X <- c(1, "A", TRUE) # R will interpret all these values as text
  X
## [1] "1"    "A"    "TRUE"
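The conversion follows a fixed order: logical values are turned into numbers when mixed with numbers, and everything is turned into text when a character value is present. A short sketch using class() to inspect the result (the variable names are only illustrative):

```r
only_logical <- c(TRUE, FALSE)
with_number  <- c(1, TRUE)       # TRUE is converted to the number 1
with_text    <- c(1, "A", TRUE)  # everything is converted to text

class(only_logical)
class(with_number)
class(with_text)
with_number
```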

Accessing elements of a vector

You can access the n-th element of a vector by its index, X[n]:

  X <- c(1:6)
  X[1]
## [1] 1

You can select multiple elements of a vector by passing a vector of indices, like X[c(x, y, z)]:

  X <- c(1:6)
  X[c(2, 4)]
## [1] 2 4

Removing elements of a vector

Elements can also be excluded with a negative index, like X[-n]:

  X <- c(1:6)
  X[-c(1:3)]
## [1] 4 5 6
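Note that a negative index returns a new vector and leaves the original untouched. To actually drop elements, the result has to be assigned back. A small sketch, using the illustrative names V and W so that X is not modified:

```r
V <- c(1:6)
W <- V[-c(1:3)]  # copy of V without its first three elements
length(V)        # V itself still has all six elements

V <- V[-1]       # assign back to drop the first element for good
V
```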

Indices can also be used to replace the value of an element:

  X[2] <- 20
  X
## [1]  1 20  3  4  5  6
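The same assignment form works with several indices or with a condition. A minimal sketch on a separate, illustrative vector V:

```r
V <- c(1, 3, 7, 4, 9, 2)

V[c(1, 2)] <- c(10, 30)  # replace two positions at once
V[V > 8] <- 0            # replace every element that satisfies a condition
V
```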

Searching elements of a vector

Logical indices can be used to search specific elements of a vector:

  X <- c(1, 3, 7, 4, 9, 2)  # define the vector X
  X[X > 4]                 # select only those elements with values higher than...
## [1] 7 9
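It can help to look at the logical index on its own. In the sketch below, the comparison is stored in a variable (the name mask is only illustrative), and which() and sum(), two base R functions not used above, report where the matches are and how many there are:

```r
X <- c(1, 3, 7, 4, 9, 2)

mask <- X > 4        # one TRUE/FALSE value per element of X
mask

positions <- which(mask)  # indices of the TRUE values
positions
selected <- X[mask]       # same result as X[X > 4]
selected
n_matches <- sum(mask)    # TRUE counts as 1, FALSE as 0
n_matches
```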

Matrices

Matrices are a collection of vectors, all of the same type. The elements are arranged in a two-dimensional rectangular layout.

Function cbind()

A simple way to define matrices is with the cbind() function, which binds a series of vectors column-wise:

  x <- c(1,2,3,4,5) # numeric vector
  y <- c(1,2,3,4,5) # numeric vector
  z <- c(1,2,3,4,5) # numeric vector
  M <- cbind(x,y,z)
  M
##      x y z
## [1,] 1 1 1
## [2,] 2 2 2
## [3,] 3 3 3
## [4,] 4 4 4
## [5,] 5 5 5

Similar to cbind(), there is also the rbind() function, which binds vectors row-wise, one row per vector:

  M <- rbind(x, y, z)
  M
##   [,1] [,2] [,3] [,4] [,5]
## x    1    2    3    4    5
## y    1    2    3    4    5
## z    1    2    3    4    5
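The two functions produce transposed versions of the same matrix. A short sketch with two small vectors that checks this using t(), the base R transpose function (the names by_col and by_row are only illustrative):

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

by_col <- cbind(x, y)  # each vector becomes a column: 3 rows, 2 columns
by_row <- rbind(x, y)  # each vector becomes a row: 2 rows, 3 columns

dim(by_col)
dim(by_row)

# Transposing the column-wise matrix gives the row-wise one
same_values <- all(t(by_col) == by_row)
same_values
```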

Function matrix()

Matrices can also be defined with the matrix() function:

    A = matrix(
    c(1:6),              # the data elements to be filled in 
    nrow=2,              # the number of rows,  
    ncol=3,              # the number of columns, and
    byrow = TRUE)        # filling the matrix rowwise (one row at a time). 
    A
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
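By default, matrix() fills the data column by column, so byrow = TRUE changes where each value ends up. A minimal sketch comparing the two fill orders (the names by_row and by_col are only illustrative):

```r
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, ncol = 3)  # default: byrow = FALSE

by_row[1, ]  # first row when filling row-wise
by_col[1, ]  # first row when filling column-wise
```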

Select elements of a matrix

An element of a matrix can be selected by its row and column indices, M[i, j], similarly to vectors. Consider the matrix:

  M <- matrix(round(runif(12, 5, 10), 0), # generate random integers 
                    nrow = 3,             # and fill into a matrix of 3 rows
                    ncol = 4)             # and 4 columns. 
  M
##      [,1] [,2] [,3] [,4]
## [1,]    9    9    6   10
## [2,]    6    7    9    6
## [3,]    7    9    6    8
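Because runif() draws random numbers, the values will differ on every run. To get a reproducible matrix, a seed can be fixed with set.seed() before the draw. A sketch, where the seed value 42 and the names M1 and M2 are arbitrary choices for illustration:

```r
set.seed(42)  # arbitrary seed, chosen only for this example
M1 <- matrix(round(runif(12, 5, 10), 0), nrow = 3, ncol = 4)

set.seed(42)  # the same seed again reproduces the same draw
M2 <- matrix(round(runif(12, 5, 10), 0), nrow = 3, ncol = 4)

identical(M1, M2)  # TRUE when both draws used the same seed
range(M1)          # all values lie between 5 and 10
```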

To get the values of the first column:

  M[, 1]
## [1] 9 6 7

To get the values of the second row:

  M[2, ]
## [1] 6 7 9 6

To get the values of the first and third column:

  M[, c(1, 3)]
##      [,1] [,2]
## [1,]    9    6
## [2,]    6    9
## [3,]    7    6

To get the value of a specific element:

  M[2, 3]
## [1] 9

To get the elements that satisfy a condition:

  M[M > 7]    # this returns a vector
## [1]  9  9  9  9 10  8

To get rid of a row or column, you can use a negative index:

  M[, -2]
##      [,1] [,2] [,3]
## [1,]    9    6   10
## [2,]    6    9    6
## [3,]    7    6    8
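As the examples above show, selecting a single row or column returns a plain vector. If the result should stay a matrix, the argument drop = FALSE can be added. A sketch that rebuilds the matrix printed above from fixed values (the names col_vec and col_mat are only illustrative):

```r
# Same values as the matrix shown above, entered column by column
M <- matrix(c(9, 6, 7,  9, 7, 9,  6, 9, 6,  10, 6, 8), nrow = 3, ncol = 4)

col_vec <- M[, 1]                # plain vector
col_mat <- M[, 1, drop = FALSE]  # still a matrix with 3 rows and 1 column

is.matrix(col_vec)
is.matrix(col_mat)
dim(col_mat)
```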

Function dim()

Given a matrix:

  M <- matrix(rep(c(1:4), times = 3), 3, 4)
  M
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    3    2
## [2,]    2    1    4    3
## [3,]    3    2    1    4

The dimension of a matrix is given by the function dim():

  dim(M)    # return the number of rows and columns
## [1] 3 4

The function dim() can also be used to change dimensions:

  dim(M) <- c(4, 3)
  M
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]    3    3    3
## [4,]    4    4    4

The same syntax can be used to collapse a matrix into a single column (the result is still a matrix, with 12 rows and 1 column):

  dim(M) <- c(12, 1)
  M
##       [,1]
##  [1,]    1
##  [2,]    2
##  [3,]    3
##  [4,]    4
##  [5,]    1
##  [6,]    2
##  [7,]    3
##  [8,]    4
##  [9,]    1
## [10,]    2
## [11,]    3
## [12,]    4
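Setting the dimensions to c(12, 1) keeps the object a matrix. If a plain vector is wanted instead, as.vector() is one option. A small sketch comparing the two (the names col_matrix and v are only illustrative):

```r
M <- matrix(rep(c(1:4), times = 3), 3, 4)

col_matrix <- M
dim(col_matrix) <- c(12, 1)  # reshaped, but still a matrix
is.matrix(col_matrix)

v <- as.vector(M)            # plain vector, read column by column
is.matrix(v)
length(v)
```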

Function dimnames()

Sometimes it is simpler to refer to names instead of numerical indices. For this, you can define the names of rows and columns with the function dimnames():

  M <- matrix(rep(c(1:3), times = 4), 3, 4)
  dimnames(M) = list(
  c("row1", "row2", "row3"),         # row names 
  c("col1", "col2", "col3", "col4")) # column names 
  M
##      col1 col2 col3 col4
## row1    1    1    1    1
## row2    2    2    2    2
## row3    3    3    3    3
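Once the names are defined, they can be used in place of numerical indices. A short sketch, which also uses rownames() and colnames(), two base R functions that read the names back:

```r
M <- matrix(rep(c(1:3), times = 4), 3, 4)
dimnames(M) <- list(
  c("row1", "row2", "row3"),          # row names
  c("col1", "col2", "col3", "col4"))  # column names

M["row2", "col3"]  # a single element, selected by name
M[, "col1"]        # a whole column, selected by name

rownames(M)
colnames(M)
```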

Lists

A list is the most flexible container of objects in R. Its elements can be unrelated, of any type and size.

    mylist <- list(name=c("A", "B", "C"),
                  numb=c(1,2,3,4),
                  matr=cbind(c(2,1),c(1,2)),
                  vect=c(5,3,4,5,6,2))
    mylist
## $name
## [1] "A" "B" "C"
## 
## $numb
## [1] 1 2 3 4
## 
## $matr
##      [,1] [,2]
## [1,]    2    1
## [2,]    1    2
## 
## $vect
## [1] 5 3 4 5 6 2

An object contained in the list can be accessed with the dollar notation:

    mylist$name
## [1] "A" "B" "C"
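Besides the dollar notation, list elements can be reached with double and single square brackets, which behave differently: [[ ]] returns the element itself, while [ ] returns a shorter list. A minimal sketch on a reduced version of the list (the names element and sublist are only illustrative):

```r
mylist <- list(name = c("A", "B", "C"),
               numb = c(1, 2, 3, 4))

element <- mylist[["numb"]]  # the numeric vector itself
sublist <- mylist["numb"]    # a list of length 1 containing that vector

class(element)
class(sublist)

mylist$numb[2]  # second value of the element numb
names(mylist)   # names of all elements
```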

Dataframes

Dataframes sit between a list and a matrix. A dataframe is like a list since its columns can contain different types of objects (e.g. text, numbers, factors), and like a matrix since it is laid out as a table. To create a dataframe from scratch:

  origin <- c("ITA", "AUT", "FRA")
  protein <- c(2, 3, 2)
  sugar <- c(8, 12, 10)
  mydata <- data.frame(origin,protein,sugar)
  mydata
##   origin protein sugar
## 1    ITA       2     8
## 2    AUT       3    12
## 3    FRA       2    10
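Since a dataframe behaves like both a list and a matrix, both styles of access work on it. A sketch that rebuilds the same table and selects from it (the name low_protein is only illustrative; as.character() is used so the result is plain text whether or not origin is stored as a factor):

```r
mydata <- data.frame(origin  = c("ITA", "AUT", "FRA"),
                     protein = c(2, 3, 2),
                     sugar   = c(8, 12, 10))

mydata$sugar  # list style: one column by name
mydata[2, ]   # matrix style: the second row
mydata[, 2]   # matrix style: the second column

# Rows can be filtered with a logical condition on a column
low_protein <- as.character(mydata[mydata$protein == 2, "origin"])
low_protein

nrow(mydata)
ncol(mydata)
```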

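To edit the data manually, use the command edit(mydata). Note that edit() returns the modified copy, so assign the result back, as in mydata <- edit(mydata), to keep the changes.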

Factors

In the dataframe mydata, the variable origin should be treated as a categorical factor:

    mydata[,1] <- factor(mydata[,1])

This statement stores the vector internally as the integer codes (3, 1, 2) and associates them with the levels 1=AUT, 2=FRA and 3=ITA, which are sorted alphabetically.
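The internal coding of a factor can be inspected directly. A small sketch on a standalone copy of the origin values (the name origin_f is only illustrative): levels() lists the categories and as.integer() shows the code stored for each element.

```r
origin_f <- factor(c("ITA", "AUT", "FRA"))

levels(origin_f)      # the categories, in sorted order
as.integer(origin_f)  # the integer code stored for each element
nlevels(origin_f)     # number of categories
table(origin_f)       # how many times each category occurs
```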

Built-in functions to check data structures

    str(mydata)    # structure of an object
## 'data.frame':	3 obs. of  3 variables:
##  $ origin : Factor w/ 3 levels "AUT","FRA","ITA": 3 1 2
##  $ protein: num  2 3 2
##  $ sugar  : num  8 12 10
    class(mydata)  # class or type of an object
## [1] "data.frame"
    names(mydata)  # names
## [1] "origin"  "protein" "sugar"
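A few more base R functions can be useful for the same purpose. The sketch below rebuilds the dataframe with origin already stored as a factor and checks the type of each column with sapply() (the name col_classes is only illustrative):

```r
mydata <- data.frame(origin  = factor(c("ITA", "AUT", "FRA")),
                     protein = c(2, 3, 2),
                     sugar   = c(8, 12, 10))

col_classes <- sapply(mydata, class)  # class of every column
col_classes

dim(mydata)               # number of rows and columns
is.data.frame(mydata)     # TRUE for a dataframe
is.factor(mydata$origin)  # TRUE once origin is a factor
```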
