In this lecture, we will talk about loops. Loops are a powerful way to repeat a task many times (even thousands of times). Computers are very good at repetitive tasks, whereas we humans are pretty bad at them. Although the examples here are in R, the logic applies to pretty much any programming language.

Loops repeat two steps: evaluation and execution. In the evaluation step, we tell the computer to test a specific condition, which has two possible outcomes: satisfied or not satisfied. For each outcome, we then tell the computer what to do.
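
As a minimal sketch of the evaluation and execution steps, the snippet below tests one condition per iteration and runs a different branch for each outcome. The names values and labels and the threshold 5 are made up for illustration; the for loop used here is introduced later in this lecture.

```r
# Hypothetical data: classify each value as "large" or "small".
values <- c(3, 8, 1, 10)
labels <- character(length(values))   # pre-allocated result vector

for (k in seq_along(values)) {
  if (values[k] > 5) {        # evaluation: is the condition satisfied?
    labels[k] <- "large"      # execution when the condition is satisfied
  } else {
    labels[k] <- "small"      # execution when it is not satisfied
  }
}
labels
## [1] "small" "large" "small" "large"
```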

Three types of loops in R:

repeat

repeat is the simplest loop: it runs the same expression over and over.

repeat {expression}

To stop repeating the expression, use the keyword break. To skip to the next iteration of a loop, use the keyword next.

i <- 5
repeat {
  if (i > 25) {
    break            # leave the loop once i exceeds 25
  }
  print(i)
  i <- i + 5         # runs on every iteration that did not break
}
## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25

If you do not include a break statement, the repeat loop never ends (an infinite loop).
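
The keyword next is described above but not demonstrated, so here is a small sketch under assumed names (i and printed are only for illustration). It collects the even numbers from 1 to 10 and uses next to skip the odd ones.

```r
i <- 0
printed <- c()               # collects the values that were not skipped

repeat {
  i <- i + 1                 # update the counter BEFORE any call to next
  if (i > 10) break          # stop once we pass 10
  if (i %% 2 == 1) next      # odd number: skip the rest of this iteration
  printed <- c(printed, i)
}
printed
## [1]  2  4  6  8 10
```

Note that the counter is incremented before next is reached. If the increment came after it, every skipped iteration would leave i unchanged and the loop would never finish.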

while

Another useful construct is the while loop, which repeats an expression as long as a condition is true:

while (condition) expression

i <- 5
while (i <= 25) { 
  print(i) 
  i <- i + 5
}
## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25

You can also use break and next inside while loops. The break statement is used to stop iterating through a loop. The next statement skips to the next loop iteration without evaluating the remaining expressions in the loop body.
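
A short sketch of both keywords inside a while loop follows. The variable names and the limits (100, 3, 20) are arbitrary choices for illustration: the loop adds up the integers that are not multiples of 3 and stops as soon as the running total exceeds 20.

```r
n <- 0
total <- 0

while (n < 100) {
  n <- n + 1
  if (n %% 3 == 0) next      # skip multiples of 3
  total <- total + n
  if (total > 20) break      # stop early, long before n reaches 100
}
c(n = n, total = total)
##     n total 
##     8    27
```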

for

# syntax
for (variable in sequence){
  statements for each variable
}

Results are not printed inside a loop unless you explicitly call the print function.

for (i in 1:5) {
  paste(i, "^2 = ", i ^ 2, sep = "")
}

Wrapping the expression in print() displays the results:

for (i in 1:5) {
  print(paste(i, "^2 = ", i ^ 2, sep = ""))
}
## [1] "1^2 = 1"
## [1] "2^2 = 4"
## [1] "3^2 = 9"
## [1] "4^2 = 16"
## [1] "5^2 = 25"
for (i in 1:5) {
  # Concatenate and Print
  cat(i, "^2 = ", i ^ 2, "\n", sep = "")
}
## 1^2 = 1
## 2^2 = 4
## 3^2 = 9
## 4^2 = 16
## 5^2 = 25
a <- c(2, 6)
for (i in 1:2) {
  print(paste('i', '=', i, sep=' ')) 
  print(paste(c('a', '=', a), collapse=' ')) 
  cat('execution\n') 
  a[i] <- a[i] + 10 
  print(paste(c('a', '=', a), collapse=' ')) 
  print(paste('i', '=', i, sep=' ')) 
  cat('next\n\n') 
}
## [1] "i = 1"
## [1] "a = 2 6"
## execution
## [1] "a = 12 6"
## [1] "i = 1"
## next
## 
## [1] "i = 2"
## [1] "a = 12 6"
## execution
## [1] "a = 12 16"
## [1] "i = 2"
## next
a
## [1] 12 16
a <- c(2, 6) 
a + 10
## [1] 12 16

The loop variable (i in the example below) lives in the calling environment, not in a scope private to the loop: after the loop finishes, it keeps the last value it was assigned.

b <- character(2) # to hold results
i <- 1
for (i in 1:2) {
  b[i] <- toupper(a[i])
}
i
## [1] 2
b
## [1] "2" "6"
toupper(a)
## [1] "2" "6"
years <- 2016:2025
for (i in seq_along(years)) {
  print(paste('Year', years[i], sep=' '))
}
## [1] "Year 2016"
## [1] "Year 2017"
## [1] "Year 2018"
## [1] "Year 2019"
## [1] "Year 2020"
## [1] "Year 2021"
## [1] "Year 2022"
## [1] "Year 2023"
## [1] "Year 2024"
## [1] "Year 2025"
paste('Year', 2016:2025, sep=' ')
##  [1] "Year 2016" "Year 2017" "Year 2018" "Year 2019" "Year 2020" "Year 2021"
##  [7] "Year 2022" "Year 2023" "Year 2024" "Year 2025"
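
The years loop above iterates over seq_along(years) rather than 1:length(years). The sketch below, using a hypothetical empty vector named empty, suggests why that is the safer habit: for a zero-length vector, 1:length(x) counts down from 1 to 0, so the loop body runs twice instead of not at all.

```r
empty <- character(0)        # a vector with no elements

1:length(empty)              # counts DOWN from 1 to 0
## [1] 1 0
seq_along(empty)             # correctly empty
## integer(0)

count_bad <- 0
for (i in 1:length(empty)) count_bad <- count_bad + 1

count_good <- 0
for (i in seq_along(empty)) count_good <- count_good + 1

c(count_bad = count_bad, count_good = count_good)
##  count_bad count_good 
##          2          0
```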

Applying a Function to Each Element of an Object

Besides loops such as for(), the base R packages include a family of functions that apply a function to each element (or a subset of elements) of an object.

To apply a function to part of an array or a matrix

To apply a function to parts of an array (or matrix), use the apply function:

apply(X, MARGIN, FUN, ...)
# X is an array or matrix; MARGIN = 1 means rows, MARGIN = 2 means columns
# ... are arguments to be passed to FUN
x = matrix(1:20, nrow = 5)
x
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
# for each row, find the max value
for(i in 1:nrow(x)){
  print(max(x[i, ]))
}
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
# apply
apply(X = x, MARGIN = 1, FUN = max, na.rm = TRUE)
## [1] 16 17 18 19 20
# for each column, find the max value
for(i in 1:ncol(x)){
  print(max(x[, i]))
}
## [1] 5
## [1] 10
## [1] 15
## [1] 20
# apply
apply(X = x, MARGIN = 2, FUN = max, na.rm = TRUE)
## [1]  5 10 15 20

A more complicated example:

x = array(1:27, dim = c(3, 3, 3))
x
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   10   13   16
## [2,]   11   14   17
## [3,]   12   15   18
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]   19   22   25
## [2,]   20   23   26
## [3,]   21   24   27
apply(X = x, MARGIN = 1, FUN = paste, collapse = ", ")
## [1] "1, 4, 7, 10, 13, 16, 19, 22, 25" "2, 5, 8, 11, 14, 17, 20, 23, 26"
## [3] "3, 6, 9, 12, 15, 18, 21, 24, 27"
apply(X = x, MARGIN = 2, FUN = paste, collapse = ", ")
## [1] "1, 2, 3, 10, 11, 12, 19, 20, 21" "4, 5, 6, 13, 14, 15, 22, 23, 24"
## [3] "7, 8, 9, 16, 17, 18, 25, 26, 27"
apply(X = x, MARGIN = 3, FUN = paste, collapse = ", ")
## [1] "1, 2, 3, 4, 5, 6, 7, 8, 9"          "10, 11, 12, 13, 14, 15, 16, 17, 18"
## [3] "19, 20, 21, 22, 23, 24, 25, 26, 27"
apply(X = x, MARGIN = c(1, 2), FUN = paste, collapse = ", ")
##      [,1]        [,2]        [,3]       
## [1,] "1, 10, 19" "4, 13, 22" "7, 16, 25"
## [2,] "2, 11, 20" "5, 14, 23" "8, 17, 26"
## [3,] "3, 12, 21" "6, 15, 24" "9, 18, 27"
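
FUN does not have to return a single value, and it can be a function you write yourself. The sketch below reuses the 5 x 4 matrix from earlier under the assumed name m. When FUN returns a vector of length 2, apply returns a matrix with one column per row of the input, which can be surprising at first.

```r
m <- matrix(1:20, nrow = 5)

# range() returns c(min, max), so the result has 2 rows and 5 columns:
# column j holds the min and max of row j of m.
row_ranges <- apply(X = m, MARGIN = 1, FUN = range)
row_ranges
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]   16   17   18   19   20

# An anonymous function that returns one number per row gives a plain vector.
row_spread <- apply(X = m, MARGIN = 1, FUN = function(r) max(r) - min(r))
row_spread
## [1] 15 15 15 15 15
```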

Apply a function to a list or vector

To apply a function to each element in a vector or a list and return a list, you can use the function lapply. The function lapply requires two arguments: an object X and a function FUN. (You may specify additional arguments that will be passed to FUN.)

x <- 1:5
lapply(X = x, FUN = function(i) 2^i)
## [[1]]
## [1] 2
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 8
## 
## [[4]]
## [1] 16
## 
## [[5]]
## [1] 32
x <- data.frame(x = 1:5, y = 6:10)
lapply(X = x, FUN = function(i) 2^i)
## $x
## [1]  2  4  8 16 32
## 
## $y
## [1]   64  128  256  512 1024
lapply(X = x, FUN = mean)
## $x
## [1] 3
## 
## $y
## [1] 8
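
The point about additional arguments being passed on to FUN can be sketched as follows. The list scores is made-up data containing missing values; na.rm = TRUE is not an argument of lapply itself but is handed on to mean for every element.

```r
scores <- list(a = c(1, NA, 3), b = c(4, 5, NA))

# Without the extra argument, each mean is NA.
lapply(X = scores, FUN = mean)
## $a
## [1] NA
## 
## $b
## [1] NA

# na.rm = TRUE is forwarded to mean() on every call.
means <- lapply(X = scores, FUN = mean, na.rm = TRUE)
means
## $a
## [1] 2
## 
## $b
## [1] 4.5
```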

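Sometimes, you might prefer to get a vector, matrix, or array instead of a list. To do this, use the sapply function. It works the same way as lapply, except that it simplifies the result to a vector or matrix when it can. The related vapply function does the same job but requires you to declare, through FUN.VALUE, the type and length of the result for each element, and it stops with an error if a result does not match.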

sapply(X = x, FUN = function(i) 2^i)
##       x    y
## [1,]  2   64
## [2,]  4  128
## [3,]  8  256
## [4,] 16  512
## [5,] 32 1024
vapply(X = x, FUN = function(i) 2^i, FUN.VALUE = rep(1, 5))
##       x    y
## [1,]  2   64
## [2,]  4  128
## [3,]  8  256
## [4,] 16  512
## [5,] 32 1024
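
The two calls above give the same matrix, so the difference between sapply and vapply only shows up in edge cases. The sketch below uses assumed names (empty_list, s, v, msg) to illustrate two of them: zero-length input, and a result of the wrong type.

```r
empty_list <- list()

# sapply cannot guess the intended type and falls back to an empty list.
s <- sapply(empty_list, length)
# vapply returns an empty vector of the declared type.
v <- vapply(empty_list, length, FUN.VALUE = integer(1))

is.list(s)
## [1] TRUE
is.integer(v)
## [1] TRUE

# Declaring numeric(1) while FUN returns a character makes vapply stop.
msg <- tryCatch(
  vapply(1:3, function(i) letters[i], FUN.VALUE = numeric(1)),
  error = function(e) "vapply stopped: result was not numeric"
)
msg
## [1] "vapply stopped: result was not numeric"
```

This predictability is the usual argument for preferring vapply inside functions and scripts, and keeping sapply for interactive use.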

Another related function is mapply, the “multivariate” version of sapply. It applies a function to the corresponding elements of several vectors or lists.

mapply(paste, 
       c(1, 2, 3, 4, 5),
       c("a","b","c","d","e"),
       c("A","B","C","D","E"),
       MoreArgs = list(sep = "-")
)
## [1] "1-a-A" "2-b-B" "3-c-C" "4-d-D" "5-e-E"
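
To make the connection with loops explicit, here is a sketch that computes the same result twice: once with a for loop over two vectors in parallel, and once with mapply. The vectors bases and exponents are invented for illustration.

```r
bases <- c(2, 3, 4)
exponents <- c(1, 2, 3)

# Loop version: walk through both vectors with a shared index.
powers_loop <- numeric(length(bases))
for (k in seq_along(bases)) {
  powers_loop[k] <- bases[k]^exponents[k]
}

# mapply version: the function receives one element from each vector per call.
powers_mapply <- mapply(function(b, e) b^e, bases, exponents)

powers_loop
## [1]  2  9 64
powers_mapply
## [1]  2  9 64
```

For this particular calculation, bases^exponents would give the same answer directly, because ^ is already vectorized. mapply becomes useful when the function being applied is not.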

Tidyverse

The functions mentioned above are all from base R. The popular tidyverse packages also provide functions that are similar to the apply family of functions.

Data frames: work with a data frame row by row.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
df <- data.frame(x = runif(6), y = runif(6), z = runif(6))
df
##           x         y         z
## 1 0.2543282 0.7899347 0.7909086
## 2 0.5812300 0.4571883 0.3104377
## 3 0.9034680 0.6047601 0.5428415
## 4 0.3123348 0.2626777 0.5094999
## 5 0.6594451 0.5237566 0.6582903
## 6 0.9122501 0.3042793 0.1354054
# Compute the mean of x, y, z in each row
df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))
## # A tibble: 6 × 4
## # Rowwise: 
##       x     y     z     m
##   <dbl> <dbl> <dbl> <dbl>
## 1 0.254 0.790 0.791 0.612
## 2 0.581 0.457 0.310 0.450
## 3 0.903 0.605 0.543 0.684
## 4 0.312 0.263 0.509 0.362
## 5 0.659 0.524 0.658 0.614
## 6 0.912 0.304 0.135 0.451
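
For the specific case of a row mean, base R can produce the same column without rowwise. The sketch below uses a small fixed data frame (df_fixed, a made-up name with made-up values) so that the result is reproducible, and compares rowMeans with an equivalent apply call.

```r
df_fixed <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))

# rowMeans works on the numeric columns as a whole.
df_fixed$m <- rowMeans(df_fixed[, c("x", "y", "z")])

# The same numbers via apply over rows (MARGIN = 1).
m_apply <- apply(X = df_fixed[, c("x", "y", "z")], MARGIN = 1, FUN = mean)

df_fixed
##   x y z m
## 1 1 4 7 4
## 2 2 5 8 5
## 3 3 6 9 6
```

rowwise remains the more general tool, since it works with any function of the row's values, not only means and sums.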

Work only with the columns that meet a specific condition.

df$a = letters[1:6]
df$b = LETTERS[1:6]
str(df)
## 'data.frame':    6 obs. of  5 variables:
##  $ x: num  0.254 0.581 0.903 0.312 0.659 ...
##  $ y: num  0.79 0.457 0.605 0.263 0.524 ...
##  $ z: num  0.791 0.31 0.543 0.509 0.658 ...
##  $ a: chr  "a" "b" "c" "d" ...
##  $ b: chr  "A" "B" "C" "D" ...
# convert character columns to factors
for (i in seq_along(df)) {
  if (is.character(df[[i]])) {
    df[[i]] = as.factor(df[[i]])
  }
}
str(df)
## 'data.frame':    6 obs. of  5 variables:
##  $ x: num  0.254 0.581 0.903 0.312 0.659 ...
##  $ y: num  0.79 0.457 0.605 0.263 0.524 ...
##  $ z: num  0.791 0.31 0.543 0.509 0.658 ...
##  $ a: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6
##  $ b: Factor w/ 6 levels "A","B","C","D",..: 1 2 3 4 5 6
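# dplyr version: convert the factor columns back to characters
# (mutate_if still works but is superseded; the current idiom is
#  mutate(df, across(where(is.factor), as.character)))
df = mutate_if(.tbl = df, .predicate = is.factor, .funs = as.character)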
str(df)
## 'data.frame':    6 obs. of  5 variables:
##  $ x: num  0.254 0.581 0.903 0.312 0.659 ...
##  $ y: num  0.79 0.457 0.605 0.263 0.524 ...
##  $ z: num  0.791 0.31 0.543 0.509 0.658 ...
##  $ a: chr  "a" "b" "c" "d" ...
##  $ b: chr  "A" "B" "C" "D" ...
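
For comparison, the character-to-factor loop shown earlier can also be written without an explicit loop and without dplyr. This is a sketch in base R on a small made-up data frame d; assigning into d[] keeps the object a data frame instead of replacing it with the plain list that lapply returns.

```r
d <- data.frame(x = 1:3, a = c("a", "b", "c"), stringsAsFactors = FALSE)

# Apply the same test-and-convert step to every column.
d[] <- lapply(d, function(col) {
  if (is.character(col)) as.factor(col) else col
})

str(d)
## 'data.frame':    3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ a: Factor w/ 3 levels "a","b","c": 1 2 3
```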

Work with vectors, lists, or several of them at once.

purrr::map_dbl(.x = 1:5, .f = function(i) i^2)
## [1]  1  4  9 16 25
purrr::map_dbl(.x = 1:5, .f = ~ .x^2)
## [1]  1  4  9 16 25
purrr::map_chr(.x = c("a", "b", "c"), .f = function(i) toupper(i))
## [1] "A" "B" "C"
purrr::map(.x = rnorm(5), function(i) i > 0)
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] FALSE
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] FALSE

Two inputs (map2).

purrr::map2_chr(.x = c("a", "b", "c"), .y = c("A", "B", "C"), .f = paste, sep = "-")
## [1] "a-A" "b-B" "c-C"
purrr::map2_dbl(c(1:3), c(-1:-3), .f = `+`)
## [1] 0 0 0

More than two inputs (pmap).

x <- list(1, 1, 1)
y <- list(10, 20, 30)
z <- list(100, 200, 300)
purrr::pmap(list(x, y, z), sum)
## [[1]]
## [1] 111
## 
## [[2]]
## [1] 221
## 
## [[3]]
## [1] 331
# Matching arguments by position
purrr::pmap(list(x, y, z), function(first, second, third) (first + third) * second)
## [[1]]
## [1] 1010
## 
## [[2]]
## [1] 4020
## 
## [[3]]
## [1] 9030
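
To tie this back to the base R functions from earlier in the lecture, the same positional calculation can be sketched with Map, which is mapply with simplification turned off and therefore always returns a list. The variable res is a name chosen for illustration.

```r
x <- list(1, 1, 1)
y <- list(10, 20, 30)
z <- list(100, 200, 300)

# Map passes the k-th element of x, y, and z to the function, in that order.
res <- Map(function(first, second, third) (first + third) * second, x, y, z)

unlist(res)
## [1] 1010 4020 9030
```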