Control flow

Sometimes we want to execute statements only if certain conditions are met; at other times we want to execute a statement repeatedly, once for each element of a dataset or until a certain condition or value is reached. This is where control-flow constructs come in. This week’s lectures largely focus on this topic.

For the syntax examples throughout this lecture, keep the following in mind:

Conditional execution

In conditional execution, a statement or statements are only executed if a specified condition is met. These constructs include if-else, ifelse, and switch.

Before we talk about these functions, it is necessary to talk about R’s logical operators.

Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!           not (negation)
!=          not equal to
|           or (element-wise)
&           and (element-wise)
%in%        is an element of
isTRUE()    is it a single TRUE?
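
As a quick illustration of the operators listed above, here is a minimal sketch applied to a small vector. The vector x and its values are made up for this example; the comments describe what each line tests.

```r
x <- c(1, 5, 10)

gt3     <- x > 3               # element-wise comparison
between <- x > 3 & x < 8       # element-wise "and"
outside <- x < 3 | x > 8       # element-wise "or"
not_gt3 <- !(x > 3)            # negation of each element
member  <- x %in% c(5, 10, 15) # membership test for each element of x

between
outside
member

# isTRUE() returns TRUE only for a single, non-missing TRUE value
isTRUE(x[2] > 3)       # TRUE
isTRUE(NA)             # FALSE
isTRUE(c(TRUE, TRUE))  # FALSE, because the input has length 2
```

Note that isTRUE() is a function rather than an operator; it is handy inside if() because it always returns exactly one TRUE or FALSE.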

Surprise surprise!

i = 0.1
i = i + 0.05
i == 0.15 # ? what do you say?
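
The comparison above returns FALSE because 0.1, 0.05, and 0.15 cannot be stored exactly as binary floating-point numbers, so the sum differs from 0.15 in its last bits. A minimal sketch of two common ways to compare with a tolerance follows; the threshold 1e-8 is an arbitrary choice for this example.

```r
i <- 0.1
i <- i + 0.05

exact  <- i == 0.15                   # FALSE: the two doubles are not identical
approx <- isTRUE(all.equal(i, 0.15))  # TRUE: equal within a small numerical tolerance
manual <- abs(i - 0.15) < 1e-8        # TRUE: explicit tolerance

c(exact = exact, approx = approx, manual = manual)
```

all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() before being used as a condition.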

if ... else ...

How does an if else statement work in R?

# General form
if(cond){
  statement for TRUE
}

# else is optional
if(cond){
  statement for TRUE
} else {
  statement for FALSE
} 
# the braces can be omitted for a single statement,
# but I find the code easier to read with them

if(cond){
  statement for TRUE
} else {
  another if ... else ...
} 
a <- 4
if (a %% 2 == 0) {
  b1 <- a * -1
}
a
## [1] 4
b1
## [1] -4
a <- 5
if (a %% 2 == 0) {
  b <- a * -1
} else {
  b <- a ^ 2
}
b
## [1] 25
# nested if else
a <- 0.33
if (a %% 2 == 0) {
  b <- a * -1
} else {
  if (a %% 2 == 1) {
    b <- a ^ 2
  } else {
    b <- a * 10
  }
}
b
## [1] 3.3

Why use an if else statement?

# Define your own categories
small <- 1:5
medium <- 6:10
large <- 11:15

a <- 2
if (a %in% small) {
  b <- 'Small number'
} else if (a %in% medium) {
  b <- 'Medium number'
} else if (a %in% large) {
  b <- 'Large number'
} else {
  b <- 'Number out of range'
}
b
## [1] "Small number"
a <- 12
if (a %in% small) {
  b <- 'Small number'
} else {
  if (a %in% medium) {
    b <- 'Medium number'
  } else {
    if (a %in% large) {
      b <- 'Large number'
    } else {
      b <- 'Number out of range'
    }
  }
} 
b
## [1] "Large number"

Defensive coding

When writing your own functions or doing an analysis, be careful about potential errors and failures. It is recommended to warn, message, or fail early!

if(TRUE) warning("This is a warning")
## Warning: This is a warning
if(TRUE) message("This is a message")
## This is a message
# if(TRUE) stop("Stopped because of ...")
stopifnot()
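
To make the "fail early" idea concrete, here is a minimal sketch of a function that checks its input before doing any work. The function safe_sqrt() is a hypothetical helper written for this example; it is not part of base R.

```r
# Hypothetical helper, for illustration only
safe_sqrt <- function(x) {
  # stopifnot() raises an error as soon as one condition is not TRUE
  stopifnot(is.numeric(x), length(x) == 1)
  if (x < 0) {
    warning("x is negative; returning NA")
    return(NA_real_)
  }
  sqrt(x)
}

safe_sqrt(9)
# safe_sqrt("9")  # would stop with an error because is.numeric("9") is FALSE
```

Checking the input at the top keeps the error message close to the real cause, instead of letting a confusing failure surface several steps later.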

So if else is useful for working with one value, condition, or evaluation at a time. What if we have multiple values or cases to evaluate?

  • ifelse()
  • combine with loops such as for() (next lecture)

ifelse()

ifelse() is a compact and vectorized version of the if-else construct. The syntax is:

ifelse(cond, statement for TRUE, statement for FALSE)

Problem to solve

Suppose that we have a vector vec_a with 5 elements:

vec_a <- c(3, 2, -5, 7, 0)

Now we want to take the square root of the non-negative values while leaving the negative values alone. How can we do this?

We may do the following things step by step for each element:

  1. Is it non-negative?
  2. If yes, take the square root (recall the functions from last class).
  3. If no, return the original value.

Solution 1: combine if else with for()

We will talk more about loops in the next lecture.

vec_a2 <- vector(mode = "numeric", length = length(vec_a)) # to hold results
for (i in seq_along(vec_a)){ # seq_along() is also safe for empty vectors
  if (vec_a[i] >= 0){
    vec_a2[i] <- sqrt(vec_a[i])
  } else {
    vec_a2[i] <- vec_a[i]
  }
}
vec_a2
## [1]  1.732051  1.414214 -5.000000  2.645751  0.000000
Pros              Cons
Intuitive         Multiple steps of coding: create a vector to hold the results, then apply the 3-step procedure to each element
Easy to follow    Relatively slow (imagine millions of elements)

Solution 2: use ifelse()

ifelse(vec_a >= 0, sqrt(vec_a), vec_a)
## Warning in sqrt(vec_a): NaNs produced
## [1]  1.732051  1.414214 -5.000000  2.645751  0.000000
vec_a2
## [1]  1.732051  1.414214 -5.000000  2.645751  0.000000

Vectorization

“Whole object” thinking instead of “element-wise” thinking.

  • In R, this means a function takes the whole vector as input (instead of one element at a time, as if() does inside the for() loop above)
  • In R, vectorized functions are usually much faster than the equivalent explicit loop
  • But for loops are still very important (see next class)
  • More information about vectorization

Why the warning message?

Warning message:
In sqrt(vec_a) : NaNs produced
(vec_test <- vec_a >= 0)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE
(vec_yes <- sqrt(vec_a)) # warning here
## Warning in sqrt(vec_a): NaNs produced
## [1] 1.732051 1.414214      NaN 2.645751 0.000000
(vec_no <- vec_a)
## [1]  3  2 -5  7  0
ifelse(vec_test, vec_yes, vec_no) # no warning
## [1]  1.732051  1.414214 -5.000000  2.645751  0.000000
  • if the ith element of vec_test is TRUE
  • then take the ith element of vec_yes
  • otherwise, take the ith element of vec_no
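
Because ifelse() evaluates sqrt() on the whole vector before selecting elements, the warning appears even though the NaN is never used. One possible alternative, sketched here with logical indexing, applies sqrt() only to the elements that qualify. The names vec_b and is_nonneg are introduced for this example.

```r
vec_a <- c(3, 2, -5, 7, 0)

vec_b <- vec_a                              # start from a copy of the input
is_nonneg <- vec_a >= 0                     # logical index of the elements to transform
vec_b[is_nonneg] <- sqrt(vec_a[is_nonneg])  # sqrt() never sees the negative value
vec_b
```

This gives the same values as the ifelse() call above, without the warning, and it is still vectorized.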

Now suppose that, instead of returning the original value when it is negative, we want to return NA.

ifelse(vec_a >= 0, sqrt(vec_a), NA)
## Warning in sqrt(vec_a): NaNs produced
## [1] 1.732051 1.414214       NA 2.645751 0.000000
sqrt(ifelse(vec_a >= 0, vec_a, NA)) # No warning any more
## [1] 1.732051 1.414214       NA 2.645751 0.000000

ifelse() works on a whole matrix too

mat_a <- matrix(data = c(1:6, -2, -5, 0), nrow = 3, ncol = 3)
mat_a
##      [,1] [,2] [,3]
## [1,]    1    4   -2
## [2,]    2    5   -5
## [3,]    3    6    0
ifelse(mat_a > 0, sqrt(mat_a), mat_a)
## Warning in sqrt(mat_a): NaNs produced
##          [,1]     [,2] [,3]
## [1,] 1.000000 2.000000   -2
## [2,] 1.414214 2.236068   -5
## [3,] 1.732051 2.449490    0

More examples

Suppose we have exam scores for students, and we want to assign pass (>= 60) or fail (<60) to each of them.

set.seed(123)
scores <- data.frame(names = letters,
                     score = runif(n = 26, min = 30, max = 100))
head(scores, n = 3)
##   names    score
## 1     a 50.13043
## 2     b 85.18136
## 3     c 58.62838
hist(scores$score)

How do we add another column that will indicate pass and fail?

str(scores)
## 'data.frame':    26 obs. of  2 variables:
##  $ names: chr  "a" "b" "c" "d" ...
##  $ score: num  50.1 85.2 58.6 91.8 95.8 ...
scores$score
##  [1] 50.13043 85.18136 58.62838 91.81122 95.83271 33.18895 66.96738 92.46933
##  [9] 68.60045 61.96303 96.97833 61.73339 77.42994 70.08434 37.20473 92.98775
## [17] 47.22614 32.94417 52.95445 96.81526 92.26775 78.49624 74.83548 99.59888
## [25] 75.89941 79.59713
(pass_fail <- ifelse(scores$score >= 60, "pass", "fail")) # "pass" and "fail" are recycled to the length of the test
##  [1] "fail" "pass" "fail" "pass" "pass" "fail" "pass" "pass" "pass" "pass"
## [11] "pass" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "fail" "pass"
## [21] "pass" "pass" "pass" "pass" "pass" "pass"
scores$status <- pass_fail
head(scores, n = 6)
##   names    score status
## 1     a 50.13043   fail
## 2     b 85.18136   pass
## 3     c 58.62838   fail
## 4     d 91.81122   pass
## 5     e 95.83271   pass
## 6     f 33.18895   fail
table(scores$status)
## 
## fail pass 
##    7   19
table(scores$score >= 60)
## 
## FALSE  TRUE 
##     7    19

What if we also want to assign “A” (>= 85), “B” (>= 75), “C” (>= 60) to those who passed the exam?

# nested ifelse
vec_levels <- 
  ifelse(scores$score < 60, "fail", 
       ifelse(scores$score < 75, "C",
              ifelse(scores$score < 85, "B", "A")))
vec_score <- scores$score # to make it easier to check
names(vec_score) <- vec_levels # in practice, you should add it as a new column
vec_score
##     fail        A     fail        A        A     fail        C        A 
## 50.13043 85.18136 58.62838 91.81122 95.83271 33.18895 66.96738 92.46933 
##        C        C        A        C        B        C     fail        A 
## 68.60045 61.96303 96.97833 61.73339 77.42994 70.08434 37.20473 92.98775 
##     fail     fail     fail        A        A        B        C        A 
## 47.22614 32.94417 52.95445 96.81526 92.26775 78.49624 74.83548 99.59888 
##        B        B 
## 75.89941 79.59713
# the same logic with case_when() from the dplyr package
dplyr::case_when(
  scores$score < 60 ~ "fail",
  scores$score < 75 ~ "C",
  scores$score < 85 ~ "B",
  TRUE ~ "A"
)
##  [1] "fail" "A"    "fail" "A"    "A"    "fail" "C"    "A"    "C"    "C"   
## [11] "A"    "C"    "B"    "C"    "fail" "A"    "fail" "fail" "fail" "A"   
## [21] "A"    "B"    "C"    "A"    "B"    "B"
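
For binning a numeric variable into ordered ranges, base R's cut() is another option besides nested ifelse() and case_when(). The sketch below is an alternative, not the approach used in this lecture; it regenerates the same simulated scores so that it runs on its own, and the names score and grade are introduced for this example.

```r
set.seed(123)
score <- runif(n = 26, min = 30, max = 100)  # same simulated scores as above

# right = FALSE makes each interval closed on the left: [60, 75), [75, 85), ...
grade <- cut(score,
             breaks = c(-Inf, 60, 75, 85, Inf),
             labels = c("fail", "C", "B", "A"),
             right = FALSE)
table(grade)
```

Unlike the ifelse() version, cut() returns a factor whose levels follow the order of the breaks, which is convenient for tables and plots.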

If needed, you can also assign “D” (e.g. >= 50, < 60), “E” (< 50) to those who failed by replacing "fail" with another ifelse().

# "A" (>= 85), "B" (>= 75), "C" (>= 60)
boxplot(vec_score ~ vec_levels)
abline(h = c(60, 75, 85), lty = 2)

Be careful when using ifelse() with Dates and factors

(vec_x <- factor(c("b", "a", "d", "e")))
## [1] b a d e
## Levels: a b d e
(vec_y <- ifelse(vec_x == "a", vec_x, NA))
## [1] NA  1 NA NA
as.numeric(vec_x)
## [1] 2 1 3 4
vec_x2 <- as.character(vec_x) # convert first
ifelse(vec_x2 == "a", vec_x2, NA)
## [1] NA  "a" NA  NA

See ?ifelse for an example with Dates.

  • The result of ifelse() has the same length as the input test, but attributes may not be preserved
  • Values are selected from vec_yes and vec_no
  • If vec_yes and vec_no are too short, they are recycled
  • NA values in the input test give NA in the output
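
The loss of attributes noted above affects Dates in the same way as factors. Here is a minimal sketch with two made-up dates; the names dates, is_late, res, and res2 are introduced for this example, and the workaround shown is only one of several possibilities.

```r
dates <- as.Date(c("2023-09-18", "2023-09-19"))
is_late <- dates > as.Date("2023-09-18")

res <- ifelse(is_late, dates, NA)
class(res)     # "numeric": the Date class was dropped

# One possible workaround: assign into a copy, which keeps the class
res2 <- dates
res2[!is_late] <- NA
class(res2)    # "Date"
```

Another option, shown in ?ifelse, is to restore the class afterwards with class(res) <- class(dates).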

if else vs ifelse()

if(test) yes else no is much more efficient and often much preferable to ifelse(test, yes, no) whenever test is a simple true/false result, i.e., when length(test) == 1.
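
The sketch below contrasts the two cases. It assumes a made-up vector x; tryCatch() is used only so that the example runs on any R version, since a condition of length greater than one is an error in R 4.2.0 and later and a warning in earlier versions.

```r
x <- c(3, -5)

# A length-one condition: plain if ... else is the natural choice
first <- if (x[1] >= 0) sqrt(x[1]) else x[1]
first

# A longer condition: if() is not meant for this
outcome <- tryCatch(
  { if (x >= 0) "non-negative" else "negative" },
  error   = function(e) "error",
  warning = function(w) "warning"
)
outcome

# any() / all() collapse the condition to a single TRUE/FALSE first
has_negative <- any(x < 0)
if (has_negative) message("x contains negative values")
```

When every element needs its own answer, use ifelse(); when one decision is needed for the whole vector, reduce the condition with any() or all() and use if.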

switch()

switch chooses statements based on the value of an expression. The syntax is switch(expr, ...) where ... represents statements tied to the possible outcome values of expr. It’s easiest to understand how switch works by looking at the example in the following listing.

feelings <- c("sad", "afraid")
for (i in feelings){
  print(
    switch(i,
           happy  = "I am glad you are happy",
           afraid = "There is nothing to fear",
           sad    = "Cheer up",
           angry  = "Calm down now"
    ) 
  )
}
## [1] "Cheer up"
## [1] "There is nothing to fear"
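
Two details of switch() with a character expression are worth knowing: an empty alternative falls through to the next non-empty one, and an unnamed last argument acts as the default. A minimal sketch follows; animal_sound() is a hypothetical helper written for this example.

```r
# Hypothetical helper, for illustration only
animal_sound <- function(x) {
  switch(x,
         cat    = ,                # empty: falls through to the next alternative
         kitten = "meow",
         dog    = "woof",
         "unknown animal")         # unnamed last argument is the default
}
animal_sound("cat")      # "meow"
animal_sound("cow")      # "unknown animal"

# Without a default, an unmatched character value returns NULL invisibly
res <- switch("cow", cat = "meow", dog = "woof")
is.null(res)             # TRUE
```

In the feelings example above, a value such as "bored" would therefore print NULL, because none of the alternatives matches and no default is given.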

switch can be very useful when we write our own functions to deal with different scenarios.

mydate <- function(type = "long") {
  switch(type,
         long =  format(Sys.time(), "%A %B %d %Y"),
         short = format(Sys.time(), "%m-%d-%y"),
         cat(type, "is not a recognized type\n")
  ) 
}
mydate("long")
## [1] "Monday September 18 2023"
mydate("short")
## [1] "09-18-23"
mydate("medium")
## medium is not a recognized type
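
In mydate(), an unrecognized type only prints a note and the function returns NULL. In the spirit of failing early, one possible variant validates the argument with match.arg() so that a bad type stops with an error. This is a sketch, and mydate2() is a hypothetical name introduced here.

```r
# Hypothetical variant of mydate(), for illustration only
mydate2 <- function(type = c("long", "short")) {
  type <- match.arg(type)   # errors if type is not one of the listed choices
  switch(type,
         long  = format(Sys.time(), "%A %B %d %Y"),
         short = format(Sys.time(), "%m-%d-%y"))
}

mydate2()           # uses the first choice, "long"
mydate2("short")
# mydate2("medium") # would stop with an error listing the valid choices
```

match.arg() also accepts unambiguous partial matches, so mydate2("sh") behaves like mydate2("short").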