Sometimes we want to execute statements only if certain conditions are met; at other times we want to execute statements repeatedly, once for each element of a dataset or until a certain condition or value is reached. This is where control-flow constructs come in. This week’s lectures largely focus on this topic.
For the syntax examples throughout this lecture, keep the following in mind:
- statement is a single R statement or a compound statement (a group of R statements enclosed in curly braces { } and separated by semicolons).
- cond is an expression that resolves to TRUE or FALSE.
- expr is a statement that evaluates to a number or character string.
- seq is a sequence of numbers or character strings.

In conditional execution, a statement or statements are only executed if a specified condition is met. These constructs include if-else, ifelse, and switch.
Before we talk about these functions, it is necessary to talk about R’s logical operators.
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | larger than |
>= | larger than or equal to |
== | exactly equal to |
! | opposite of |
!= | not equal to |
\| | or |
& | and |
%in% | in |
isTRUE() | is it true? |
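A few of the operators in the table can be tried on a small vector. This is a minimal sketch; the vector x and the result names are chosen here for illustration only.

```r
x <- c(3, 8, 12)  # illustrative values

gt5      <- x > 5              # FALSE  TRUE  TRUE
between  <- x >= 3 & x < 10    #  TRUE  TRUE FALSE
extremes <- x < 5 | x > 10     #  TRUE FALSE  TRUE
not8     <- x != 8             #  TRUE FALSE  TRUE (same as !(x == 8))
member   <- x %in% c(8, 12)    # FALSE  TRUE  TRUE

# isTRUE() is TRUE only for a single, non-missing TRUE value
one_true <- isTRUE(x[1] == 3)  # TRUE
vec_true <- isTRUE(x > 5)      # FALSE, because x > 5 has length 3
```

Note that most of these operators work element by element and return a logical vector, while isTRUE() always returns a single TRUE or FALSE.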
if ... else ...
How does an if else statement work in R?
# General form
if (cond) {
  statement for TRUE
}
# else is optional
if (cond) {
  statement for TRUE
} else {
  statement for FALSE
}
# braces can be omitted too,
# but the code is easier to read with them
if (cond) {
  statement for TRUE
} else {
  another if ... else ...
}
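A minimal sketch of the general form, assuming the same even/odd rule used in the nested example of this section (negate even numbers, square odd ones). The helper name flip_or_square is hypothetical and not part of the lecture.

```r
# Hypothetical helper: negate even numbers, square everything else
flip_or_square <- function(a) {
  if (a %% 2 == 0) {
    b <- a * -1
  } else {
    b <- a ^ 2
  }
  b
}

flip_or_square(4)  # -4
flip_or_square(5)  # 25
```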
## [1] 4
## [1] -4
## [1] 25
# nested if else
a <- 0.33
if (a %% 2 == 0) {
  b <- a * -1
} else {
  if (a %% 2 == 1) {
    b <- a ^ 2
  } else {
    b <- a * 10
  }
}
b
## [1] 3.3
Why use an if else statement?
# Define your own categories
small <- 1:5
medium <- 6:10
large <- 11:15
a <- 2
if (a %in% small) {
  b <- 'Small number'
} else if (a %in% medium) {
  b <- 'Medium Number'
} else if (a %in% large) {
  b <- 'Large number'
} else {
  b <- 'Number out of range'
}
b
## [1] "Small number"
a <- 12
if (a %in% small) {
  b <- 'Small number'
} else {
  if (a %in% medium) {
    b <- 'Medium Number'
  } else {
    if (a %in% large) {
      b <- 'Large number'
    } else {
      b <- 'Number out of range'
    }
  }
}
b
## [1] "Large number"
When writing your own functions or doing an analysis, be careful about potential errors and failures. It is recommended to warn, message, or fail early!
## Warning: This is a warning
## This is a message
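One way to combine if with warning(), message(), and stop() is sketched here. The function safe_sqrt and its messages are hypothetical, written only to illustrate the "fail early" idea; they are not from the lecture.

```r
# Hypothetical helper: validate the input before doing any work
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")                    # fail early: nothing sensible to compute
  }
  if (length(x) == 0) {
    message("x is empty; returning numeric(0)")  # informational note only
  }
  if (any(x < 0, na.rm = TRUE)) {
    warning("negative values will give NaN")     # continue, but flag the problem
  }
  suppressWarnings(sqrt(x))                      # our own warning already covers this
}

safe_sqrt(c(4, 9))  # 2 3
```

stop() aborts the call, warning() lets it finish but reports a problem, and message() only informs the user.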
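So if else is useful for working with one value or condition at a time. What if we have multiple values or cases that need to be evaluated?
- ifelse()
- for() (next lecture)
ifelse()
ifelse() is a compact and vectorized version of the if-else construct. The syntax is ifelse(test, yes, no), where test is a logical vector and yes and no supply the values returned where test is TRUE and FALSE, respectively.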
Suppose that we have a vector vec_a with 5 elements: 3, 2, -5, 7, and 0.
Now we want to take the square root of the positive and zero values while leaving the negative values alone. How can we do this?
We may do the following step by step for each element: check whether it is non-negative; if so, take its square root; otherwise, keep the original value.
We will talk more about loops in the next lecture.
vec_a <- c(3, 2, -5, 7, 0) # values as printed later in this lecture
vec_a2 <- vector(mode = "numeric", length = length(vec_a)) # to hold results
for (i in seq_along(vec_a)) {
  if (vec_a[i] >= 0) {
    vec_a2[i] <- sqrt(vec_a[i])
  } else {
    vec_a2[i] <- vec_a[i]
  }
}
vec_a2
## [1] 1.732051 1.414214 -5.000000 2.645751 0.000000
Pros | Cons |
---|---|
Intuitive | Multiple steps of coding: create a vector to hold results, then apply the 3-step procedure to each element |
Easy to follow | Relatively slow (imagine millions of elements) |
ifelse()
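The vectorized counterpart of the loop above can be sketched in a single call. This assumes vec_a holds the five values 3, 2, -5, 7, 0 implied by the printed results in this lecture; vec_a3 is a name chosen here for illustration.

```r
vec_a <- c(3, 2, -5, 7, 0)  # assumed values

# test: vec_a >= 0; yes: sqrt(vec_a); no: vec_a
vec_a3 <- ifelse(vec_a >= 0, sqrt(vec_a), vec_a)
vec_a3
```

The call emits a "NaNs produced" warning because sqrt(vec_a) is evaluated on the whole vector, including -5, before ifelse() picks which elements to keep.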
## Warning in sqrt(vec_a): NaNs produced
## [1] 1.732051 1.414214 -5.000000 2.645751 0.000000
## [1] 1.732051 1.414214 -5.000000 2.645751 0.000000
A “whole project” thinking instead of “element-wise”.
- In R, this means a function takes the whole vector as input (instead of one element at a time, as the if() function does within the for() loop above)
- In R, vectorized functions are much faster
- for loops are still very important (see next class)
Why the warning message?
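To see where the warning comes from, the three arguments of ifelse() can be computed separately. This is a sketch assuming vec_a holds 3, 2, -5, 7, 0; the names vec_test, vec_yes, and vec_no follow the ones used in this section.

```r
vec_a <- c(3, 2, -5, 7, 0)  # assumed values

vec_test <- vec_a >= 0      # TRUE TRUE FALSE TRUE TRUE
vec_yes  <- sqrt(vec_a)     # computed for ALL elements, so -5 gives NaN and a warning
vec_no   <- vec_a           # the original values

# ifelse() only selects between the two precomputed vectors
vec_res <- ifelse(vec_test, vec_yes, vec_no)
vec_res
```

The NaN never reaches the result because vec_test is FALSE at that position, but the warning has already been raised when sqrt() ran.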
## [1] TRUE TRUE FALSE TRUE TRUE
## Warning in sqrt(vec_a): NaNs produced
## [1] 1.732051 1.414214 NaN 2.645751 0.000000
## [1] 3 2 -5 7 0
## [1] 1.732051 1.414214 -5.000000 2.645751 0.000000
Where vec_test is TRUE, the result takes the element from vec_yes; otherwise, it takes the element from vec_no.
Now suppose that, instead of returning the original value when it is negative, we want to return NA.
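A sketch of the NA variant, again assuming vec_a holds 3, 2, -5, 7, 0. The second half shows an alternative that avoids the warning by subsetting first; that alternative is an addition for illustration, not the approach shown in the lecture output.

```r
vec_a <- c(3, 2, -5, 7, 0)  # assumed values

# Same test as before, but NA replaces the negative values
res_ifelse <- ifelse(vec_a >= 0, sqrt(vec_a), NA)

# Alternative without a warning: only call sqrt() on the valid elements
ok <- vec_a >= 0
res_subset <- rep(NA_real_, length(vec_a))
res_subset[ok] <- sqrt(vec_a[ok])

res_ifelse
res_subset
```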
## Warning in sqrt(vec_a): NaNs produced
## [1] 1.732051 1.414214 NA 2.645751 0.000000
## [1] 1.732051 1.414214 NA 2.645751 0.000000
ifelse() works for whole matrices too
##      [,1] [,2] [,3]
## [1,]    1    4   -2
## [2,]    2    5   -5
## [3,]    3    6    0
## Warning in sqrt(mat_a): NaNs produced
##          [,1]     [,2] [,3]
## [1,] 1.000000 2.000000   -2
## [2,] 1.414214 2.236068   -5
## [3,] 1.732051 2.449490    0
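Code along the following lines would produce matrix output like that shown above. It is a sketch: the construction of mat_a is inferred from the printed values, and mat_b is a name chosen here.

```r
# 3 x 3 matrix filled column by column (values inferred from the printed output)
mat_a <- matrix(c(1:6, -2, -5, 0), nrow = 3)
mat_a

# The test mat_a >= 0 is itself a matrix, so the result keeps the 3 x 3 shape
mat_b <- ifelse(mat_a >= 0, sqrt(mat_a), mat_a)
mat_b
dim(mat_b)
```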
Suppose we have exam scores for students, and we want to assign pass (>= 60) or fail (<60) to each of them.
set.seed(123)
scores <- data.frame(names = letters,
                     score = runif(n = 26, min = 30, max = 100))
head(scores, n = 3)
##   names    score
## 1     a 50.13043
## 2     b 85.18136
## 3     c 58.62838
How do we add another column that will indicate pass and fail?
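One possible way, sketched with ifelse(): build the pass/fail vector from the score column and assign it as a new column. The column name status matches the printed data frame in this section; the exact sequence of calls is an assumption.

```r
set.seed(123)
scores <- data.frame(names = letters,
                     score = runif(n = 26, min = 30, max = 100))

str(scores)  # inspect the structure first

# Vectorized test on the whole column; the result has one label per row
scores$status <- ifelse(scores$score >= 60, "pass", "fail")

head(scores)
table(scores$status)       # counts per label
table(scores$score >= 60)  # the same counts from the logical test
```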
## 'data.frame': 26 obs. of 2 variables:
## $ names: chr "a" "b" "c" "d" ...
## $ score: num 50.1 85.2 58.6 91.8 95.8 ...
## [1] 50.13043 85.18136 58.62838 91.81122 95.83271 33.18895 66.96738 92.46933
## [9] 68.60045 61.96303 96.97833 61.73339 77.42994 70.08434 37.20473 92.98775
## [17] 47.22614 32.94417 52.95445 96.81526 92.26775 78.49624 74.83548 99.59888
## [25] 75.89941 79.59713
## [1] "fail" "pass" "fail" "pass" "pass" "fail" "pass" "pass" "pass" "pass"
## [11] "pass" "pass" "pass" "pass" "fail" "pass" "fail" "fail" "fail" "pass"
## [21] "pass" "pass" "pass" "pass" "pass" "pass"
##   names    score status
## 1     a 50.13043   fail
## 2     b 85.18136   pass
## 3     c 58.62838   fail
## 4     d 91.81122   pass
## 5     e 95.83271   pass
## 6     f 33.18895   fail
##
## fail pass
##    7   19
##
## FALSE  TRUE
##     7    19
What if we also want to assign “A” (>= 85), “B” (>= 75), or “C” (>= 60) to those who passed the exam?
# nested ifelse
vec_levels <-
  ifelse(scores$score < 60, "fail",
         ifelse(scores$score < 75, "C",
                ifelse(scores$score < 85, "B", "A")))
vec_score <- scores$score # to make it easier to check
names(vec_score) <- vec_levels # in practice, you should make it a new column
vec_score
##     fail        A     fail        A        A     fail        C        A
## 50.13043 85.18136 58.62838 91.81122 95.83271 33.18895 66.96738 92.46933
##        C        C        A        C        B        C     fail        A
## 68.60045 61.96303 96.97833 61.73339 77.42994 70.08434 37.20473 92.98775
##     fail     fail     fail        A        A        B        C        A
## 47.22614 32.94417 52.95445 96.81526 92.26775 78.49624 74.83548 99.59888
##        B        B
## 75.89941 79.59713
# the same grading with dplyr::case_when(); conditions are checked in order
dplyr::case_when(
  scores$score < 60 ~ "fail",
  scores$score < 75 ~ "C",
  scores$score < 85 ~ "B",
  TRUE ~ "A"
)
## [1] "fail" "A" "fail" "A" "A" "fail" "C" "A" "C" "C"
## [11] "A" "C" "B" "C" "fail" "A" "fail" "fail" "fail" "A"
## [21] "A" "B" "C" "A" "B" "B"
If needed, you can also assign “D” (e.g. >= 50, < 60) and “E” (< 50) to those who failed by replacing "fail" with another ifelse().
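Replacing "fail" with another ifelse() could look like the sketch below. The helper name grade_scores and the sample scores are hypothetical; the cut-offs are the ones given in the text.

```r
# Hypothetical helper: the "fail" branch is itself an ifelse() that splits D and E
grade_scores <- function(score) {
  ifelse(score < 60,
         ifelse(score < 50, "E", "D"),
         ifelse(score < 75, "C",
                ifelse(score < 85, "B", "A")))
}

grade_scores(c(45, 55, 65, 80, 90))  # "E" "D" "C" "B" "A"
```

Deeply nested ifelse() calls become hard to read quickly, which is one reason to consider a flatter form once there are more than a few levels.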
# "A" (>= 85), "B" (>= 75), "C" (>= 60)
boxplot(vec_score ~ vec_levels)
abline(h = c(60, 75, 85), lty = 2)
Be careful when using ifelse() with Dates and factors
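The factor pitfall can be reproduced with a small example. This is a sketch: the factor x is constructed here to resemble the printed levels (a, b, d, e), and the exact calls are an assumption.

```r
x <- factor(c("b", "a", "d", "e"))
x

# The factor class is dropped: what comes back is the underlying integer code
codes_only <- ifelse(x == "a", x, NA)
codes_only

# The integer codes behind the factor (levels are sorted: a = 1, b = 2, d = 3, e = 4)
as.integer(x)

# Converting to character first keeps the labels
labels_kept <- ifelse(x == "a", as.character(x), NA)
labels_kept
```

Converting with as.character() (or, for Dates, restoring the class afterwards) is a common workaround when the labels matter.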
## [1] b a d e
## Levels: a b d e
## [1] NA 1 NA NA
## [1] 2 1 3 4
## [1] NA "a" NA NA
See ?ifelse for an example with Dates.
- The result of ifelse() has the same length as the input test, but attributes may not be preserved
- The values come from vec_yes and vec_no
- If vec_yes and vec_no are too short, they are recycled
- NA in the input gives NA in the output
if else vs ifelse()
if(test) yes else no is much more efficient and often much preferable to ifelse(test, yes, no) whenever test is a simple true/false result, i.e., when length(test) == 1.
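A short sketch of the difference for a length-one test; the variable names are chosen here for illustration. Besides efficiency, ifelse() returns a result the same length as test, so a longer yes value is silently truncated.

```r
x <- 5

# Scalar condition: plain if/else returns whichever value is selected, unchanged
label <- if (x > 0) "positive" else "not positive"
label                                            # "positive"

whole_vector <- if (x > 0) c(1, 2, 3) else 0     # 1 2 3
first_only   <- ifelse(x > 0, c(1, 2, 3), 0)     # 1, because length(test) == 1
```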
switch()
switch chooses statements based on the value of an expression. The syntax is switch(expr, ...) where ... represents statements tied to the possible outcome values of expr. It’s easiest to understand how switch works by looking at the example in the following listing.
feelings <- c("sad", "afraid")
for (i in feelings) {
  print(
    switch(i,
      happy = "I am glad you are happy",
      afraid = "There is nothing to fear",
      sad = "Cheer up",
      angry = "Calm down now"
    )
  )
}
## [1] "Cheer up"
## [1] "There is nothing to fear"
switch can be very useful when we write our own functions to deal with different scenarios.
mydate <- function(type = "long") {
  switch(type,
    long = format(Sys.time(), "%A %B %d %Y"),
    short = format(Sys.time(), "%m-%d-%y"),
    cat(type, "is not a recognized type\n") # unnamed last argument is the default
  )
}
mydate("long")
## [1] "Monday September 18 2023"
mydate("short")
## [1] "09-18-23"
mydate("medium")
## medium is not a recognized type
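The same pattern can be combined with failing early: the unnamed default branch can call stop() instead of printing a note. The function center below is a hypothetical example, not part of the lecture.

```r
# Hypothetical helper: pick a summary statistic by name, error on anything else
center <- function(x, type = "mean") {
  switch(type,
    mean = mean(x),
    median = median(x),
    stop("unknown type: ", type)  # default branch: fail early
  )
}

center(c(1, 2, 10))            # 4.333333
center(c(1, 2, 10), "median")  # 2
```

Raising an error in the default branch makes a mistyped type visible immediately instead of letting the function return NULL silently.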