One of the most powerful features of R is its large collection of packages (~20k). Each package provides functions that convert some input into an output. Today, we will learn how to write basic R functions.
The basic structure of a function is:
my_function_name <- function(data, arg1 = value1, arg2 = value2, arg3, arg4, ...){
<command to do things with the data, with args to control behaviors>
return(results)
}
fahrenheit_to_celsius <- function(temp_F) {
temp_C <- (temp_F - 32) * 5 / 9
return(temp_C) # return is optional
}
fahrenheit_to_celsius(68)
## [1] 20
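The skeleton above also allows arguments with default values (arg1 = value1). A minimal sketch of that idea, using a hypothetical variant named fahrenheit_to_celsius_rounded() with an assumed digits argument that is not part of the original function:

```r
# Hypothetical variant for illustration: `digits` has a default value,
# so callers only supply it when they want different rounding.
fahrenheit_to_celsius_rounded <- function(temp_F, digits = 1) {
  temp_C <- (temp_F - 32) * 5 / 9
  round(temp_C, digits)  # the last evaluated expression is returned
}

fahrenheit_to_celsius_rounded(100)              # uses the default, digits = 1
fahrenheit_to_celsius_rounded(100, digits = 0)  # overrides the default
```

Because the last expression is returned automatically, this sketch omits return(); writing it explicitly would behave the same way.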
Exercise: write a function to convert Celsius to Fahrenheit.
celsius_to_kelvin <- function(temp_C) {
temp_K <- temp_C + 273.15
return(temp_K)
}
# freezing point of water in Kelvin
celsius_to_kelvin(0)
## [1] 273.15
Exercise: write a function to convert Fahrenheit to Kelvin. Hint: use fahrenheit_to_celsius() and celsius_to_kelvin().
fahrenheit_to_kelvin <- function(temp_F) {
temp_C <- ____________
temp_K <- ____________
temp_K
}
Tips: Write modularized functions!
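To illustrate the tip without giving away the exercise, here is a small sketch from a different domain. The helper names center(), rescale(), and standardize() are made up for this example:

```r
# Hypothetical helpers: each function does one small job.
center <- function(x) {
  x - mean(x)
}

rescale <- function(x) {
  x / sd(x)
}

# The larger task is built by combining the small pieces.
standardize <- function(x) {
  rescale(center(x))
}

standardize(c(1, 2, 3))
```

Each small function can be tested and reused on its own, which is the main benefit of writing modular code.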
The environment of a function controls how R finds the value associated with a name.
f <- function(x) {
x + y
}
R uses rules called lexical scoping to find the value associated with a name. Since y is not defined inside the function, R will look in the environment where the function was defined:
y <- 100
f(10)
## [1] 110
y <- 1000
f(10)
## [1] 1010
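Two more cases may help make the lookup rule concrete. In this sketch, g(), make_adder(), and add_five() are hypothetical names used only for illustration:

```r
y <- 100

# A local `y` shadows the global one, but only inside the function.
g <- function(x) {
  y <- 1
  x + y
}
g(10)  # uses the local y
y      # the global y is unchanged

# A function created inside another function finds `n` in the
# environment where it was defined (the call to make_adder()).
make_adder <- function(n) {
  function(x) x + n
}
add_five <- make_adder(5)
add_five(10)
```

In both cases R searches the function's own environment first and only then moves outward to where the function was defined.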
Recall from homework 5 that we wrote some code to extract the binomial names from species names. The neonDivData package has multiple data frames for different taxonomic groups. How can we turn that code into a function so that we can apply it to all the other data frames?
Tips: Write working code first, then convert it into a function.
library(neonDivData)
x = sub(pattern = "^([^ ]+ [^ ]*) .*", replacement = "\\1", x = data_plant$taxon_name)
sample(unique(x), 100)
## [1] "Erigeron vernus" "Viola spp."
## [3] "Eragrostis scaligera" "Eriogonum abertianum"
## [5] "Mirabilis nyctaginea" "Symphyotrichum novae-angliae"
## [7] "Antennaria rosea" "Opuntia dillenii"
## [9] "Quercus lyrata" "Diodia virginiana"
## [11] "Quercus pungens" "Anthemis cotula"
## [13] "Ribes spp." "Iris verna"
## [15] "Carex laxiflora" "Cosmos caudatus"
## [17] "Cirsium canescens" "Saxifraga sp."
## [19] "Prosopis pubescens" "Emilia fosbergii"
## [21] "Fuirena breviseta" "Desmodium viridiflorum"
## [23] "Potentilla rupincola" "Forestiera acuminata"
## [25] "Clarkia sp." "Viola sororia"
## [27] "Phlox stolonifera" "Persea americana"
## [29] "Coreopsis sp." "Setaria arizonica"
## [31] "Bidens frondosa" "Senecio spartioides"
## [33] "Paspalum notatum" "Leersia sp."
## [35] "Sphaeralcea hastulata" "Physaria spp."
## [37] "Cyperus esculentus" "Kalmia sp."
## [39] "Cathestecum erectum" "Quercus minima"
## [41] "Asclepias verticillata" "Andromeda polifolia"
## [43] "Cissus verticillata" "Gilia inconspicua"
## [45] "Dryas octopetala" "Albizia procera"
## [47] "Lupinus benthamii" "Luzula echinata"
## [49] "Rhexia lutea" "Verbena simplex"
## [51] "Nandina domestica" "Chloris verticillata"
## [53] "Cyperus imbricatus" "Hydrangea quercifolia"
## [55] "Ribes montigenum" "Verbesina encelioides"
## [57] "Jatropha curcas" "Scutellaria saxatilis"
## [59] "Symphyotrichum falcatum" "Ipomoea cairica"
## [61] "Scirpus georgianus" "Uvularia puberula"
## [63] "Valeriana capitata" "Plantago rugelii"
## [65] "Oldenlandia corymbosa" "Erioneuron pilosum"
## [67] "Monarda fistulosa" "Ipomoea sp."
## [69] "Madia sp." "Rhynchospora glomerata"
## [71] "Stenocereus spp." "Myrsine sp."
## [73] "Amelanchier arborea" "Rudbeckia mollis"
## [75] "Asclepias nyctaginifolia" "Mitella sp."
## [77] "Parkinsonia microphylla" "Mimulus bicolor"
## [79] "Salix pseudomyrsinites" "Pinus clausa"
## [81] "Carex microrhyncha" "Herissantia crispa"
## [83] "Lechea spp." "Rubus deliciosus"
## [85] "Perityle stansburyi" "Hypericum anagalloides"
## [87] "Portulaca halimoides" "Echinochloa muricata"
## [89] "Euphorbia dentata" "Asclepias viridiflora"
## [91] "Chamaesyce revoluta" "Arthraxon hispidus"
## [93] "Dyschoriste sp." "Bowlesia incana"
## [95] "Asparagus officinalis" "Glandularia pumila"
## [97] "Fimbristylis spathacea" "Galactia pinetorum"
## [99] "Mahonia repens" "Rhabdadenia biflora"
Naming is hard: have clear and short function names.
Now, the first step is to think about a name for the function! What should we name it? get_binomial_name? Other options?
Tip: Verbs for function names
Tip: Be consistent with style: snake_case, camelCase, etc.
get_binomial_name <- function(){
}
What about arguments? What kind of arguments should the function have?
Tip: Nouns for arguments.
get_binomial_name <- function(v){
}
Exercise: finish the above function.
get_binomial_name <- function(v){
v2 = sub(pattern = "^([^ ]+ [^ ]*) .*", replacement = "\\1", x = data_plant$taxon_name)
return(v2)
}
What’s wrong with the above function?
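One way to investigate is to call a function of the same shape with a new input and look at what comes back. The sketch below uses a hypothetical stand-in data frame, toy_plants, and a hypothetical function name so that it runs without neonDivData:

```r
# Hypothetical stand-in for a data frame such as data_plant.
toy_plants <- data.frame(
  taxon_name = c("Carex aquatilis Wahlenb.", "Acer rubrum L.")
)

# Same shape as the function above: the body reads toy_plants directly
# instead of using its argument `v`.
get_binomial_name_hardcoded <- function(v) {
  sub(pattern = "^([^ ]+ [^ ]*) .*", replacement = "\\1",
      x = toy_plants$taxon_name)
}

# The input is ignored; the result always comes from toy_plants.
get_binomial_name_hardcoded("Quercus alba L.")
```

The call returns names derived from toy_plants no matter what is passed in, so the function cannot be reused on other data frames.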
How can we improve the function?
get_binomial_name <- function(v, patterns = "^([^ ]+ [^ ]*) .*"){
v2 = sub(pattern = patterns, replacement = "\\1", x = v)
return(v2)
}
get_binomial_name <- function(v,
patterns = "^([^ ]+ [^ ]*) .*",
repl = "\\1"
){
v2 = sub(pattern = patterns, replacement = repl, x = v)
return(v2)
} # benefit?? be more flexible?
Tips: Data arguments first; detail arguments later with default values.
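As a rough illustration of what the extra arguments allow, the sketch below repeats the function so that it runs on its own and calls it on a small example vector. The alternative patterns are assumptions chosen for this example:

```r
get_binomial_name <- function(v,
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1") {
  sub(pattern = patterns, replacement = repl, x = v)
}

taxa <- c("Carex aquatilis Wahlenb.", "Acer rubrum L.")  # small example input

# Defaults: keep the first two words.
get_binomial_name(taxa)

# Change only the pattern: keep the genus.
get_binomial_name(taxa, patterns = "^([^ ]+) .*")

# Change pattern and replacement: keep the species epithet.
get_binomial_name(taxa,
                  patterns = "^([^ ]+) ([^ ]*) .*",
                  repl = "\\2")
```

Because the data argument comes first and the detail arguments have defaults, the most common call stays short while the less common ones remain possible.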
head(get_binomial_name(data_bird$value), 10)
## [1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1"
head(data_bird$value, 10)
## [1] 1 1 1 1 1 1 1 1 1 1
class(data_bird$value)
## [1] "numeric"
Tips: Defensive coding and stop early; or return early.
If a function is going to fail, fail early!
get_binomial_name <- function(v,
patterns = "^([^ ]+ [^ ]*) .*",
repl = "\\1"
){
if(!is.character(v)){
stop("The input vector is not character.")
}
v2 = sub(pattern = patterns, replacement = repl, x = v)
return(v2)
}
get_binomial_name(data_bird$value)
# Error in get_binomial_name(data_bird$value) :
# The input vector is not character.
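Failing early also makes the problem easy to handle for whoever calls the function. A minimal sketch, assuming the caller wants to capture the message with tryCatch() rather than stop the whole script:

```r
get_binomial_name <- function(v,
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1") {
  if (!is.character(v)) {
    stop("The input vector is not character.")
  }
  sub(pattern = patterns, replacement = repl, x = v)
}

# The error raised by stop() is caught and its message is returned.
msg <- tryCatch(
  get_binomial_name(c(1, 1, 1)),
  error = function(e) conditionMessage(e)
)
msg
```

A character input passes the check and is processed as before; only invalid input triggers the error.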
Sometimes, return early too!
f <- function(args){
if(x){
some complex calculation
out = results
} else {
some simple calculation
out = results
}
return(out)
}
# return early to save time
f <- function(args){
if(!x){
some simple calculation
out = results
return(out)
}
some complex calculation
results
}
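The pattern above can be made concrete with a small runnable sketch. The function count_unique_names() is hypothetical and only stands in for "some complex calculation":

```r
# Hypothetical example: the cheap case is handled first and returned,
# so the remaining code is skipped entirely.
count_unique_names <- function(v) {
  if (length(v) == 0) {
    return(0L)  # nothing to do, return early
  }
  # The "complex" part: normalize case and whitespace, then count.
  v_clean <- trimws(tolower(v))
  length(unique(v_clean))
}

count_unique_names(character(0))
count_unique_names(c("Acer rubrum", "acer rubrum ", "Quercus alba"))
```

Returning early also keeps the main logic at the top indentation level instead of nesting it inside an else branch.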
Tips: Use comments to explain why.
get_binomial_name <- function(v,
patterns = "^([^ ]+ [^ ]*) .*",
repl = "\\1"
){
if(!is.character(v)){ # sub() would silently coerce non-character input, so fail early
stop("The input vector is not character.")
}
v2 = sub(pattern = patterns, replacement = repl, x = v)
return(v2)
}
Tips: Use ... to pass additional arguments on to other functions (here, sub()).
get_binomial_name <- function(v,
patterns = "^([^ ]+ [^ ]*) .*",
repl = "\\1",
...
){
if(!is.character(v)){ # in case non-character input
stop("The input vector is not character.")
}
v2 = sub(pattern = patterns, replacement = repl, x = v, ...)
return(v2)
}
head(get_binomial_name(data_bird$taxon_name, perl = TRUE), 50)
## [1] "Poecile atricapillus" "Vireo olivaceus" "Mniotilta varia"
## [4] "Setophaga virens" "Setophaga virens" "Troglodytes hiemalis"
## [7] "Mniotilta varia" "Troglodytes hiemalis" "Setophaga virens"
## [10] "Vireo olivaceus" "Setophaga ruticilla" "Poecile atricapillus"
## [13] "Catharus ustulatus" "Troglodytes hiemalis" "Sitta carolinensis"
## [16] "Empidonax minimus" "Vireo olivaceus" "Setophaga caerulescens"
## [19] "Setophaga magnolia" "Mniotilta varia" "Mniotilta varia"
## [22] "Setophaga pinus" "Troglodytes hiemalis" "Setophaga magnolia"
## [25] "Setophaga coronata" "Hylocichla mustelina" "Setophaga virens"
## [28] "Catharus fuscescens" "Setophaga caerulescens" "Poecile atricapillus"
## [31] "Setophaga pinus" "Setophaga magnolia" "Setophaga virens"
## [34] "Sphyrapicus varius" "Setophaga coronata" "Setophaga virens"
## [37] "Vireo olivaceus" "Regulus satrapa" "Mniotilta varia"
## [40] "Mniotilta varia" "Catharus guttatus" "Seiurus aurocapilla"
## [43] "Vireo olivaceus" "Melanerpes carolinus" "Setophaga coronata"
## [46] "Setophaga fusca" "Mniotilta varia" "Vireo olivaceus"
## [49] "Certhia americana" "Troglodytes hiemalis"
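To see that arguments given through ... really reach sub(), here is a small self-contained sketch. It uses ignore.case, a documented argument of sub(); the upper-case input and the pattern are made up for illustration:

```r
get_binomial_name <- function(v,
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1",
                              ...) {
  if (!is.character(v)) {
    stop("The input vector is not character.")
  }
  sub(pattern = patterns, replacement = repl, x = v, ...)
}

taxa <- "CAREX aquatilis Wahlenb."  # made-up input with an upper-case genus

# Without ignore.case the lower-case pattern does not match,
# so sub() returns the input unchanged.
get_binomial_name(taxa, patterns = "^(carex [^ ]*) .*")

# ignore.case = TRUE travels through `...` to sub().
get_binomial_name(taxa, patterns = "^(carex [^ ]*) .*", ignore.case = TRUE)
```

The function itself never mentions ignore.case; it only forwards whatever extra arguments it receives.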
Tips: Document arguments! (use {roxygen2})
#' To extract binomial scientific names
#'
#' Some description about the function.
#'
#' @param v Input vector, must be a character vector.
#' @param patterns The Regex pattern to match, default will extract the first two words.
#' @param repl The values to replace the pattern matched that was specified by `patterns`.
#' @param ... Additional arguments to be passed to [sub()].
#' @return A character vector, with the binomial scientific names
#' @examples get_binomial_name(c("Carex aquatilis Wahlenb.", "Boerhavia coulteri (Hook. f.) S. Watson"))
#'
get_binomial_name <- function(v,
patterns = "^([^ ]+ [^ ]*) .*",
repl = "\\1",
...
){
if(!is.character(v)){ # in case non-character input
stop("The input vector is not character.")
}
v2 = sub(pattern = patterns, replacement = repl, x = v, ...)
return(v2)
}
sample(unique(get_binomial_name(data_fish$taxon_name)), 30)
## [1] "Percina spp." "Oncorhynchus mykiss"
## [3] "Cyprinella venusta" "Etheostoma swaini"
## [5] "Fundulus notatus" "Cottus carolinae"
## [7] "Notropis buchanani" "Notropis texanus"
## [9] "Lepomis cyanellus" "Gambusia sp."
## [11] "Lepomis macrochirus" "Etheostoma nigrum"
## [13] "Awaous banana" "Cottus cognatus"
## [15] "Etheostoma sp." "Cottus girardi"
## [17] "Cyprinella sp." "Hypentelium nigricans"
## [19] "Oncorhynchus clarki" "Micropterus spp."
## [21] "Esox sp." "Cyprinella spp."
## [23] "Pteronotropis hypselopterus" "Elassoma sp."
## [25] "Pylodictis olivaris" "Micropterus henshalli"
## [27] "Percina nigrofasciata" "Gambusia holbrooki"
## [29] "Notropis sp." "Fundulus olivaceus"
sample(unique(get_binomial_name(data_small_mammal$taxon_name)), 30)
## [1] "Onychomys leucogaster" "Reithrodontomys humulis"
## [3] "Perognathus fasciatus" "Zapus sp."
## [5] "Peromyscus attwateri" "Peromyscus sp."
## [7] "Peromyscus californicus" "Ochotona princeps"
## [9] "Perognathus sp." "Blarina brevicauda"
## [11] "Sigmodon sp." "Perognathus flavescens"
## [13] "Tamiasciurus douglasii" "Rattus norvegicus"
## [15] "Dipodomys spectabilis" "Peromyscus nasutus"
## [17] "Sylvilagus floridanus" "Microtus miurus"
## [19] "Microtus pennsylvanicus" "Zapus hudsonius"
## [21] "Microtus montanus" "Tamias quadrimaculatus"
## [23] "Dicrostonyx groenlandicus" "Spermophilus sp."
## [25] "Reithrodontomys fulvescens" "Tamias umbrinus"
## [27] "Sigmodon ochrognathus" "Ammospermophilus harrisii"
## [29] "Lemmiscus curtatus" "Onychomys torridus"
Note: Avoid overriding existing functions and variables.
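A short sketch of what can go wrong, using sd() from the stats package as the function that gets masked. The replacement function is deliberately useless:

```r
# Defining our own `sd` masks stats::sd() in the current environment.
sd <- function(x) "not a standard deviation"
masked_result <- sd(c(1, 2, 3))            # calls our version

# The original is still reachable through its namespace.
original_result <- stats::sd(c(1, 2, 3))

rm(sd)                                     # remove our version to undo the masking
restored_result <- sd(c(1, 2, 3))          # back to stats::sd()
```

Choosing a name that is not already in use avoids this kind of surprise, both for your own later code and for anyone who sources your script.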