One of the most powerful features of R is its large collection of packages (~20k). Each package contains functions that convert some input into an output. Today, we will learn how to write basic R functions.

Why?

The basic structure of a function

The basic structure of a function is:

my_function_name <- function(data, arg1 = value1, arg2 = value2, arg3, arg4, ...){
    <command to do things with the data, with args to control behaviors>
    return(results)
}

Example 1

fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C) # return is optional
}
fahrenheit_to_celsius(68)
## [1] 20
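The comment in Example 1 says that return() is optional. As a minimal sketch of what that means, the two versions below give the same result; the names f_to_c_explicit and f_to_c_implicit are hypothetical and only used for this illustration.

```r
# Explicit version: the value is handed back by the return() call.
f_to_c_explicit <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}

# Implicit version: a function returns the value of the last
# expression it evaluates, so return() can be left out.
f_to_c_implicit <- function(temp_F) {
  (temp_F - 32) * 5 / 9
}

f_to_c_explicit(212)  # 100
f_to_c_implicit(212)  # 100
```

An explicit return() is still useful when a function needs to exit before reaching its last line.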

Exercise: write a function to convert Celsius to Fahrenheit.

celsius_to_kelvin <- function(temp_C) {
  temp_K <- temp_C + 273.15
  return(temp_K)
}

# freezing point of water in Kelvin
celsius_to_kelvin(0)
## [1] 273.15

Exercise: write a function to convert Fahrenheit to Kelvin. Hint: use fahrenheit_to_celsius() and celsius_to_kelvin().

fahrenheit_to_kelvin <- function(temp_F) {
  temp_C <- ____________
  temp_K <- ____________
  temp_K
}

Tips: Write modularized functions!
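To illustrate what "modularized" could look like without giving away the exercises, here is a small sketch in a different domain. The three function names are hypothetical; the point is that each function does one thing and the larger one is built by combining the smaller ones.

```r
# Each small function performs a single conversion.
hours_to_minutes <- function(hours) {
  hours * 60
}

minutes_to_seconds <- function(minutes) {
  minutes * 60
}

# The larger function reuses the small ones instead of
# repeating their arithmetic.
hours_to_seconds <- function(hours) {
  minutes_to_seconds(hours_to_minutes(hours))
}

hours_to_seconds(2)  # 7200
```

If one conversion ever needs to change, it only has to be fixed in one place.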

Example 2

The environment of a function controls how R finds the value associated with a name.

f <- function(x) {
  x + y
}

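R uses rules called lexical scoping to find the value associated with a name. Since y is not defined inside the function, R looks in the environment where the function was defined. The lookup happens each time the function is called, so the result depends on the value of y at that moment: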

y <- 100
f(10)
## [1] 110
y <- 1000
f(10)
## [1] 1010
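The sketch below extends Example 2 with two variations, g and h, which are hypothetical names added for illustration. It shows that a variable defined inside a function takes precedence over a global one, and that passing the value as an argument avoids the hidden dependency on the global y.

```r
y <- 100

# y is not defined inside f, so R finds it where f was defined
# (here, the global environment).
f <- function(x) {
  x + y
}

# A local y masks the global y inside the function body only.
g <- function(x) {
  y <- 1
  x + y
}

# Passing y explicitly makes the dependency visible to the caller.
h <- function(x, y) {
  x + y
}

f(10)         # 110
g(10)         # 11
h(10, y = 5)  # 15
y             # 100, the global y is unchanged by g and h
```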

Example 3

Recall from homework 5 that we wrote some code to extract the binomial names from species names. The neonDivData package has multiple data frames for different taxonomic groups. How can we turn that code into a function so that we can apply it to all the other data frames?

Tips: Write working code first, then convert into a function

library(neonDivData)
x = sub(pattern = "^([^ ]+ [^ ]*) .*", replacement = "\\1", x = data_plant$taxon_name)
sample(unique(x), 100)
##   [1] "Erigeron vernus"              "Viola spp."                  
##   [3] "Eragrostis scaligera"         "Eriogonum abertianum"        
##   [5] "Mirabilis nyctaginea"         "Symphyotrichum novae-angliae"
##   [7] "Antennaria rosea"             "Opuntia dillenii"            
##   [9] "Quercus lyrata"               "Diodia virginiana"           
##  [11] "Quercus pungens"              "Anthemis cotula"             
##  [13] "Ribes spp."                   "Iris verna"                  
##  [15] "Carex laxiflora"              "Cosmos caudatus"             
##  [17] "Cirsium canescens"            "Saxifraga sp."               
##  [19] "Prosopis pubescens"           "Emilia fosbergii"            
##  [21] "Fuirena breviseta"            "Desmodium viridiflorum"      
##  [23] "Potentilla rupincola"         "Forestiera acuminata"        
##  [25] "Clarkia sp."                  "Viola sororia"               
##  [27] "Phlox stolonifera"            "Persea americana"            
##  [29] "Coreopsis sp."                "Setaria arizonica"           
##  [31] "Bidens frondosa"              "Senecio spartioides"         
##  [33] "Paspalum notatum"             "Leersia sp."                 
##  [35] "Sphaeralcea hastulata"        "Physaria spp."               
##  [37] "Cyperus esculentus"           "Kalmia sp."                  
##  [39] "Cathestecum erectum"          "Quercus minima"              
##  [41] "Asclepias verticillata"       "Andromeda polifolia"         
##  [43] "Cissus verticillata"          "Gilia inconspicua"           
##  [45] "Dryas octopetala"             "Albizia procera"             
##  [47] "Lupinus benthamii"            "Luzula echinata"             
##  [49] "Rhexia lutea"                 "Verbena simplex"             
##  [51] "Nandina domestica"            "Chloris verticillata"        
##  [53] "Cyperus imbricatus"           "Hydrangea quercifolia"       
##  [55] "Ribes montigenum"             "Verbesina encelioides"       
##  [57] "Jatropha curcas"              "Scutellaria saxatilis"       
##  [59] "Symphyotrichum falcatum"      "Ipomoea cairica"             
##  [61] "Scirpus georgianus"           "Uvularia puberula"           
##  [63] "Valeriana capitata"           "Plantago rugelii"            
##  [65] "Oldenlandia corymbosa"        "Erioneuron pilosum"          
##  [67] "Monarda fistulosa"            "Ipomoea sp."                 
##  [69] "Madia sp."                    "Rhynchospora glomerata"      
##  [71] "Stenocereus spp."             "Myrsine sp."                 
##  [73] "Amelanchier arborea"          "Rudbeckia mollis"            
##  [75] "Asclepias nyctaginifolia"     "Mitella sp."                 
##  [77] "Parkinsonia microphylla"      "Mimulus bicolor"             
##  [79] "Salix pseudomyrsinites"       "Pinus clausa"                
##  [81] "Carex microrhyncha"           "Herissantia crispa"          
##  [83] "Lechea spp."                  "Rubus deliciosus"            
##  [85] "Perityle stansburyi"          "Hypericum anagalloides"      
##  [87] "Portulaca halimoides"         "Echinochloa muricata"        
##  [89] "Euphorbia dentata"            "Asclepias viridiflora"       
##  [91] "Chamaesyce revoluta"          "Arthraxon hispidus"          
##  [93] "Dyschoriste sp."              "Bowlesia incana"             
##  [95] "Asparagus officinalis"        "Glandularia pumila"          
##  [97] "Fimbristylis spathacea"       "Galactia pinetorum"          
##  [99] "Mahonia repens"               "Rhabdadenia biflora"

Naming is hard: have clear and short function names.

Now, the first step is to think about a name for the function! What should we name it? get_binomial_name? Other options?

Tip: Verbs for function names

Tip: Be consistent with style: snake_case, camelCase, etc.

get_binomial_name <- function(){

}

What about arguments? What kind of arguments should the function have?

Tip: Nouns for arguments.

get_binomial_name <- function(v){

}

Exercise: finish the above function.

get_binomial_name <- function(v){
  v2 = sub(pattern = "^([^ ]+ [^ ]*) .*", replacement = "\\1", x = data_plant$taxon_name)
  return(v2)
}

What’s wrong with the above function?

How can we improve the function?

get_binomial_name <- function(v, patterns = "^([^ ]+ [^ ]*) .*"){
  v2 = sub(pattern = patterns, replacement = "\\1", x = v)
  return(v2)
}

get_binomial_name <- function(v, 
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1"
                              ){
  v2 = sub(pattern = patterns, replacement = repl, x = v)
  return(v2)
} # benefit?? be more flexible?
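One way to see the benefit of exposing patterns and repl as arguments is to call the same function with different values. The sketch below redefines the function so that it runs on its own; the two alternative patterns are assumptions made for this illustration, not part of the homework code.

```r
get_binomial_name <- function(v,
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1") {
  sub(pattern = patterns, replacement = repl, x = v)
}

x <- c("Carex aquatilis Wahlenb.", "Boerhavia coulteri (Hook. f.) S. Watson")

# Default arguments: keep the first two words.
get_binomial_name(x)
# "Carex aquatilis" "Boerhavia coulteri"

# A different pattern: keep the genus only.
get_binomial_name(x, patterns = "^([^ ]+) .*")
# "Carex" "Boerhavia"

# A different pattern and replacement: keep the second word only.
get_binomial_name(x, patterns = "^([^ ]+) ([^ ]+) .*", repl = "\\2")
# "aquatilis" "coulteri"
```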

Tips: Data arguments first; detail arguments later with default values.

head(get_binomial_name(data_bird$value), 10)
##  [1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1"
head(data_bird$value, 10)
##  [1] 1 1 1 1 1 1 1 1 1 1
class(data_bird$value)
## [1] "numeric"
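The output above is silent about what went wrong: sub() coerces its input to character before matching, so a numeric vector comes back as strings without any warning. A minimal sketch with a made-up numeric vector (values is a hypothetical stand-in for data_bird$value):

```r
values <- c(1, 1, 2)

# No element matches the pattern, so each one is returned unchanged,
# but already converted to character.
out <- sub(pattern = "^([^ ]+ [^ ]*) .*", replacement = "\\1", x = values)

out            # "1" "1" "2"
class(values)  # "numeric"
class(out)     # "character"
```

This kind of silent conversion is what an explicit input check is meant to catch.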

Tips: Defensive coding and stop early; or return early.

If a function is going to fail, fail early!

get_binomial_name <- function(v, 
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1"
                              ){
  if(!is.character(v)){
    stop("The input vector is not character.")
  }
  v2 = sub(pattern = patterns, replacement = repl, x = v)
  return(v2)
} 
get_binomial_name(data_bird$value)
## Error in get_binomial_name(data_bird$value) : 
##   The input vector is not character.
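Because stop() halts execution, it can help to see the check in a form that runs from start to finish. The sketch below uses tryCatch() to capture the error message instead of stopping the script; the numeric input is made up for illustration.

```r
get_binomial_name <- function(v,
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1") {
  if (!is.character(v)) {
    stop("The input vector is not character.")
  }
  sub(pattern = patterns, replacement = repl, x = v)
}

# Invalid input: the error is caught and its message is returned.
msg <- tryCatch(
  get_binomial_name(c(1, 1, 1)),
  error = function(e) conditionMessage(e)
)
msg  # "The input vector is not character."

# Valid input passes the check and is processed as before.
get_binomial_name("Carex aquatilis Wahlenb.")  # "Carex aquatilis"
```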

Sometimes, return early too!

f <- function(x){
    if(x){
      <some complex calculation>
      out = results
    } else {
      <some simple calculation>
      out = results
    }
    return(out)
}

# return early to save time
f <- function(x){
    if(!x){
      <some simple calculation>
      out = results
      return(out)
    }

    <some complex calculation>
    results
}
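The template above can be turned into a small runnable sketch. The function safe_mean is hypothetical: the simple case (an empty vector) is handled first with an early return, so the rest of the body only deals with the case that needs real work.

```r
safe_mean <- function(x) {
  # Simple case: nothing to average, so return right away.
  if (length(x) == 0) {
    return(NA_real_)
  }

  # Main path, reached only when there is data:
  # drop missing values, then average what is left.
  x <- x[!is.na(x)]
  sum(x) / length(x)
}

safe_mean(numeric(0))       # NA
safe_mean(c(1, 2, NA, 3))   # 2
```

Handling the simple case first also keeps the main calculation out of a nested if/else block.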

Tips: Use comments to explain why.

get_binomial_name <- function(v, 
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1"
                              ){
  if(!is.character(v)){ # in case non-character input
    stop("The input vector is not character.")
  }
  v2 = sub(pattern = patterns, replacement = repl, x = v)
  return(v2)
} 

Tips: Use ... to pass additional arguments through to the functions called inside.

get_binomial_name <- function(v, 
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1",
                              ...
                              ){
  if(!is.character(v)){ # in case non-character input
    stop("The input vector is not character.")
  }
  v2 = sub(pattern = patterns, replacement = repl, x = v, ...)
  return(v2)
} 
head(get_binomial_name(data_bird$taxon_name, perl = TRUE), 50)
##  [1] "Poecile atricapillus"   "Vireo olivaceus"        "Mniotilta varia"       
##  [4] "Setophaga virens"       "Setophaga virens"       "Troglodytes hiemalis"  
##  [7] "Mniotilta varia"        "Troglodytes hiemalis"   "Setophaga virens"      
## [10] "Vireo olivaceus"        "Setophaga ruticilla"    "Poecile atricapillus"  
## [13] "Catharus ustulatus"     "Troglodytes hiemalis"   "Sitta carolinensis"    
## [16] "Empidonax minimus"      "Vireo olivaceus"        "Setophaga caerulescens"
## [19] "Setophaga magnolia"     "Mniotilta varia"        "Mniotilta varia"       
## [22] "Setophaga pinus"        "Troglodytes hiemalis"   "Setophaga magnolia"    
## [25] "Setophaga coronata"     "Hylocichla mustelina"   "Setophaga virens"      
## [28] "Catharus fuscescens"    "Setophaga caerulescens" "Poecile atricapillus"  
## [31] "Setophaga pinus"        "Setophaga magnolia"     "Setophaga virens"      
## [34] "Sphyrapicus varius"     "Setophaga coronata"     "Setophaga virens"      
## [37] "Vireo olivaceus"        "Regulus satrapa"        "Mniotilta varia"       
## [40] "Mniotilta varia"        "Catharus guttatus"      "Seiurus aurocapilla"   
## [43] "Vireo olivaceus"        "Melanerpes carolinus"   "Setophaga coronata"    
## [46] "Setophaga fusca"        "Mniotilta varia"        "Vireo olivaceus"       
## [49] "Certhia americana"      "Troglodytes hiemalis"
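To make the effect of ... visible without the neonDivData data, here is a self-contained sketch. The lower-case pattern is an assumption chosen for this illustration: it only matches once ignore.case = TRUE is forwarded to sub() through the dots.

```r
get_binomial_name <- function(v,
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1",
                              ...) {
  if (!is.character(v)) {
    stop("The input vector is not character.")
  }
  # Anything supplied via ... is handed on to sub() unchanged.
  sub(pattern = patterns, replacement = repl, x = v, ...)
}

x <- "Carex aquatilis Wahlenb."
lower_pattern <- "^(carex [^ ]*) .*"  # hypothetical, all lower case

# Without extra arguments the pattern does not match "Carex".
get_binomial_name(x, patterns = lower_pattern)
# "Carex aquatilis Wahlenb."

# ignore.case is not an argument of get_binomial_name();
# it reaches sub() through the dots.
get_binomial_name(x, patterns = lower_pattern, ignore.case = TRUE)
# "Carex aquatilis"
```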

Tips: Document arguments! (use {roxygen2})

#' To extract binomial scientific names
#' 
#' Some description about the function.
#'
#' @param v Input vector, must be a character vector.
#' @param patterns The regular expression to match; the default extracts the first two words.
#' @param repl The replacement for the text matched by `patterns`.
#' @param ... Additional arguments passed on to `sub()`.
#' @return A character vector, with the binomial scientific names
#' @examples get_binomial_name(c("Carex aquatilis Wahlenb.", "Boerhavia coulteri (Hook. f.) S. Watson"))
#'
get_binomial_name <- function(v, 
                              patterns = "^([^ ]+ [^ ]*) .*",
                              repl = "\\1",
                              ...
){
  if(!is.character(v)){ # in case non-character input
    stop("The input vector is not character.")
  }
  v2 = sub(pattern = patterns, replacement = repl, x = v, ...)
  return(v2)
} 
sample(unique(get_binomial_name(data_fish$taxon_name)), 30)
##  [1] "Percina spp."                "Oncorhynchus mykiss"        
##  [3] "Cyprinella venusta"          "Etheostoma swaini"          
##  [5] "Fundulus notatus"            "Cottus carolinae"           
##  [7] "Notropis buchanani"          "Notropis texanus"           
##  [9] "Lepomis cyanellus"           "Gambusia sp."               
## [11] "Lepomis macrochirus"         "Etheostoma nigrum"          
## [13] "Awaous banana"               "Cottus cognatus"            
## [15] "Etheostoma sp."              "Cottus girardi"             
## [17] "Cyprinella sp."              "Hypentelium nigricans"      
## [19] "Oncorhynchus clarki"         "Micropterus spp."           
## [21] "Esox sp."                    "Cyprinella spp."            
## [23] "Pteronotropis hypselopterus" "Elassoma sp."               
## [25] "Pylodictis olivaris"         "Micropterus henshalli"      
## [27] "Percina nigrofasciata"       "Gambusia holbrooki"         
## [29] "Notropis sp."                "Fundulus olivaceus"
sample(unique(get_binomial_name(data_small_mammal$taxon_name)), 30)
##  [1] "Onychomys leucogaster"      "Reithrodontomys humulis"   
##  [3] "Perognathus fasciatus"      "Zapus sp."                 
##  [5] "Peromyscus attwateri"       "Peromyscus sp."            
##  [7] "Peromyscus californicus"    "Ochotona princeps"         
##  [9] "Perognathus sp."            "Blarina brevicauda"        
## [11] "Sigmodon sp."               "Perognathus flavescens"    
## [13] "Tamiasciurus douglasii"     "Rattus norvegicus"         
## [15] "Dipodomys spectabilis"      "Peromyscus nasutus"        
## [17] "Sylvilagus floridanus"      "Microtus miurus"           
## [19] "Microtus pennsylvanicus"    "Zapus hudsonius"           
## [21] "Microtus montanus"          "Tamias quadrimaculatus"    
## [23] "Dicrostonyx groenlandicus"  "Spermophilus sp."          
## [25] "Reithrodontomys fulvescens" "Tamias umbrinus"           
## [27] "Sigmodon ochrognathus"      "Ammospermophilus harrisii" 
## [29] "Lemmiscus curtatus"         "Onychomys torridus"

Note: Avoid overriding existing functions and variables.
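A short sketch of why this matters: if a new object reuses the name of an existing function, later calls pick up the new definition. The example runs inside local() so that nothing in the global environment is actually overridden; the replacement mean function is made up for illustration.

```r
masked <- local({
  # A helper that accidentally reuses the name of base::mean().
  mean <- function(x) "not the real mean"
  mean(c(1, 2, 3))
})

masked                   # "not the real mean"
base::mean(c(1, 2, 3))   # 2, the package prefix reaches the original
mean(c(1, 2, 3))         # 2, outside local() nothing was overridden
```

Choosing a name that is not already in use avoids having to think about which definition a call will find.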