In today’s lecture, we will continue to talk about R basics. Specifically, we will cover missing values, factors, date times, and subsetting.

`NA`

In R, missing values are represented with `NA` (short for "not available"). One main feature of `NA` is that it is infectious: **most computations involving a missing value will return another missing value.**

`## [1] NA`

`## [1] NA`
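The calls behind the two outputs above are not shown. A minimal sketch that would produce them, assuming a comparison and a multiplication (the variable names `gt` and `times` are illustrative only):

```r
# When one operand is unknown, the result of a comparison or of
# arithmetic is unknown as well, so R returns NA.
gt <- NA > 5
times <- 10 * NA

gt     # NA
times  # NA
```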

**Exceptions exist when some identity holds for all possible inputs**, so the result does not depend on the missing value:

`## [1] 1`

`## [1] TRUE`

`## [1] FALSE`
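The three outputs above are consistent with the following identities; this is a sketch of calls that would give them, not necessarily the ones originally used (the variable names are illustrative):

```r
# Each expression has the same answer whatever value NA stands for,
# so R can return a definite result.
power_zero <- NA^0        # anything to the power 0 is 1
or_true    <- NA | TRUE   # (anything OR TRUE) is TRUE
and_false  <- NA & FALSE  # (anything AND FALSE) is FALSE

power_zero  # 1
or_true     # TRUE
and_false   # FALSE
```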

Because of this propagation, we need to be careful when working with values that include `NA`. For example:

`## [1] NA`

`## [1] 2`

`## [1] NA NA NA NA`

`## [1] FALSE FALSE FALSE TRUE`
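The code for these four outputs is missing. One sketch that reproduces them, assuming a hypothetical vector `x` with a single missing value:

```r
# Hypothetical input: three known values and one missing value.
x <- c(1, 2, 3, NA)

mean(x)                # NA: the missing value propagates into the mean
mean(x, na.rm = TRUE)  # 2: NA is dropped before averaging

x == NA                # NA NA NA NA: comparing with NA is itself unknown
is.na(x)               # FALSE FALSE FALSE TRUE: the right way to test for NA
```

The last two lines are the usual pitfall: `x == NA` never tells you which elements are missing, whereas `is.na(x)` does.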

There are four types of missing values: `NA` (logical), `NA_integer_` (integer), `NA_real_` (double), and `NA_character_` (character). In most cases we don’t need to worry about the distinction, because `NA` is automatically coerced to the correct type.
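A short check of the four types and of the automatic coercion, as a sketch (the vectors `ints` and `chars` are made up for illustration):

```r
# The four typed missing values.
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# A plain NA is coerced to the type of the vector it is combined with.
ints  <- c(1L, NA)
chars <- c("a", NA)
typeof(ints)   # "integer"
typeof(chars)  # "character"
```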

An S3 object is a base type with at least a `class` attribute. A generic function can do different things to different S3 objects. An example is the `str()` function, which returns different output for different inputs (e.g., a vector vs. a data frame).
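To make the `class` attribute and the dispatch of `str()` concrete, here is a small sketch with two hypothetical objects, `v` and `df`:

```r
v  <- 1:3                   # plain integer vector
df <- data.frame(x = 1:3)   # data frame, an S3 object

attr(v, "class")   # NULL: a bare vector carries no class attribute
attr(df, "class")  # "data.frame"

# The same generic produces different summaries for the two inputs.
out_v  <- capture.output(str(v))
out_df <- capture.output(str(df))
out_v
out_df
```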

A factor is a vector that can contain only predefined values. It is normally used to store categorical data (e.g., spring, summer, fall, winter). A factor is an integer vector with two attributes: a `class` (factor), which makes it behave differently from an integer vector, and `levels`, which defines the allowed values.

```
## [1] spring summer fall winter
## Levels: fall spring summer winter
```

`## [1] "integer"`

```
## $levels
## [1] "fall" "spring" "summer" "winter"
##
## $class
## [1] "factor"
```

```
## [1] spring fall
## Levels: spring summer fall winter
```

```
## v_f2
## spring summer fall winter
## 1 0 1 0
```

```
## [1] spring winter fall
## Levels: spring < summer < fall < winter
```

```
## Warning in `[<-.factor`(`*tmp*`, 4, value = "weather"): invalid factor level, NA
## generated
```

```
## [1] spring winter fall <NA>
## Levels: spring < summer < fall < winter
```
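The code that produced the factor outputs above is not included. The following sketch would give the same results; `v_f2` appears in the output above, while `seasons`, `v_f`, and `v_f3` are names assumed here for illustration:

```r
seasons <- c("spring", "summer", "fall", "winter")

# Without explicit levels, the levels are sorted alphabetically.
v_f <- factor(seasons)
levels(v_f)      # "fall" "spring" "summer" "winter"
typeof(v_f)      # "integer"
attributes(v_f)  # $levels and $class

# Explicit levels fix the order and allow levels with no observations.
v_f2 <- factor(c("spring", "fall"), levels = seasons)
table(v_f2)      # spring 1, summer 0, fall 1, winter 0

# An ordered factor records that the levels have a ranking.
v_f3 <- factor(c("spring", "winter", "fall"), levels = seasons, ordered = TRUE)

# Assigning a value that is not a level gives a warning and stores NA.
v_f3[4] <- "weather"
v_f3
```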

**Note: factors are built on integer vectors, even though they look like characters**. So it is usually best to convert factors to character vectors if you need to deal with strings.

`## [1] 2 3 1 4`

`## [1] 1 4 3 NA`

`## [1] "spring" "summer" "fall" "winter"`
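The three outputs above match what `as.integer()` and `as.character()` return for factors. A sketch under that assumption, with hypothetical factors `v_f` (alphabetical levels) and `v_f3` (explicit levels, one missing value):

```r
seasons <- c("spring", "summer", "fall", "winter")
v_f  <- factor(seasons)  # levels: fall, spring, summer, winter
v_f3 <- factor(c("spring", "winter", "fall", NA), levels = seasons, ordered = TRUE)

# as.integer() exposes the underlying codes, i.e. positions in levels().
as.integer(v_f)    # 2 3 1 4
as.integer(v_f3)   # 1 4 3 NA

# as.character() recovers the labels.
as.character(v_f)  # "spring" "summer" "fall" "winter"
```

The codes depend on the order of the levels, not on the text, which is why the same season can map to different integers in `v_f` and `v_f3`.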

Date vectors are built on top of double vectors, with a `Date` class.

`## [1] "double"`

```
## $class
## [1] "Date"
```

The value represents the number of days since 1970-01-01.

`## [1] "1970-01-02"`

`## [1] 1`
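A sketch of calls that would give the Date outputs above, assuming a hypothetical date `d` one day after the origin:

```r
d <- as.Date("1970-01-02")

typeof(d)      # "double"
attributes(d)  # $class "Date"

d              # "1970-01-02"
unclass(d)     # 1: one day after 1970-01-01
```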

Date-time information is stored in base R in two ways: POSIXct and POSIXlt. “POSIX” is short for Portable Operating System Interface, which is a family of cross-platform standards. “ct” stands for calendar time, and “lt” for local time.

POSIXct vectors are built on top of double vectors, where the value represents the number of seconds since 1970-01-01.

`## [1] "2021-09-28 09:20:00 CDT"`

`## [1] "double"`

```
## $class
## [1] "POSIXct" "POSIXt"
##
## $tzone
## [1] "US/Central"
```

```
## [1] 1632838800
## attr(,"tzone")
## [1] "US/Central"
```

`## [1] "2021-09-28 22:20:00 CST"`
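The outputs above show one instant printed in two time zones. The sketch below illustrates the same idea. It is an assumption-laden variant: it starts from UTC rather than `US/Central` so that it does not depend on the local time zone database, and the names `dt`, `secs`, and `dt_central` are illustrative.

```r
# 2021-09-28 14:20:00 UTC is the same instant as 09:20:00 in US Central time.
dt <- as.POSIXct("2021-09-28 14:20:00", tz = "UTC")

typeof(dt)  # "double"
class(dt)   # "POSIXct" "POSIXt"

secs <- as.numeric(dt)  # seconds since 1970-01-01 00:00:00 UTC
secs                    # 1632838800

# Changing the tzone attribute changes how the instant is displayed,
# not the stored number of seconds.
dt_central <- dt
attr(dt_central, "tzone") <- "America/Chicago"
same_instant <- as.numeric(dt_central) == secs
same_instant            # TRUE
```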

**The R package lubridate makes most date-time work much easier.** Make sure to install it and check it out.

So far, we have covered most data structures (`str()`) in R. Now we move on to accessing specific elements of each common data structure.

- Six ways to subset atomic vectors
- Three subsetting operators: `[`, `[[`, and `$`; they interact differently with different vector types
- Subsetting can be combined with assignment, making it powerful to edit data

We can use `[` to select any number of elements from a vector. There are six ways to do so.

`## [1] 1 5`

`## [1] 1 1`

`## [1] 2 3`

`## [1] 2 4 5`

`## [1] 2 4 5`

Logical vectors select the elements where the corresponding value is `TRUE`.

`## [1] 1 3 5`

`## [1] 1 3 5`

`## [1] 1 3 NA 5`

`## [1] 1 NA 3 NA 5`

`## [1] 1 3 5`
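The subsetting calls themselves are not shown above. The sketch below walks through the six index types on a hypothetical vector `x` (and a named copy `y`); the first three groups reproduce the kind of output listed above.

```r
x <- c(1, 2, 3, 4, 5)

# 1. Positive integers select elements at those positions.
x[c(1, 5)]      # 1 5
x[c(1, 1)]      # 1 1 (positions can be repeated)
x[c(2.1, 3.9)]  # 2 3 (real numbers are truncated to integers)

# 2. Negative integers exclude elements at those positions.
x[-c(1, 3)]     # 2 4 5

# 3. Logical vectors keep elements where the index is TRUE.
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]  # 1 3 5
x[c(TRUE, FALSE)]                     # 1 3 5 (index is recycled)
x[c(TRUE, FALSE, TRUE, NA, TRUE)]     # 1 3 NA 5 (NA in, NA out)
x[c(TRUE, NA)]                        # 1 NA 3 NA 5

# 4. Nothing returns the whole vector.
x[]

# 5. Zero returns a zero-length vector.
x[0]

# 6. Character vectors select by name.
y <- setNames(x, c("a", "b", "c", "d", "e"))
y[c("a", "c", "e")]
```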

Using `[` to select elements of a list always returns a list. `[[` and `$` extract the elements of a list. `[[` is used for extracting single items, while `x$y` is a useful shorthand for `x[["y"]]`.

```
## [[1]]
## [1] 1 2 3 4 5
##
## [[2]]
## [1] TRUE FALSE
```

```
## [[1]]
## [1] "a" "b" "c"
```

`## [1] "a" "b" "c"`

`## [1] "c"`
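A sketch that reproduces the four list outputs above, assuming a hypothetical unnamed list `l` with three elements:

```r
l <- list(1:5, c("a", "b", "c"), c(TRUE, FALSE))

l[c(1, 3)]  # a list holding elements 1 and 3
l[2]        # still a list, of length 1
l[[2]]      # the character vector itself: "a" "b" "c"
l[[2]][3]   # "c": subset the extracted vector
```

`l[2]` and `l[[2]]` print similar content, but only the second can be used directly as a character vector.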

When a list is named, we can use `$`.

```
## $A
## [1] 1 2 3 4 5
##
## $B
## [1] "a" "b" "c"
##
## $C
## [1] TRUE FALSE
```

`## [1] "a" "b" "c"`

`## [1] "a" "b" "c"`
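The names `A`, `B`, and `C` come from the output above; the list name `l` and the exact calls in this sketch are assumptions:

```r
l <- list(A = 1:5, B = c("a", "b", "c"), C = c(TRUE, FALSE))

l$B       # "a" "b" "c"
l[["B"]]  # the same element, extracted by name with [[
```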

If list `x` is a train carrying objects, then `x[[5]]` is the object in car 5; `x[4:6]` is a train of cars 4-6. (@RLangTip, https://twitter.com/RLangTip/status/268375867468681216)

You can subset higher-dimensional structures in three ways:

- With multiple vectors.
- With a single vector.
- With a matrix.

The most common way of subsetting matrices (2D) and arrays (>2D)
is a simple generalisation of 1D subsetting: **supply a 1D index
for each dimension, separated by a comma**.

Blank subsetting is now useful because it lets you keep all rows or all columns.

```
a <- matrix(1:9, nrow = 3)
colnames(a) <- c("col_1", "col_2", "col_3")
rownames(a) <- c("row_1", "row_2", "row_3")
a
```

```
## col_1 col_2 col_3
## row_1 1 4 7
## row_2 2 5 8
## row_3 3 6 9
```

```
## col_1 col_2 col_3
## row_1 1 4 7
## row_2 2 5 8
```

```
## col_3 col_1
## row_1 7 1
## row_3 9 3
```

`## col_1 col_3`
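The three subsetting calls behind the outputs above are missing. This sketch rebuilds the matrix `a` and shows calls that would produce them (the result names are illustrative):

```r
a <- matrix(1:9, nrow = 3,
            dimnames = list(paste0("row_", 1:3), paste0("col_", 1:3)))

# Blank column index: keep all columns of the first two rows.
first_two <- a[1:2, ]

# Index vectors can reorder rows and columns.
corners <- a[c(1, 3), c(3, 1)]

# Zero selects no rows; -2 drops the second column, so only the
# column headers col_1 and col_3 are printed.
no_rows <- a[0, -2]

first_two
corners
no_rows
```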

By default, `[` simplifies results to the lowest possible dimensionality. For example, if we select only one row or one column of a matrix, the result is a vector instead of a matrix.

```
## col_1 col_2 col_3
## 1 4 7
```

`## [1] "integer"`

`## [1] "matrix" "array"`

```
## col_1 col_2 col_3
## row_1 1 4 7
```

`## [1] "matrix" "array"`

`## [1] 5`

```
## col_2
## row_2 5
```
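A sketch of the simplification shown in the outputs above and of how `drop = FALSE` prevents it, using the same matrix `a` (the names `row_vec` and `row_mat` are illustrative):

```r
a <- matrix(1:9, nrow = 3,
            dimnames = list(paste0("row_", 1:3), paste0("col_", 1:3)))

row_vec <- a[1, ]                # simplified to a named integer vector
class(row_vec)                   # "integer"
class(a)                         # a matrix ("matrix" "array" in R >= 4.0)

row_mat <- a[1, , drop = FALSE]  # stays a 1 x 3 matrix
class(row_mat)

a[2, 2]                          # 5, a bare value without names
a[2, 2, drop = FALSE]            # a 1 x 1 matrix that keeps its dimnames
```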

Because both matrices and arrays are just vectors with special
attributes, you can **subset them with a single vector**,
as if they were a 1D vector.

```
## col_1 col_2 col_3
## row_1 1 4 7
## row_2 2 5 8
## row_3 3 6 9
```

`## [1] 1 5 9`

`## [1] 2 4 6 8`

```
## col_1 col_2 col_3
## row_1 1 -1 7
## row_2 -1 5 -1
## row_3 3 -1 9
```
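The outputs above are consistent with indexing the matrix by position in column-major order. A sketch under that assumption (the copy `b` is introduced here so that `a` stays unchanged):

```r
a <- matrix(1:9, nrow = 3,
            dimnames = list(paste0("row_", 1:3), paste0("col_", 1:3)))

# Positions count down the columns: 1, 5, 9 is the diagonal.
a[c(1, 5, 9)]     # 1 5 9
a[c(2, 4, 6, 8)]  # 2 4 6 8

# Subsetting combined with assignment overwrites those cells.
b <- a
b[c(2, 4, 6, 8)] <- -1
b
```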

It is also possible to **subset with a matrix**. Each row in the index matrix specifies the location of one value, and each column corresponds to a dimension of the matrix or array being subset.

```
## [,1] [,2]
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
```

`## [1] 1 5 9`

```
## col_1 col_2 col_3
## row_1 1 4 7
## row_2 2 5 8
## row_3 3 6 9
```

```
## [,1] [,2] [,3]
## [1,] FALSE TRUE TRUE
## [2,] FALSE FALSE TRUE
## [3,] FALSE FALSE FALSE
```

`## [1] 4 7 8`
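A sketch of both kinds of matrix index seen in the outputs above: an integer matrix of (row, column) pairs and a logical matrix. The name `idx` and the use of `cbind()` and `upper.tri()` are assumptions about how the originals were built.

```r
a <- matrix(1:9, nrow = 3,
            dimnames = list(paste0("row_", 1:3), paste0("col_", 1:3)))

# Integer matrix: each row is one (row, column) location.
idx <- cbind(1:3, 1:3)
idx
a[idx]        # 1 5 9: the diagonal

# Logical matrix: TRUE marks the cells to keep.
upper <- upper.tri(a)
upper
a[upper]      # 4 7 8: the cells above the diagonal
```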

Recall that a data frame has properties of both a list and a matrix. When you subset a data frame with a single vector, it behaves like a list and returns the selected elements (columns).

```
## x y z
## 1 1 -2.1087062 a
## 2 2 0.6364000 b
## 3 3 0.3141892 c
## 4 4 0.5340274 d
## 5 5 -0.5669961 e
```

```
## x z
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
```

```
## x z
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
```
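A sketch of list-style subsetting on a data frame shaped like the one printed above. The name `df` is assumed, and the `y` values are freshly drawn random numbers, so they will not match the printed ones.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
df <- data.frame(x = 1:5, y = rnorm(5), z = letters[1:5],
                 stringsAsFactors = FALSE)

by_name <- df[c("x", "z")]  # select columns by name
by_pos  <- df[c(1, 3)]      # select the same columns by position

by_name
by_pos
```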

When subset with two indices, a data frame behaves like a matrix.

```
## y z
## 1 -2.108706 a
## 2 0.636400 b
```

```
## x y z
## 1 1 -2.1087062 a
## 2 2 0.6364000 b
## 3 3 0.3141892 c
```

```
## x y
## 1 1 -2.1087062
## 2 2 0.6364000
## 3 3 0.3141892
## 4 4 0.5340274
## 5 5 -0.5669961
```

```
## x y z
## 3 3 0.3141892 c
```

```
## x y z
## 2 2 0.6364000 b
## 4 4 0.5340274 d
```

```
## x y z
## 2 2 0.6364000 b
## 3 3 0.3141892 c
## 4 4 0.5340274 d
```

```
## x y z
## 3 3 0.3141892 c
```

```
## 'data.frame': 1 obs. of 3 variables:
## $ x: int 3
## $ y: num 0.314
## $ z: chr "c"
```

`## [1] "c"`

`## chr "c"`

```
## z
## 3 c
```

```
## 'data.frame': 1 obs. of 1 variable:
## $ z: chr "c"
```
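The calls for the outputs above are not shown. This sketch covers the same cases; `df` is an assumed name, and the `y` values are the printed ones rounded to two decimals.

```r
df <- data.frame(x = 1:5,
                 y = c(-2.11, 0.64, 0.31, 0.53, -0.57),
                 z = letters[1:5],
                 stringsAsFactors = FALSE)

df[1:2, c("y", "z")]           # rows 1-2, columns y and z
df[1:3, ]                      # first three rows, all columns
df[, c("x", "y")]              # all rows, two columns
df[df$x == 3, ]                # rows matching a condition
df[c(2, 4), ]                  # rows by position
df[df$x > 1 & df$x < 5, ]      # combined conditions

str(df[3, ])                   # one row is still a data frame
df[3, "z"]                     # "c": a single column is simplified to a vector
str(df[3, "z", drop = FALSE])  # drop = FALSE keeps a 1 x 1 data frame
```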

The default `drop = TRUE` behaviour is a common source of bugs in functions; try to use `drop = FALSE` when subsetting a 2-D object. Alternatively, try the `tibble` package, whose `tibble` class represents data frames and whose `[` always behaves as if `drop = FALSE`.

`@` and `slot()`

For S4 objects, there are two ways to access a slot: the `@` operator (equivalent to `$`) and the `slot()` function (equivalent to `[[`). We won’t talk more about them in this course.
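For completeness, a minimal sketch of both access forms. The class `Season` and its slots are invented for this example.

```r
library(methods)

# Hypothetical S4 class with two slots.
setClass("Season", representation(name = "character", months = "numeric"))
s <- new("Season", name = "spring", months = c(3, 4, 5))

s@name             # "spring"
slot(s, "months")  # 3 4 5
```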

Most of this lecture’s material is from Advanced R.